Data validation as a service
This is a Python project and an online tool that checks tabular data for structural problems and ensures that it fits a specific schema. Applications range from simple validation checks on CSV files to integration with a larger ETL pipeline. The development of Good Tables has been driven by a real-world pain point: monitoring and validating government spending data in the United Kingdom. A brief overview of this use case illustrates the value proposition of Good Tables.
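The two kinds of checks mentioned above, structural problems and schema conformance, can be sketched in plain Python. This is a minimal illustration using only the standard library, not the goodtables API: the `SCHEMA` mapping and the `validate_csv` helper are hypothetical, and the "schema" is reduced to one parsing function per column.

```python
import csv
import io

# Hypothetical schema: column name -> callable that parses a cell or raises ValueError.
SCHEMA = {"department": str, "amount": float, "year": int}


def validate_csv(text, schema):
    """Return a list of (row_number, message) tuples describing problems in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [(0, "empty file")]
    header = rows[0]
    errors = []

    # Structural checks on the header row.
    if any(not name.strip() for name in header):
        errors.append((1, "blank header"))
    if len(set(header)) != len(header):
        errors.append((1, "duplicate header"))
    missing = [name for name in schema if name not in header]
    if missing:
        errors.append((1, "missing columns: " + ", ".join(missing)))

    for row_number, row in enumerate(rows[1:], start=2):
        # Structural checks on each data row.
        if not any(cell.strip() for cell in row):
            errors.append((row_number, "blank row"))
            continue
        if len(row) != len(header):
            errors.append((row_number, f"expected {len(header)} cells, got {len(row)}"))
            continue
        # Schema checks: each cell in a known column must parse as that column's type.
        for name, cell in zip(header, row):
            parse = schema.get(name)
            if parse is None:
                continue
            try:
                parse(cell)
            except ValueError:
                errors.append((row_number, f"{name!r}: cannot parse {cell!r} as {parse.__name__}"))
    return errors


sample = "department,amount,year\nHealth,1200.50,2016\nEducation,n/a,2016\nTransport,300\n"
for error in validate_csv(sample, SCHEMA):
    print(error)
```

Structural errors (wrong cell count, blank rows) short-circuit the type checks for that row, since parsing cells against the wrong columns would only produce noise.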
More background at okfnlabs.org
Good Tables == Good Times!
Continuous data validation as a service.
Preliminary designs and specifications can be found in the wiki.
We currently use Redis as the message broker for Celery. On Debian or Ubuntu, install it with:
```bash
sudo apt-get install redis-server
```
Prepare Python and Node virtual environments:
```bash
git clone git@github.com:frictionlessdata/goodtables.io.git
cd goodtables.io
virtualenv .python -p python3.5
source .python/bin/activate
nvm install 6
nvm use 6
make install
```
Create a .env file with the required environment variables:
```bash
$ cp .env.example .env
$ editor .env  # edit your vars
```
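How the app reads `.env` isn't described here (libraries such as python-dotenv are a common choice). Assuming the usual format of one `KEY=VALUE` pair per line, a minimal reader looks like the sketch below; `parse_env` and `load_env_file` are hypothetical helpers for illustration, not part of goodtables.io.

```python
import os

# Assumption: .env uses the common "KEY=VALUE" format, one variable per line.


def parse_env(lines):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. 'abc' or "abc".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values


def load_env_file(path=".env"):
    """Copy variables from path into os.environ without overriding existing ones."""
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env(handle).items():
            os.environ.setdefault(key, value)
```

Using `setdefault` means variables already exported in your shell take precedence over the file, which is the behaviour most `.env` loaders default to.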
GTIO_SECRET_KEY must be a 32-byte key encoded as a URL-safe base64 string. You can generate one by running the following:
```python
import os
import base64

key = base64.urlsafe_b64encode(os.urandom(32))
print(key.decode('utf-8'))
```
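Encoding 32 random bytes this way yields a 44-character string. To sanity-check a key you already have in `.env`, a small helper can decode it and confirm its length; `is_valid_secret_key` is a hypothetical sketch, not a function provided by goodtables.io.

```python
import base64


def is_valid_secret_key(value):
    """Return True if value is URL-safe base64 text that decodes to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(value.encode("ascii"))
    except ValueError:  # covers binascii.Error (bad padding) and non-ASCII input
        return False
    # The decoder silently drops characters outside the alphabet, so re-encode
    # and compare to reject strings that only decoded after being "cleaned up".
    return len(raw) == 32 and base64.urlsafe_b64encode(raw).decode("ascii") == value
```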
You can also generate the key with a one-line command:
```bash
python3 -c "import os; import base64; key = base64.urlsafe_b64encode(os.urandom(32)); print(key.decode('utf-8'))"
```

Migrations

```bash
make migrate                    # migrate
alembic downgrade -1            # downgrade
alembic revision -m '<name>'    # add a migration
```
Running the app
Start the web app and the Celery worker, each in its own terminal:
```bash
bash1$ make app
bash2$ make queue
```
For development you probably want:
```bash
bash1$ make app-dev
bash2$ make queue-dev
```
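The two terminals reflect a common web-app/task-queue split: presumably the app hands validation jobs to Redis and the Celery worker picks them up and runs them. The sketch below imitates that split inside one Python process, with `queue.Queue` standing in for Redis and a thread standing in for the worker. It illustrates the pattern only; the function names and the pass/fail rule are placeholders, not goodtables.io code.

```python
import queue
import threading


def enqueue_validation(broker, results, job_id, source):
    """Producer side (the web app): record a pending job and hand it to the broker."""
    results[job_id] = "pending"
    broker.put((job_id, source))


def worker(broker, results):
    """Consumer side (the queue process): take jobs off the broker and run them."""
    while True:
        job = broker.get()
        if job is None:  # sentinel value: shut the worker down
            break
        job_id, source = job
        # Placeholder for real validation: here a source "passes" if it is a CSV file.
        results[job_id] = "success" if source.endswith(".csv") else "failure"


def run_demo():
    broker = queue.Queue()
    results = {}
    thread = threading.Thread(target=worker, args=(broker, results))
    thread.start()
    enqueue_validation(broker, results, "job-1", "data/spending.csv")
    enqueue_validation(broker, results, "job-2", "data/notes.txt")
    broker.put(None)  # ask the worker to stop once the queue is drained
    thread.join()
    return results


print(run_demo())
```

Because the producer only enqueues and returns, a slow validation never blocks a web request; that is the main reason to run the worker as a separate process.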
The development server runs on
To build frontend files to
```bash
make frontend
make frontend-dev
make frontend-watch
```
To work on the frontend, run the watch command (make frontend-watch). If you have the app running in dev mode, frontend components will be updated automatically after every source code change, but the web page has to be reloaded manually.
To run all checks:
To run linting:
```bash
make lint
# make lint-backend
# make lint-frontend
```
To run unit tests with coverage:
```bash
make test-unit
# make test-unit-backend
# make test-unit-frontend
```
To run user acceptance end-to-end tests for the whole application:
Note: the current site is an early alpha version, and things are bound to break and change.
The current alpha version supports two kinds of data sources: GitHub repositories and Amazon S3 buckets.
To try it out, go to try.goodtables.io and log in with your GitHub account.
To add a GitHub repository:
- Click on the "Add Repository" button from the dashboard.
- If you don't see a list of your repositories, click on the "Sync account" button. This might take a while.
- Once the list of repositories appears, click "Activate" on the repository that you want to validate.
- From now on, every time you push to that repository, a validation job will be run on goodtables.io.
- You should also see the validation status next to commit messages and pull requests (note that this currently only works on repositories in the frictionlessdata organization, due to #132).
The statuses link to the full report on the prototype app.
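Commit statuses like these are created through GitHub's REST API (`POST /repos/{owner}/{repo}/statuses/{sha}`), whose body takes `state`, `target_url`, `description` and `context`; `target_url` is what makes a status link to a report. How goodtables.io builds its statuses isn't documented here, so this is a hedged sketch: the `build_commit_status` helper, the description wording, the `"goodtables"` context label and the example repository are all assumptions, and sending the authenticated request is left out.

```python
import json

# States accepted by GitHub's "create a commit status" endpoint.
VALID_STATES = {"error", "failure", "pending", "success"}


def build_commit_status(owner, repo, sha, state, report_url, error_count=0):
    """Return the API path and JSON body for a commit status that links to a report."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state!r}")
    if state == "success":
        description = "Data is valid"
    elif state == "failure":
        description = f"{error_count} validation error(s)"
    else:
        description = f"Validation {state}"
    path = f"/repos/{owner}/{repo}/statuses/{sha}"
    body = {
        "state": state,
        "target_url": report_url,  # where the status links: the full report
        "description": description,
        "context": "goodtables",  # hypothetical label shown next to the commit
    }
    return path, json.dumps(body)


# Hypothetical repository, commit and report URL, for illustration only.
path, body = build_commit_status(
    "frictionlessdata", "example-data", "0123abcd", "failure",
    "https://example.com/reports/42", error_count=3,
)
print(path)
print(body)
```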
To add an S3 bucket:
- Click on the "Add Bucket" button from the dashboard.
- Enter an AWS Access Key ID and Secret Access Key pair, and the name of the bucket. The keys should allow reading the contents of the bucket as well as creating event notifications on it. Click on "Add bucket".
- From now on, every time a file in the S3 bucket is uploaded or deleted, a validation job will be run on goodtables.io.
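S3 reports such changes as JSON event notification records. The sketch below shows how those records could be turned into validation jobs; it follows the documented S3 event message structure (`Records[].eventName`, `s3.bucket.name`, `s3.object.key`), but `jobs_from_s3_event` is a hypothetical helper, the sample event is trimmed to the fields used, and how goodtables.io actually receives notifications (for example through SNS or SQS) isn't stated here.

```python
from urllib.parse import unquote_plus


def jobs_from_s3_event(event):
    """Yield (bucket, key, action) for each upload or delete record in an S3 event."""
    for record in event.get("Records", []):
        event_name = record.get("eventName", "")
        if event_name.startswith("ObjectCreated:"):
            action = "created"
        elif event_name.startswith("ObjectRemoved:"):
            action = "removed"
        else:
            continue  # not an upload or delete; nothing to validate
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (a space arrives as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key, action


# Trimmed example event: real notifications carry many more fields.
SAMPLE_EVENT = {
    "Records": [
        {"eventName": "ObjectCreated:Put",
         "s3": {"bucket": {"name": "my-data"}, "object": {"key": "spending/2016+Q1.csv"}}},
        {"eventName": "ObjectRemoved:Delete",
         "s3": {"bucket": {"name": "my-data"}, "object": {"key": "old.csv"}}},
    ]
}

for job in jobs_from_s3_event(SAMPLE_EVENT):
    print(job)
```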