Cameron Hurst looks at an accessible approach to sharing data.
Source – https://frictionlessdata.io
Data can be stored in many ways and in such a variety of formats that it can be difficult to compare one dataset with another.
Enter frictionless data.
This set of specifications and software packages is described on the Open Knowledge International website as
providing a simple wrapper and basic structure for transportation of data that significantly reduces the friction in data sharing and integration, supports automation and does this without imposing major changes on the underlying data being packaged.
A good analogy for this concept is shipping containers.
Before containers were created and standardised, there was no guarantee of the size or shape of goods, so it was difficult to devise any standardised approach to loading or unloading them.
Once the size and shape of containers were standardised, people knew what to expect, and innovations became possible, such as cranes built to lift any container.
In terms of how it works, frictionless data packages a dataset’s metadata into a file named datapackage.json, written in a standardised format.
Because this file sits alongside the data and every package is described in the same way, varied datasets become much easier to compare.
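As a rough illustration of what such a file might contain, the sketch below builds a minimal datapackage.json using only Python's standard library. The top-level `name` and `resources` keys, and the `schema`/`fields` block describing each column, follow the Data Package and Table Schema specifications; the dataset, file and column names are hypothetical, and this is not the UK Data Service's own tooling.

```python
import json


def make_descriptor() -> dict:
    """Build a minimal Data Package descriptor for one hypothetical CSV file."""
    return {
        # Hypothetical package name, used only for illustration.
        "name": "example-census-population",
        "title": "Example census population counts",
        "resources": [
            {
                "name": "population",
                # The data itself stays in its original CSV file, untouched.
                "path": "population.csv",
                "format": "csv",
                # Table Schema: one entry per column, with a name and a type.
                "schema": {
                    "fields": [
                        {"name": "area_code", "type": "string"},
                        {"name": "area_name", "type": "string"},
                        {"name": "population", "type": "integer"},
                    ]
                },
            }
        ],
    }


def write_descriptor(descriptor: dict, path: str = "datapackage.json") -> None:
    """Write the descriptor as JSON alongside the data files it describes."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(descriptor, f, indent=2)
```

Calling `write_descriptor(make_descriptor())` in the folder holding population.csv would produce the metadata file, while the CSV itself is left exactly as it was.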
By standardising the metadata, we provide it in one consistent form for the entire dataset.
This gives users of the data advantages such as:
- they can extract metadata from files without needing to convert it into a particular format
- they can compare different datasets, confident that the metadata describes equivalent parts of each
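Because every package describes its columns in the same place, a few lines of code can read the metadata directly, without opening or converting the data files. The sketch below is an illustrative assumption rather than an official tool: it parses a hypothetical descriptor held in a string (in practice it would be read from datapackage.json) and lists each resource's fields.

```python
import json

# A hypothetical descriptor, as it might appear in a datapackage.json file.
EXAMPLE_DESCRIPTOR = """
{
  "name": "example-census-households",
  "resources": [
    {
      "name": "households",
      "path": "households.csv",
      "schema": {
        "fields": [
          {"name": "area_code", "type": "string"},
          {"name": "households", "type": "integer"}
        ]
      }
    }
  ]
}
"""


def describe_fields(descriptor_text: str) -> dict:
    """Map each resource name to a list of (field name, field type) pairs."""
    descriptor = json.loads(descriptor_text)
    summary = {}
    for resource in descriptor.get("resources", []):
        fields = resource.get("schema", {}).get("fields", [])
        # Table Schema treats a field with no "type" as a string.
        summary[resource.get("name", resource.get("path"))] = [
            (field["name"], field.get("type", "string")) for field in fields
        ]
    return summary
```

Here `describe_fields(EXAMPLE_DESCRIPTOR)` returns `{"households": [("area_code", "string"), ("households", "integer")]}`; the CSV file is never touched.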
With a specified schema to work to, we might even be able to automate the creation of our metadata, and new tools and APIs could make use of the metadata file.
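For simple tabular files, automating metadata creation could start from the data itself. The sketch below is a deliberately simplified heuristic of our own (dedicated tools are more thorough): it reads a CSV header and guesses a Table Schema type of integer, number or string for each column. The sample data and function names are hypothetical.

```python
import csv
import io


def infer_type(values: list) -> str:
    """Guess a Table Schema type from raw CSV strings (simplified heuristic)."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "string"
    try:
        for v in non_empty:
            int(v)
        return "integer"
    except ValueError:
        pass
    try:
        for v in non_empty:
            float(v)
        return "number"
    except ValueError:
        return "string"


def infer_schema(csv_text: str) -> dict:
    """Build a Table Schema 'fields' list from a CSV header and its rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    # Transpose rows into columns; a header-only file gives empty columns.
    columns = list(zip(*rows)) if rows else [() for _ in header]
    return {
        "fields": [
            {"name": name, "type": infer_type(list(column))}
            for name, column in zip(header, columns)
        ]
    }
```

The resulting dictionary could then be dropped into the `schema` entry of a resource in datapackage.json.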
If more places pick up the frictionless data format, it could make it much easier to add other datasets to our own (such as data from countries we don’t currently cover). It could also mean that our tools can access these packages to perform automated tasks more easily.
One example of how data packages could be used is taking one of our census datasets and comparing it with a dataset from another census data provider, in a country we don’t currently hold census data for.
Without frictionless data, this comparison could take considerable time and effort, as the metadata of one dataset would first need translating into the format of the other.
Using frictionless data packages, this translation step is avoided, as both datasets would be described in the same format.
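A comparison like this could begin by lining up the two packages' column metadata. In the sketch below, both schemas and all field names are hypothetical, and matching fields by name and type is our own simplifying assumption: a real cross-country comparison would still need to check that the underlying definitions (of a household, say) agree.

```python
def field_types(schema: dict) -> dict:
    """Map field name to type; Table Schema treats a missing type as string."""
    return {f["name"]: f.get("type", "string") for f in schema.get("fields", [])}


def compare_schemas(a: dict, b: dict) -> dict:
    """Report which fields two Table Schemas share, and where they differ."""
    types_a, types_b = field_types(a), field_types(b)
    shared = sorted(set(types_a) & set(types_b))
    return {
        "matching": [n for n in shared if types_a[n] == types_b[n]],
        "type_mismatch": [n for n in shared if types_a[n] != types_b[n]],
        "only_in_a": sorted(set(types_a) - set(types_b)),
        "only_in_b": sorted(set(types_b) - set(types_a)),
    }


# Hypothetical schemas from two census data providers.
OUR_CENSUS = {"fields": [
    {"name": "area_code", "type": "string"},
    {"name": "population", "type": "integer"},
    {"name": "households", "type": "integer"},
]}
OTHER_CENSUS = {"fields": [
    {"name": "area_code", "type": "string"},
    {"name": "population", "type": "number"},
    {"name": "median_age", "type": "number"},
]}
```

For these two examples, `compare_schemas(OUR_CENSUS, OTHER_CENSUS)` reports `area_code` as matching, `population` as a type mismatch, `households` only in ours and `median_age` only in theirs, without either dataset's files being opened.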
For more information on frictionless data, including the specifications, case studies and documentation, visit their website at https://frictionlessdata.io.
Here is an overview of the whole system, as told by Open Knowledge International’s president Rufus Pollock.
Cameron Hurst is an early career full stack developer working with aggregate data for the UK Data Service.