We’re always keen to find alternative ways for our users to access and use our data.
In evaluating DKAN as our new data delivery service, we discovered that it has an API, which allows other systems to find out about the datasets and resources DKAN holds.
We came across a package that is being maintained by Tony Fujs, Karthik Ramanathan and Meera Seladore.
The package is called dkanr and is described as a “General purpose R client to the DKAN Open Data platform”.
I’m a novice R user, so my script may not be as polished as a more experienced user’s, but I found the package easy to use: I was able to find a dataset of interest and then download some data into a data frame, ready for me to manipulate.
The GitHub page includes a readme, which explains how to set up the package and its basic use.
Trying out the package
I decided I wanted to find the dataset for sex from the 2011 Census.
I’m going to walk you through the process I followed. You can download the code from our GitHub.
To tell the package to use our version of DKAN, I needed to enter its web address, which is https://www.statistics.digitalresources.jisc.ac.uk/:
Now I loaded the settings:
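As a rough sketch, the two steps above (pointing the package at our site, then loading the settings) might look like the following. It assumes the dkanr package is installed and that dkanr_setup() and dkanr_settings() behave as the package readme describes; what gets printed will depend on your version, so no output is shown here.

```r
library(dkanr)

# Point dkanr at our DKAN site (the address given above).
dkanr_setup(url = "https://www.statistics.digitalresources.jisc.ac.uk/")

# Show the settings the package will use for later requests.
dkanr_settings()
```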
This is what the output looked like:
I then got a list of nodes that are of type dataset:
nodes <- list_nodes_all(filters = c(type = "dataset"))
I looked at the data frame returned:
This is the output:
The next job was to find datasets that have “sex 2011” in the title:
dfFilter <- nodes %>% select(nid, title) %>% filter(str_detect(title, fixed("sex 2011", ignore_case = TRUE)))
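To show what that filter does without needing a live DKAN site, here is a small self-contained sketch. The nodes data frame below is a made-up stand-in for what list_nodes_all() returns (the nids and titles are invented for illustration); only the dplyr and stringr calls are the point.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the data frame returned by list_nodes_all();
# these nids and titles are invented for illustration only.
nodes <- data.frame(
  nid = c("101", "102", "103"),
  title = c("Sex 2011", "Age 2011", "Sex 2001"),
  stringsAsFactors = FALSE
)

# fixed() treats the pattern as literal text rather than a regular expression,
# and ignore_case = TRUE lets "sex 2011" match "Sex 2011".
matches <- nodes %>%
  select(nid, title) %>%
  filter(str_detect(title, fixed("sex 2011", ignore_case = TRUE)))

print(matches)
```

Only the row whose title contains the whole phrase is kept, so titles that mention just "Sex" or just "2011" drop out.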
I looked at the results:
I was only interested in the dataset for sex 2011, so I asked for the metadata for node 195:
metadata <- retrieve_node(nid = 195, as = "list")
I then wanted to know which resource node IDs dataset 195 has:
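A minimal sketch of this step, assuming dkanr is installed and our DKAN site is reachable. I believe get_resource_nids() is the helper the package readme describes for listing a dataset's resources, but treat that name as an assumption and check it against your installed version.

```r
library(dkanr)

# Point dkanr at our DKAN site.
dkanr_setup(url = "https://www.statistics.digitalresources.jisc.ac.uk/")

# Fetch the dataset metadata as a list, then pull out its resource node IDs.
metadata <- retrieve_node(nid = 195, as = "list")
resource_nids <- get_resource_nids(metadata)  # assumed helper, see note above

resource_nids
```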
For simplicity, I looked at the first resource:
resource_metadata <- retrieve_node("196", as = "list")
As I wanted the CSV file, I asked for the URL of that file:
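Roughly, the request looks like the sketch below. It assumes dkanr is installed, our site is reachable, and that get_resource_url() is available and working in your version of the package; csv_url is just a name I have picked for illustration.

```r
library(dkanr)

# Point dkanr at our DKAN site.
dkanr_setup(url = "https://www.statistics.digitalresources.jisc.ac.uk/")

# Fetch the resource metadata, then ask for the URL of the file it describes.
resource_metadata <- retrieve_node("196", as = "list")
csv_url <- get_resource_url(resource_metadata)  # assumed to work in your version

csv_url
```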
Unfortunately, this command failed in the version of the dkanr package I was using, but by looking through the package’s source code I worked out how to request the URL. The developers have since fixed the issue, so hopefully you won’t run into it, but I’ll leave the code here just in case:
Now I could read the data into R and view it:
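Here is a small sketch of the reading step that runs without a network connection. It writes a tiny made-up CSV to a temporary file and reads it back; in the walkthrough the path would instead be the resource URL, which read.csv() also accepts. The males and females column names and all values are invented for illustration.

```r
# Hypothetical stand-in for the remote CSV: a temporary file with invented values.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("GEO_LABEL,males,females", "Area A,10,12", "Area B,20,18"),
  csv_path
)

# read.csv() takes a local path or a URL and returns a data frame.
census <- read.csv(csv_path, stringsAsFactors = FALSE)

# Look at the first few rows.
head(census)
```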
I was only interested in seeing the name of the area, the two data columns and rows 6 to 16, so I subsetted the data:
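The subsetting can be sketched with base R indexing. The census data frame below is a made-up stand-in for the downloaded table: GEO_LABEL is the area name column used later on, while GEO_CODE, males, females and every value are assumptions for illustration.

```r
# Hypothetical stand-in for the downloaded census table (values invented).
census <- data.frame(
  GEO_CODE = sprintf("X%02d", 1:20),
  GEO_LABEL = paste("Area", 1:20),
  males = seq(100, 290, by = 10),
  females = seq(105, 295, by = 10),
  stringsAsFactors = FALSE
)

# Keep rows 6 to 16, and only the area name plus the two data columns.
xy <- census[6:16, c("GEO_LABEL", "males", "females")]

print(xy)
```

Indexing by column name rather than position means the subset still works if the column order in the file changes.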
My first attempt at plotting the data didn’t look how I wanted, so after some googling I used a package called reshape2 to reshape the data:
xymelt <- melt(xy, id.vars = "GEO_LABEL")
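To make the reshaping concrete, here is a self-contained sketch using a tiny made-up xy data frame (the males and females columns and their values are invented for illustration). melt() turns the wide table, with one column per measure, into a long one with a row per area per measure, which is the shape a plot needs if it maps colour to the variable column.

```r
library(reshape2)

# Hypothetical stand-in for the subsetted data (values invented).
xy <- data.frame(
  GEO_LABEL = c("Area A", "Area B"),
  males = c(10, 20),
  females = c(12, 18),
  stringsAsFactors = FALSE
)

# Every column not listed in id.vars is stacked into variable/value pairs.
xymelt <- melt(xy, id.vars = "GEO_LABEL")

print(xymelt)
```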
Then I graphed the results:
ggplot(xymelt, aes(x = GEO_LABEL, y = value, group = 1, color = variable)) +