Data.govt.nz had its third birthday recently, so now is as good a time as any for a quick retrospective on how it has progressed in its role as a directory of re-usable, machine-readable government datasets.
When it launched on 4 November 2009, data.govt.nz went live with pointers to about a hundred machine-readable datasets published by government agencies. The data.govt.nz build team had found all of them by manually scouring the govt.nz domain. Many of the datasets lacked essential information such as the date of last update, contact or ownership details beyond the agency's default details, or any clear statement of re-use rights other than “Crown Copyright”.
On its third birthday, data.govt.nz listed 2261 datasets. The large majority of them (around 75%) are sourced and maintained by an aggregator that reads Atom-standard feeds from other agencies, and 84% (1908) are licensed for re-use under NZGOAL Creative Commons licences.
Datasets with no indication at all of when they were created or updated are now very much the exception rather than the norm, and over 90% provide phone or email details for a contact person, or for the team within the agency that curates (or curated) the dataset. Why? Because agencies with mature data release processes are naturally dominating the open data landscape, releasing significant numbers of datasets and providing well-constructed metadata records to data.govt.nz.
Benefits of automation
A couple of weeks ago, we sent the aggregator out to feed in a little over 300 new geospatial metadata records from the LINZ Data Service (LDS). It could equally have been Landcare Research, the Treasury, or one of the agencies, such as the Ministry for the Environment, the Department of Conservation or Wellington City Council, that publish datasets on koordinates.com; data.govt.nz takes automated feeds from each of them.
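To illustrate what taking automated feeds involves, here is a minimal sketch of an aggregator that merges entries from several Atom feeds into one directory. It assumes each agency publishes a standard Atom document, and that a dataset record is identified by its Atom `<id>` and versioned by its `<updated>` timestamp. The function names and the sample feed are hypothetical; this is not data.govt.nz's actual aggregator code.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Clark-notation prefix for elements in the Atom namespace (RFC 4287).
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_atom_entries(feed_xml: str) -> list:
    """Extract id, title and updated timestamp from each <entry> in an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        # Atom requires <updated> on every entry; timestamps are RFC 3339,
        # so normalise a trailing "Z" for datetime.fromisoformat.
        updated = entry.findtext(f"{ATOM}updated").replace("Z", "+00:00")
        entries.append({
            "id": entry.findtext(f"{ATOM}id"),
            "title": entry.findtext(f"{ATOM}title"),
            "updated": datetime.fromisoformat(updated),
        })
    return entries


def aggregate(feeds: dict, directory: dict) -> dict:
    """Merge entries from several agency feeds into a directory keyed by Atom id.

    A record is added if its id is new, or replaced if the feed carries a
    newer <updated> timestamp than the copy already held.
    """
    for source, feed_xml in feeds.items():
        for entry in parse_atom_entries(feed_xml):
            held = directory.get(entry["id"])
            if held is None or entry["updated"] > held["updated"]:
                directory[entry["id"]] = {**entry, "source": source}
    return directory


# Hypothetical single-entry feed, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example agency datasets</title>
  <id>urn:example:feed</id>
  <updated>2012-11-01T00:00:00Z</updated>
  <entry>
    <id>urn:example:dataset:1</id>
    <title>Example dataset</title>
    <updated>2012-10-30T09:00:00Z</updated>
  </entry>
</feed>"""

directory = aggregate({"example-agency": SAMPLE_FEED}, {})
```

In this sketch, keying records by the Atom `<id>` means re-harvesting the same feed updates existing records rather than duplicating them, so a run that brings in hundreds of records needs no manual de-duplication.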
LINZ in particular is a significant publisher of re-usable machine-readable data via the LDS; the abrupt surge in the number of dataset records between May and August 2011 partly marks the LDS's arrival on the open data scene. LINZ now provides 57% of data.govt.nz’s records via an automated feed. Other significant sources are the local and central government agencies that publish geospatial data via koordinates.com (9%, via feed), the Treasury (7%, via feed), Landcare Research (6%, via feed), Statistics NZ (4%), and the roughly 100 agencies that manually release half-yearly updates of their chief executives’ expenses (4%).
Given the relative scale of LINZ’s contribution to the directory, we have no specific plans to further develop data.govt.nz until the shape and scope of LINZ’s proposed Open Data Service and its relationship to data.govt.nz become clear.
Mature data release processes
The metadata we aggregate from feeds is complete and conforms to the metadata schema published on data.govt.nz (Excel 22kB). Providing a feed requires an agency to decide what data to release, which licence applies, what format is appropriate, when the data was last updated, who to contact and what, if anything, it costs. In other words, it requires a mature data release process.
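As a rough illustration of the questions a feed forces an agency to answer, here is a minimal completeness check for a single metadata record. The field names are hypothetical stand-ins for the items listed above; the actual metadata schema published on data.govt.nz may name, group or require them differently.

```python
# Hypothetical field names standing in for the decisions a feed requires;
# they are not taken from the data.govt.nz metadata schema itself.
REQUIRED_FIELDS = {
    "title": "what data is being released",
    "licence": "which licence applies",
    "format": "what format it is published in",
    "last_updated": "when it was last updated",
    "contact": "who to contact",
    "cost": "what it costs, if anything",
}


def missing_fields(record: dict) -> list:
    """Return required fields that are absent, None or blank in a metadata record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical records for illustration.
complete = {
    "title": "Example dataset",
    "licence": "CC BY 3.0 NZ",
    "format": "CSV",
    "last_updated": "2012-10-30",
    "contact": "data@example.org",
    "cost": "Free",
}
sparse = {"title": "Another dataset", "licence": "Crown Copyright"}

print(missing_fields(complete))  # []
print(missing_fields(sparse))    # ['format', 'last_updated', 'contact', 'cost']
```

The `sparse` record resembles the kind of entry described at the start of this post: a title and “Crown Copyright”, but no date, format or contact.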
But feeds aren’t a silver bullet. They’re a useful tool for agencies that have good data release processes in place and release enough data to justify maintaining a feed. But that’s all.
Of more significance is proactive custodianship of data to be released, and prioritising its release.
Agencies getting started on releasing data for reuse under the Declaration on Open and Transparent Government can find some useful guidance in the Toolkit for Agencies, and in particular the Process for Prioritisation and Release of High Value Public Data for Reuse (PDF 423kB). It steps rather nicely through the challenges of balancing agency priorities and constraints with the NZ Data and Information Management Principles endorsed by Cabinet in 2011.
In addition to consuming feeds, data.govt.nz provides its own customisable feeds that any feed reader can consume. Each keyword search or filtered view on the Catalogue page offers a feed matching that view. If you’re only interested in a subset of the records on data.govt.nz, you can create the view you want and pull the resulting feed into a feed reader. From then on you’ll be notified when records matching that view are added or updated.
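The notification behaviour described above is what a feed reader does generically: it remembers which entries it has seen and when each was last updated, then compares on the next poll. Here is a minimal sketch, assuming the Catalogue feed is Atom with the usual `<id>` and `<updated>` elements. The feed URL and query parameters aren't given here, so the helper names and the stripped-down sample feeds are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def snapshot(feed_xml: str) -> dict:
    """Map each Atom entry's <id> to its <updated> timestamp string."""
    root = ET.fromstring(feed_xml)
    return {
        entry.findtext(f"{ATOM}id"): entry.findtext(f"{ATOM}updated")
        for entry in root.findall(f"{ATOM}entry")
    }


def changes(previous: dict, current: dict) -> tuple:
    """Compare two snapshots of the same feed and return (added, updated) ids."""
    added = sorted(i for i in current if i not in previous)
    updated = sorted(
        i for i in current if i in previous and current[i] != previous[i]
    )
    return added, updated


def make_feed(entries) -> str:
    """Build a stripped-down Atom feed from (id, updated) pairs, for illustration."""
    body = "".join(
        f"<entry><id>{i}</id><updated>{u}</updated></entry>" for i, u in entries
    )
    return f'<feed xmlns="http://www.w3.org/2005/Atom">{body}</feed>'


# Two hypothetical polls of the same filtered view.
first_poll = snapshot(make_feed([("urn:example:1", "2012-10-01T00:00:00Z")]))
second_poll = snapshot(make_feed([
    ("urn:example:1", "2012-11-02T00:00:00Z"),  # existing record, since updated
    ("urn:example:2", "2012-11-02T00:00:00Z"),  # record new to this view
]))
added, updated = changes(first_poll, second_poll)
print(added, updated)  # ['urn:example:2'] ['urn:example:1']
```

Comparing `<updated>` values rather than every field keeps each poll cheap, though it relies on publishers changing that timestamp whenever a record changes.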
And yes, there are improvements we could make, for example a more extensive API and a bulk download of all records. When demand and available resources dictate, we’ll take another look at the future of data.govt.nz. Until then, agency policies and processes around the release of re-usable data are a higher priority.