We blogged recently about creating an authoritative dataset of government organisations, which is work happening under the Open Government Partnership’s (OGP) National Action Plan (PDF 1.48MB).
It sounds small, but it’s really exciting (and actually pretty complex!) as it has the potential to be a foundation piece for government transparency and accountability.
Currently it’s scoped to include a basic core set of data listing government agencies, their associated leadership roles, their structure and of course general contact information.
It also has plenty of potential for expansion once the initial version is available, for example by adding information on agencies’ accountabilities and the legislation they administer.
In keeping with being a signatory of the Digital 9 Charter, and in the spirit of upholding the Digital Design Service Standard, we’ve been investigating which open standards should be considered to describe government organisations in this new dataset.
Open standards are critical for interoperability of the data between systems and to ensure anyone can freely open and reuse the data.
Investigating the data standards landscape
I’ve been investigating similar datasets (government organisation registers) from Canada, Australia and the UK. I’ve also been looking at some examples closer to home including the existing Govt.nz A-Z and the NZ Business Number registry.
Comparing these datasets reveals a core set of common fields, which hints at a minimum useful set of data properties to include in the proposed dataset. For example, all the datasets I looked at captured fields like organisation name and general contact details (website, email, postal address) for the organisation. However, each also held a set of much more specialised fields, depending on who the data publisher was.
Australia’s government organisation register, for example, is published by their Treasury, so has a focus on data like budgets, appropriations and public spend. It includes additional financial and ministerial portfolio related fields.
While this is all very useful data, for our first version of the NZ government organisations register, our suggested approach is having an agreed core set of fields we can then build on.
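As a rough illustration of the comparison described above, the core set is simply the intersection of each register's fields, and whatever is left over is the publisher-specific material. The register names and field names below are made up for the sketch and are not the real column names from any of the registers I reviewed.

```python
# Hypothetical field lists: the real registers use their own column names.
registers = {
    "register_a": {"name", "website", "email", "postal_address", "budget", "portfolio"},
    "register_b": {"name", "website", "email", "postal_address", "parent_body"},
    "register_c": {"name", "website", "email", "postal_address", "legal_status"},
}


def common_fields(registers):
    """Return the fields that appear in every register."""
    field_sets = list(registers.values())
    return set.intersection(*field_sets) if field_sets else set()


def specialised_fields(registers):
    """Return, per register, the fields that are not part of the common core."""
    core = common_fields(registers)
    return {name: fields - core for name, fields in registers.items()}


print(sorted(common_fields(registers)))
print({name: sorted(extra) for name, extra in specialised_fields(registers).items()})
```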
What was also clear from this work is that none of the registers appeared to use an open standard for holding this type of data (at least not in the datasets we reviewed). To address this gap I’ve also been looking into existing open standards for describing ‘organisations’ in data.
While there appear to be several proprietary standards, a scan of the environment suggests a suitable candidate for the proposed dataset is the Organization data schema from Schema.org.
What’s Schema.org and why is it useful?
Schema.org is a community driven, structured data vocabulary for modelling data on the web.
It helps both people and machines understand how to describe things (like web pages, pieces of music, recipes and in our case, organisations) in a consistent way. Data described using these standards is interoperable and easier to reuse in other applications.
The Schema.org ‘Organization’ standard maps well to the set of common fields I mentioned above. However, it’s not entirely like for like. In our context we’re going to need some extra fields in order to make it useful.
- Te Reo Māori name of the organisation: the closest useful field in Schema.org would be ‘alternateName’.
- New Zealand Business Number (NZBN): we can map this to the ‘identifier’ field in Schema.org. However, not all the government organisations we've tested have an NZBN (currently the closest thing to a unique identifier for an organisation is its ‘legal title’).
- Superseding organisation: for cases where the organisation has merged, split or been dissolved.
- You can view others in our reference model dataset.
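To give a feel for how the first two fields in the list above might look when expressed with Schema.org, here is a minimal sketch that builds a JSON-LD record. It assumes the Schema.org properties ‘name’, ‘alternateName’, ‘identifier’ (as a ‘PropertyValue’) and ‘url’. The function name, the "NZBN" label and every value are placeholders for illustration, not entries from the reference model. Superseding organisation is left out because no Schema.org property has been chosen for it here.

```python
import json


def to_json_ld(name, maori_name=None, nzbn=None, url=None):
    """Build a Schema.org Organization record as a JSON-LD dictionary."""
    record = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
    }
    if maori_name:
        # Te Reo Māori name carried in the closest available property.
        record["alternateName"] = maori_name
    if nzbn:
        # Not every organisation has an NZBN, so the identifier is optional.
        record["identifier"] = {
            "@type": "PropertyValue",
            "propertyID": "NZBN",  # assumed label, not an agreed convention
            "value": nzbn,
        }
    if url:
        record["url"] = url
    return record


# Placeholder values only: this is not a real organisation or a real NZBN.
example = to_json_ld(
    name="Example Agency",
    maori_name="Te Tari Tauira",
    nzbn="0000000000000",
    url="https://example.org/agency",
)
print(json.dumps(example, ensure_ascii=False, indent=2))
```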
Given we want to make this dataset as widely usable as possible, I’m suggesting that we don’t adopt the Schema.org standard directly. Instead we would look to map a set of plain English field names to those found in Schema.org. We have an example data dictionary to show you how this mapping might work.
This allows us to include the additional fields mentioned above to make it as useful as possible, while keeping aligned with a known open standard. This way, it can be easily used and understood by people as well as machines — the best of both worlds.
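One way to picture this mapping is as a lookup from plain English field names to Schema.org properties, with the extra fields kept alongside rather than forced into the standard. The sketch below is an assumption about how that could work, not the actual data dictionary: the field names, the dictionary contents and the function are all hypothetical.

```python
# Hypothetical data dictionary: plain English field name -> Schema.org property.
# None marks an extension field with no Schema.org property assigned in this sketch.
DATA_DICTIONARY = {
    "Organisation name": "name",
    "Te reo Māori name": "alternateName",
    "NZBN": "identifier",
    "Website": "url",
    "Email": "email",
    "Superseding organisation": None,
}


def to_schema_org(row):
    """Split a plain English row into Schema.org-keyed values and extension fields."""
    mapped, extensions = {}, {}
    for field, value in row.items():
        if field not in DATA_DICTIONARY:
            raise KeyError(f"Field not in data dictionary: {field}")
        prop = DATA_DICTIONARY[field]
        if prop is None:
            extensions[field] = value  # kept under its plain English name
        else:
            mapped[prop] = value
    return mapped, extensions


# Placeholder row, not real register data.
mapped, extensions = to_schema_org(
    {
        "Organisation name": "Example Agency",
        "Website": "https://example.org/agency",
        "Superseding organisation": "Example Successor Agency",
    }
)
print(mapped)
print(extensions)
```

Publishing the plain English names keeps the dataset readable for people, while the lookup lets a machine translate each row into Schema.org terms whenever it needs to.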
We think that using a widely known, open standard for modelling organisations on the web, and mapping a researched set of core fields to this, is a pragmatic and workable approach. However, do tell us if we’ve missed something as we are open to exploring all suitable open standards for this dataset.
The best way to test the usefulness of the proposed data model is to try it out. I’ve produced a draft reference model and data dictionary of the proposed dataset which you can access over on data.govt.nz. It includes some model data for a few existing (and past) New Zealand government organisations to help give you a feel for how the data would look.
It would be great to get feedback from everyone who’s interested in open standards, data modelling or the machinery of government, or who has a need to use this information. Leave us a comment below or get in touch at firstname.lastname@example.org.