Opening the ‘open data’ floodgates

17th November 2011

Open data can deliver greater openness and transparency that can benefit governments, taxpayers and even businesses, but it is not without challenges. FSN writer Lesley Meall considers the good, the bad and the ugly faces of open data.

Open data is a wonderful idea, isn’t it? The notion that individuals and organisations (including governments) should make lots of non-personal (and some personal) data and content freely available to everyone and anyone who might potentially have any use for it can’t possibly be a bad thing, can it? Well, as always, perspective is everything, and the answers to these questions depend on whom you ask. They also depend on what the information is, how much of it there is, how it is being used, what other non-public or publicly available data it can potentially be used in conjunction with, plus a host of other factors too numerous to mention. 

I could bore for the Olympics giving my personal perspective on all of this. Suffice to say, I have mixed feelings – and a healthy disrespect for individuals and organisations that extol blind faith in their particular set of beliefs and urge the rest of us to do likewise for the greater good (as many in the open data movement seem to). With my journalist’s hat on I am all for openness (within clearly defined boundaries, of course). So I will try to present FSN readers with a brief and balanced overview of ‘open data’ and consider some (but not all) of the associated challenges and opportunities, without my heart repeatedly leaping onto my metaphorical sleeve. 

But before we go there, let’s start with a little more on what ‘open data’ is. ‘It’s about making all sorts of data and content available so that it can easily be used, reused and re-distributed by anyone, without restriction and without needing legal advice,’ says Dr Rufus Pollock, co-founder of the Open Knowledge Foundation, and an open data advocate. To put this ‘open data’ into context, most of us are used to living and working in a world that is characterised by silos of disparate and disconnected data: it is held in multiple incompatible formats, and it can be difficult to access, even from within the organisations that have created or collected it.

A world where all data is ‘open’ would be one with universal interoperability, and no place for the constraints and complexities that can be created by data silos (proprietary or otherwise), copyrights, patents, and the various closed doors behind which a great deal of data is currently secreted. ‘There is a lot of data out there,’ observes Pollock, and growing amounts of open data, ranging from the geo-spatial and financial to sonnets and statistics, and originating from public and private sources – though it is worth noting that commercial organisations making corporate data openly available are few and far between (on which more later).

Let’s also clarify what ‘open data’ is not. It’s not the same as ‘open source’ or ‘open standards’ though it does share a similar ethos and there are areas of overlap – not least because the interoperability of data demands common standards. If you want to find out more about how these things are and are not related you could start your (potentially never-ending) journey of exploration and discovery by visiting opendefinition.org, okfn.org, odata.org, opendatafoundation.org, opensource.org, opensource.com, and cospa-project.org/assets/resources/glossary.pdf – which will all lead to more information and more websites.

Nor is open data the same as ‘big data’, but again, there is overlap, because both of these phenomena are (partially) the result of the same wider technology trends and developments. ‘The combination of several technology innovations, in areas like social media, cloud computing, and analytics, offer scenarios that we could hardly imagine in the past,’ says Gartner analyst Andrea Di Maio, and this can combine with the trend towards more openness and transparency being championed by governments and non-governmental organisations (NGOs) to create what he describes as ‘a perfect storm’ of open (and not so open) data, from which we can potentially extract value.

There are already numerous open data sets available online, including Geonames, Wikipedia and Wikibooks. The Massachusetts Institute of Technology makes almost all of its course content (lecture notes, exams and video) freely available online, to all comers, without registration. The World Bank and United Nations are among the international organisations making some data openly available, and ‘open government’ initiatives are underway in countries as diverse as Australia, Finland, Kenya, Moldova, the Netherlands, the United States (US), which led the way with data.gov, and the United Kingdom (UK), which launched its own version in 2010, with data.gov.uk.

At data.worldbank.org the World Bank is providing three data sets on (1) World Bank projects, (2) time series data/indicators and (3) finances, and it provides a visual interface (in various languages) that facilitates exploration of this data (which very usefully comes with direct links to the actual data source). The World Bank is also making three Application Programming Interfaces (APIs) available, so that developers can build these data sets and indicators into new applications and visualisations, and if you want to see some of the tools, applications and data mashups this has resulted in, take a look here.
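For readers who want a feel for what building on such an API involves, here is a minimal sketch in Python. It is not taken from the World Bank's own documentation: the base URL, the indicator code and the shape of the response (a two-element JSON list, with paging details first and observations second) are assumptions based on the present-day Indicators API, and the sample figures are made up so that the example runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the present-day World Bank Indicators API (v2).
BASE_URL = "https://api.worldbank.org/v2"


def indicator_url(country: str, indicator: str, per_page: int = 50) -> str:
    """Build a request URL for one indicator in one country, asking for JSON."""
    query = urlencode({"format": "json", "per_page": per_page})
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


def parse_series(payload: str) -> dict[str, float]:
    """Turn a response body into {year: value}, skipping missing values.

    Assumes the body is a two-element JSON list: paging metadata first,
    then a list of observations carrying "date" and "value" fields.
    """
    _metadata, observations = json.loads(payload)
    return {
        row["date"]: row["value"]
        for row in observations
        if row["value"] is not None
    }


# Hand-written sample in the assumed response shape (values are made up).
SAMPLE = json.dumps([
    {"page": 1, "pages": 1, "per_page": 50, "total": 3},
    [
        {"date": "2010", "value": 3.0},
        {"date": "2009", "value": None},
        {"date": "2008", "value": 1.5},
    ],
])

if __name__ == "__main__":
    print(indicator_url("GBR", "NY.GDP.MKTP.CD"))
    print(parse_series(SAMPLE))
```

Fetching the URL with any HTTP client and passing the body to parse_series would, under the same assumptions, give a year-by-year series ready for charting or a mashup.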

To see what is being done with some of the financial data being made available by governments (with the collaboration of community-based supporters), take a look at the projects WhereDoesMyMoneyGo.com (UK) and OpenSpending.org (global), which Pollock and the Open Knowledge Foundation are involved in. ‘Our dream is a global map of government and public corporate financial transactions across the world,’ he explains (and by ‘public corporate financial transactions’ he means published company accounts, reports on contracts won and so on). Just using the open data now provided by the UK government is already having impressive results.

‘At OpenSpending.org we now have a consolidated view of all 1.9m transactions over £25,000 published by the UK government over the past six to eight months,’ says Pollock. ‘You can browse them, you can see the totals going to different companies, you can see how much is being spent in areas such as health and education,’ he says, adding: ‘Somebody at the Treasury told me that they sometimes use this as a source of data themselves, because it can be easier than using internal data on departmental budgeting and spending.’ (Local authorities are also publishing open data, and on smaller transactions; North Yorkshire County Council for example starts from £500.) 
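The kind of consolidation Pollock describes (totals per company, totals per spending area) is simple to sketch once the transactions are in a tabular format. The example below is illustrative only: the column names and the four sample rows are invented, and it makes no claim to reflect how OpenSpending.org itself stores or processes data.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Made-up sample rows; the column names are assumptions, not a real schema.
SAMPLE_CSV = """department,supplier,amount
Health,Acme Medical Ltd,120000.00
Education,Brightside Books,30000.00
Health,Acme Medical Ltd,45000.50
Education,Small Print Co,18000.00
"""

THRESHOLD = Decimal("25000")  # the £25,000 publication threshold in the text


def totals_by(column: str, csv_text: str,
              threshold: Decimal = THRESHOLD) -> dict[str, Decimal]:
    """Sum the amounts above the threshold, grouped by the given column."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = Decimal(row["amount"])
        if amount > threshold:  # ignore transactions at or below the threshold
            totals[row[column]] += amount
    return dict(totals)


if __name__ == "__main__":
    print(totals_by("supplier", SAMPLE_CSV))
    print(totals_by("department", SAMPLE_CSV))
```

Lowering the threshold argument to 500 would cover the smaller transactions that some local authorities publish.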

Of course, without the involvement of organisations such as the Open Knowledge Foundation (or appropriate software tools) data is all it would ever be, as it tends to get published in PDF documents or as spreadsheets. The latter is clearly much easier to work with than the former. ‘If you are publishing data and you want it to be openly available, start simple,’ advises Pollock. So if you have to make a choice between the eXtensible Business Reporting Language (XBRL) and Excel, for example, he suggests opting for the latter (though this doesn’t sit particularly well alongside worldwide regulatory adoption of XBRL for various types of mandatory reporting). 

Those interested in publishing or using published open data also need to consider the legal implications. ‘Open data has to be licensed,’ explains Pollock, adding: ‘Just putting it online doesn’t mean that people can use it. It needs an “open license”.’ This is something you can learn more about at OpenDataCommons.org, which explains things such as the Attribution License (ODC-By) and Open Database License (ODC-ODbL), community norms and contracts. It provides full legal versions of these, for you to personalise and use, and guidance on how to apply them to your material – along with various other legal tools.

So far, commercial organisations have not been rushing to make their corporate data openly available, but there are exceptions, such as the sportswear manufacturer Nike. It recently appointed Ward Cunningham, the inventor of the wiki, as its ‘Code for a Better World Fellow’ to lead its plans to make data on the sustainability of its operations available online, using open data to promote workers’ rights and combat environmental degradation. Nike started down this path towards selectively making its corporate data open back in 2005, when it publicly disclosed the names of 700 of its overseas factories (after years of criticism of its overseas labour practices), as part of its corporate responsibility report.

Nike is also trying its hand at ‘open innovation’, as one of the movers and shakers behind the GreenXchange (GX) initiative. This is a web-based marketplace where companies can ‘collaborate and share intellectual property and patents, which can lead to new sustainability business models and innovation’, and Nike is committed to placing more than 400 of its patents on GX for research. ‘This demonstrates our belief that the best way to stimulate sustainable innovation is through open innovation,’ says Mark Parker, Nike president and CEO. ‘Our hope is this will unleash new innovation to help solve current obstacles to sustainability issues.’ So let’s not be too cynical about who the main beneficiaries of this openness will be.

The notion of making intellectual property and patents freely available has already proved its worth in the software industry, as unpaid public volunteers have been behind the development of Open Source and numerous free smartphone apps (let’s not be too cynical about who the main beneficiaries of this openness have been). Whether this approach can be put to equally effective use in areas such as trainers and rubber remains to be seen. Involving lots of non-employees in research and development could deliver results more quickly than the closed alternative; it certainly gives Nike access to a huge user base, and the potential to speed up proof of concept, refine product development, and generate sales.

Capitalism has been going through a rather sticky period of late, but the commercial imperative survives intact, even in something as altruistic as the open data movement, and this raises some thorny issues, particularly in relation to the privacy and ethics of using personal data. Although a great deal of open data is not personal, not all publishers of open data mean exactly the same thing when they refer to data as being ‘non-personal’. At data.gov.uk, for instance, you will find the names, job titles, and salaries of senior civil servants; in other countries the publication of ‘personal data’ goes much further.

In Nordic countries personal tax data is widely regarded as ‘non-personal’. The Finnish tax administration makes taxpayers’ income and capital gains taxes available annually, which has led to its use by businesses as the basis for paid services. The commercial organisation Veropörssi, for example, even has a text message service to help you access your fellow citizens’ tax data as quickly as possible. UK residents are already aware of what happens when you make public data such as the Electoral Register available, and whilst I am not suggesting that the government has plans to sell more of the open data it is currently making freely available, at some point it may.

Even data that has been de-personalised can raise privacy issues. At a recent ICAEW lecture on open data, Jeremy Boss, chief information officer at the Audit Commission, expressed a personal view on this: ‘The first time we send out an anonymised data set, and somebody figures out that by comparing two different sets of data you can identify somebody and tie them to their medical records, then we will have trouble.’ This prompted Richard Thomas CBE, a past UK Information Commissioner who is now global strategy advisor to the Centre for Information Policy Leadership at Hunton & Williams LLP, to explain that this is already possible.
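To see why comparing two data sets can undo anonymisation, consider a minimal sketch of the linkage involved. Everything in it is fictional and chosen for illustration: the records, the names and the choice of postcode district, birth year and sex as the shared attributes are all assumptions, not a description of any real data set.

```python
from collections import defaultdict

# Fictional "anonymised" records: names removed, other attributes kept.
ANONYMISED = [
    {"postcode": "YO7", "birth_year": 1970, "sex": "F", "condition": "asthma"},
    {"postcode": "YO7", "birth_year": 1985, "sex": "M", "condition": "diabetes"},
]

# Fictional public register carrying names and the same attributes.
PUBLIC = [
    {"name": "A. Example", "postcode": "YO7", "birth_year": 1970, "sex": "F"},
    {"name": "B. Sample", "postcode": "YO7", "birth_year": 1985, "sex": "M"},
    {"name": "C. Placeholder", "postcode": "YO7", "birth_year": 1985, "sex": "M"},
]

# Attributes present in both data sets (an assumption for this sketch).
KEYS = ("postcode", "birth_year", "sex")


def reidentify(anonymised, public, keys=KEYS):
    """Return {name: record} for records that match exactly one named person."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[tuple(person[k] for k in keys)].append(person["name"])

    matches = {}
    for record in anonymised:
        candidates = names_by_key.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match ties the record to a name
            matches[candidates[0]] = record
    return matches


if __name__ == "__main__":
    for name, record in reidentify(ANONYMISED, PUBLIC).items():
        print(name, "->", record["condition"])
```

In this toy example the first record is tied to a single name, while the second stays ambiguous because two people share the same attributes; the more attributes two data sets have in common, the fewer such ambiguities remain.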

In a world where increasing amounts of data are available and increasingly sophisticated tools are available to analyse it, non-personal data can reveal some very private information. ‘You do not always know what is personal and what is not,’ warns Thomas. ‘Take abortion. It is possible to look at publicly available data, and figure out, by name, which girls under 16 have had an abortion. The same is true of people with certain illnesses,’ he adds, and who knows what use that sort of data could be put to – which brings us back to where we started. Open data is potentially a wonderful thing, but the road to Hell is paved with good intentions.
