Open Government Data – A virtual revolution

Archived page from Yahoo! Developer Network

Anand Joshi and I wrote this for the Yahoo! Developer Network back in 2010. Unfortunately, the post has been lost due to YDN platform changes. It can still be found on the Internet Archive: Open Government Data – A virtual revolution. I’m republishing this to avoid breaking the web.

June 4, 2010

There’s a tremendous amount of government data available on the internet today. It’s an open data revolution led by the United States and the United Kingdom. This data ranges from the basic (crime, weather, finances, education) to the obscure (suicide rates, bicycle accidents). Through analysis, data can expose inefficiencies, corruption, geo-distributed social patterns, and successful policies. Data transparency flips the tables and gives citizens the tools to hold government more accountable.

Screenshot of Data.Gov

This revolution hit the main stage in the USA with the election of Barack Obama. He promised a new level of transparency, beginning with the creation of Data.Gov.

In the United Kingdom, Sir Tim Berners-Lee was tapped by Prime Minister Gordon Brown in 2009 to lead the rollout of an open government data portal, Data.Gov.UK. As Berners-Lee put it in an interview with the Telegraph, “Government data should be a public resource. By releasing it, we can unlock new ideas for delivering public services, help communities and society work better, and let talented entrepreneurs and engineers create new businesses and services.”

Linked government data

Berners-Lee has pushed for more than just placing the data online. As the leader of the Semantic Web movement, he has been organizing government data with linked-data patterns. This concept not only publishes data, but also defines what the various nodes represent and how they relate to other chunks of data. This makes it easier to create informative mashups.

The Linked Data movement uses standardized data formats to coordinate the available government data. Linked data uses Resource Description Framework (RDF) to define the data and SPARQL Protocol and RDF Query Language (SPARQL) to query the data. RDF defines data in a subject-predicate-object pattern (triples). For instance, you might say the sky (subject) has color (predicate) that is blue (object).
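
To make the triple pattern concrete, here is a minimal Python sketch. It is not an RDF library and does not use real RDF syntax; the `Triple` type, the `match` helper, and the sample statements are all hypothetical names invented for illustration. It only shows how subject-predicate-object statements can be stored and looked up by pattern.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (illustrative, not real RDF)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical statements, including the sky example from the text.
triples = [
    Triple("sky", "has_color", "blue"),
    Triple("grass", "has_color", "green"),
    Triple("sky", "is_above", "grass"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Fix the subject and predicate, and read back the object.
sky_colors = [t.obj for t in match(triples, subject="sky", predicate="has_color")]
print(sky_colors)  # ['blue']
```

Leaving a position as a wildcard is roughly what a query variable does: the known parts of the statement narrow the search, and the unknown part comes back as the answer.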

SPARQL’s query format is similar to SQL and YQL. However, instead of selecting columns from tables, you describe the triples you are interested in by naming the subjects and predicates that matter, and the query returns the matching objects. Dave Beckett, an engineer at Yahoo!, has created an open table to handle SPARQL queries via YQL.

Let’s look at a Data-Gov example that compares Medicare Claims with Interstate Migration. This mashup uses two data sources from the U.S. government: 2007-2008 State-to-State Migration Inflow and OMH Claims Listed by State. Both data sources have been released by the U.S. government as spreadsheets.

The RDF documents, Migration Inflow RDF and OMH Claims RDF, are hosted on the Data-Gov website and define the content of the spreadsheets. Now that the data has been defined, a SPARQL query can begin returning the desired information.

SPARQL query for Medicare claims per state


PREFIX d1623: <http://data-gov.tw.rpi.edu/vocab/p/1623/>
SELECT ?state ?claims
WHERE {
    GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_1623> {
        ?s d1623:state ?state .
        ?s d1623:fiscal_year_07 ?claims .
    }
}
ORDER BY ?state
LIMIT 55
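
For readers new to SPARQL, the following Python sketch imitates what the query above asks for, using a small in-memory list of triples. It does not contact the Data-Gov endpoint. The row identifiers and claim counts are invented placeholders, not real Medicare figures, and `select_state_claims` is a hypothetical helper. The sketch also assumes each row has at most one state and one claims value.

```python
# Predicate names copied from the query; everything else is placeholder data.
STATE = "d1623:state"
CLAIMS = "d1623:fiscal_year_07"

# (subject, predicate, object) triples with invented values.
triples = [
    ("row2", STATE, "Alaska"),
    ("row2", CLAIMS, 20),
    ("row1", STATE, "Alabama"),
    ("row1", CLAIMS, 10),
    ("row3", STATE, "Arizona"),  # no claims triple, so the join drops it
]


def select_state_claims(triples, limit=55):
    """Mimic the query: join both patterns on ?s, sort by state, cap rows."""
    states = {s: o for s, p, o in triples if p == STATE}
    claims = {s: o for s, p, o in triples if p == CLAIMS}
    # Both patterns share ?s, so keep only subjects that satisfy both.
    rows = [(states[s], claims[s]) for s in states if s in claims]
    rows.sort(key=lambda row: row[0])  # ORDER BY ?state
    return rows[:limit]                # LIMIT 55


result = select_state_claims(triples)
print(result)  # [('Alabama', 10), ('Alaska', 20)]
```

The shared `?s` variable is what ties a state name to its claims figure: both triple patterns must match the same subject before a row is returned.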

You are not limited to using SPARQL. YQL has many methods for consuming government data. There are some existing data tables, such as recent earthquakes stronger than magnitude 4.0 or the nutritional values of a banana. YQL also lets you work with spreadsheets, XML documents, and other data sources.

Open government repositories

Much of the government data available today is scattered, with little coordination between organizations and different agencies or branches of government. However, there are now several official and unofficial repositories that make it much easier to find and work with data.

Official data repositories

The following repositories are maintained by governments and provide a wide spectrum of data sources.

Unofficial data repositories

While the official data repositories are great to have, they can be frustrating to navigate, and their data can be hard to discover. Several unofficial portals have been built to facilitate working with the data.

  • Data-Gov Twiki – This twiki offers additional information about government data and how to use it, and highlights some of the lesser-known data sources. Data-Gov has also created the linked-data resources for hundreds of data feeds.
  • Recovery.Org – A portal with discussions, analysis, and documentation for the Recovery Spending act. This is maintained by a company that helps contractors gain government contracts.
  • TransparencyData.Com – Lobbyist and campaign contribution information in the United States.
  • Sunlight Foundation – Transparent government data advocates.
  • FedFlix – A collection of government produced video.
  • Access Info – International resource for government data.
  • Public.Resource.Org – A crowd-sourced effort to make government data and publications more open.

Open government mashups

Now you know government data is available, how it is delivered, and where you can find resources. It’s time to start working with the data. Most mashups have dealt with localities by mixing data with maps. Let’s look at some existing examples.

Using maps and government data to prove institutional discrimination

One of the most powerful examples of using open government data involves alleged discrimination, otherwise nearly invisible, against the residents of a community in Ohio. The Ohio Civil Rights Commission created a mashup of water main locations and ethnic distribution in the Coal Run neighborhood of Zanesville, Ohio. The results seem to show that the city had denied access to public water because of a household’s race. Without the water-main connection, residents had to have water delivered to their houses by truck.

Another organization using the powerful mixture of maps and data is the Cedar Brook Institute for Sustainable Communities. It alleges that “it is common practice for governments of small and medium-sized towns to use their powers of annexation, zoning, provision of infrastructure and public services, long-term planning, and maximization of tax base to exclude minority and low-income communities from full participation in the town’s benefits and governance.” Unfortunately, this type of mashup has proven to be so powerful that some local governments have closed off their data, under the Patriot Act, to avoid future lawsuits.

Crisis Camp volunteers create a dynamic map of Haiti for Earthquake Relief

Open Street Map of Haiti
The devastating earthquake in Haiti prompted thousands of volunteers to build tools for rescuers and relief agencies. Before the earthquake, there were few maps of Haiti’s road system. However, an open project for mapping the world already existed, and that project, OpenStreetMap, became the de facto resource for navigating the country. People around the world and on the ground were able to build complex maps of streets, damage, relief camps, hospitals, and more.

Yahoo! and other companies and organizations have sponsored Crisis Camps for engineers to build tools with government data. One group of Yahoo! engineers built a series of YQL data tables that surfaced hard-to-use information on relief agencies from the United Nations’ web site.

Bicycle safety in London

In March 2009, some data on bicycle accidents in the UK was released as a spreadsheet. Within 48 hours, people had created linked data resources and the Times Online had a map available of bicycle accidents. This allowed cyclists to find the most dangerous intersections and to plan safer routes. This is an example of how quickly this data can be discovered, annotated, and mashed.
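
As a rough illustration of the first step in a mashup like this, the Python sketch below counts accidents per location from spreadsheet data exported as CSV. The column names (`location`, `severity`), the sample rows, and the `rank_locations` helper are all hypothetical; the real UK release may be structured quite differently, and a real map would also need coordinates for each record.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for an exported spreadsheet.
SAMPLE_CSV = """location,severity
Junction A,slight
Junction B,serious
Junction A,serious
Junction C,slight
Junction A,slight
"""


def rank_locations(csv_text, top_n=3):
    """Count rows per location and return the most frequent ones first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["location"] for row in reader)
    return counts.most_common(top_n)


ranking = rank_locations(SAMPLE_CSV)
print(ranking[0])  # ('Junction A', 3)
```

Once the counts exist, plotting them on a map is mostly a matter of joining each location to its coordinates, which is where linked data identifiers can help.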

What can you do with open government data

You can make government data more accessible by working with the linked data community or by building a YQL open table. You could also work with your local organizations to publish their data. This could range from something as simple as providing a spreadsheet to building a data API.

There are many government data sources that will add a new layer of functionality to your existing applications. Here are some:

  • Give your traffic maps more context with infrastructure expenditure, accident statistics, speeding ticket locations, and even population density.
  • A real-estate website could highlight school and church distribution, neighborhood income, crime statistics, and percentage of broadband usage.
  • Who is supporting whom? Create an application that tracks your local politicians’ voting records.
  • Create an app that tracks a bill’s passage with news coverage, who supports or fights the bill, their contributors, and past votes.
  • Public.Resource.Org is trying to provide access to public legal documents to everyone. They could use more help building applications for utilizing the archives.
  • Micro-local information: What do your neighbors support politically? What are the dangers, species, expenditures, events, and crimes in your neighborhood?
  • What are you interested in? How can government data add another layer? For instance, a foodie site may want nutritional information, farm subsidy, import/export, pesticide usage, historical weather patterns, and more.

Ted Drake, Yahoo! Web Developer
Anand Joshi, Intern / Backend Engineer in Finance

Posted at June 4, 2010 10:00 AM

