The What and Why of Open Data

What is open data?

The Open Knowledge Foundation (OKF) maintains a standard definition of "open" in both short and detailed form. Below is the short definition offered by OKF, followed by criteria that apply it to California standards.
Open data is data that can be freely used, shared and built-on by anyone, anywhere, for any purpose.
To accomplish this, open data:
  1. Is released in the Public Domain. Data must be in the Public Domain and provided at no cost to users.
  2. Is accessible and discoverable. Data must be published to an official State open data portal without restrictions. Any additional information necessary for attribution or citation must also accompany the data.
  3. Is published with timely updates. Data must be published in a manner that minimizes the time between the creation and dissemination of the data.
  4. Is machine readable. Data must be provided in a form readily processable by a computer and where the individual elements of the work can be easily accessed and modified.
  5. Is in an open format. An open format is one that places no restrictions, monetary or otherwise, upon its use and can be fully processed with at least one free/libre/open-source software tool. For example, the most common and usable open formats for tabular data are CSV and JSON.
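
To make the last two criteria concrete, here is a minimal sketch in Python, assuming a small sample dataset whose column names and values are invented for illustration. It uses only the Python standard library, which is free and open-source software, to read CSV, modify a single element, and write the same records as JSON.

```python
import csv
import io
import json

# Hypothetical sample data: the columns and values are invented for illustration.
SAMPLE_CSV = """county,year,permits_issued
Alameda,2020,120
Fresno,2020,85
"""


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, one dict per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def records_to_json(records):
    """Serialize the records to a JSON string."""
    return json.dumps(records, indent=2)


records = csv_to_records(SAMPLE_CSV)

# Machine readable: an individual element can be accessed and modified directly.
# CSV values are read as strings, so convert before doing arithmetic.
records[0]["permits_issued"] = str(int(records[0]["permits_issued"]) + 1)

# Open format: the same records can be rewritten in another open format.
json_text = records_to_json(records)
print(json_text)
```

The same operations on a table locked inside a PDF or a proprietary binary file would typically require manual transcription or licensed software, which is what these two criteria are meant to rule out.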

What is NOT open data?

There are many things the State does to share data or reports about data. You can consider these "data products," but they are not open data by the definition above.
The table below describes several data products that are sometimes confused with open data, the reasons why they are not open data, and ways to "upgrade" them to open data.
| Data product and description | Why this is not open data | How to upgrade to open data |
| --- | --- | --- |
| Web application. A public-facing application that allows users to search for specific data and possibly generate reports. | Released in the Public Domain; Accessible and discoverable; Published with timely updates; Machine readable; In an open format | Develop an automated process that extracts the raw data from your application's backend system and loads it to the open data portal. Once there, users can access it as a single download or through an Application Programming Interface (API). Provide a link from your application to the raw and bulk data to enable discovery, which will take load off your application. You can also link to your application from the published open data. |
| Dashboard. An interactive application that allows users to visualize data in pre-created reports. | Released in the Public Domain; Accessible and discoverable; Published with timely updates; Machine readable; In an open format | Provide the underlying data in raw and bulk forms through the open data portal. Provide a link to your dashboard from the open data portal and to the published open data from your dashboard. This enables discovery of your resources. |
| Report. A document providing both data and context, often published as a PDF to satisfy an administrative or legislated requirement. | Released in the Public Domain; Accessible and discoverable; Published with timely updates; Machine readable; In an open format: if PDF | Publish the data behind the report on the open data portal. If the report is based on administrative data that is collected more frequently than the reporting period, publish the underlying data on a more frequent and automated basis. Provide a link in your report to the published data, and link to reports from your published data, to enable discovery of your resources. |
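
The automated extraction step recommended for web applications could look something like the following sketch. It assumes a backend that can be queried with SQL and uses an in-memory SQLite database as a stand-in; the table name, columns, and the helper `export_query_to_csv` are hypothetical. Loading the resulting file to the open data portal is left out because it depends on the portal's own upload interface, which is not described here.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_query_to_csv(conn, query, out_path):
    """Run a read-only query and write every row, plus a header row, to a CSV file.

    Returns the number of data rows written.
    """
    cursor = conn.execute(query)
    header = [col[0] for col in cursor.description]
    row_count = 0
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cursor:
            writer.writerow(row)
            row_count += 1
    return row_count


# Stand-in backend: an in-memory SQLite database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inspections (id INTEGER, facility TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?)",
    [(1, "Facility A", "pass"), (2, "Facility B", "fail")],
)

# Export the full table in bulk rather than one search result at a time.
out_path = Path(tempfile.gettempdir()) / "inspections.csv"
rows_written = export_query_to_csv(
    conn, "SELECT id, facility, result FROM inspections ORDER BY id", out_path
)
print(f"Wrote {rows_written} rows to {out_path}")
conn.close()
```

Running a script like this on a schedule, rather than by hand, is one way to keep the time between data creation and publication short.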

Why open data?

Open data is not something we do just for its own sake. It brings real benefits, including:
  • Stimulating new ideas and services. By releasing open data, State organizations may help to stimulate new and innovative ideas from Californians. Open data has great potential to fuel new solutions, and even new businesses, that address common problems or challenges facing those who live in, work in, or travel to the State of California. For example, see projects developed as part of the California Water Data Challenge.
  • Increasing cross-organizational data sharing. If data can be shared in the open, you can use the open data portal as an interface for exchanging data among departments, agencies, and external organizations. This can also avoid additional costly investments in data infrastructure. Combining information from different State departments and agencies can provide valuable insights into important areas that many organizations touch, including health equity, climate change, and drought response, to name just a few.
  • Simplifying Public Records Act (PRA) Requests. Open data releases can be an effective way of responding to requests for data made under the Public Records Act. One open data release may address multiple requests for information that can be repetitive and costly to respond to if addressed on an individual basis.
  • Improving data quality. Having more eyes on data helps improve its quality over time. Open data publishing allows and encourages users to provide feedback on accuracy, consistency, and other quality measures. This feedback can help departments get better results from their own internal uses of the data.
  • Reducing unwanted web traffic. Publishing open data can also help reduce unwanted web traffic on department and agency websites, which is often the result of "data scraping" by individuals seeking to obtain data in bulk from the State through public applications. Scraping puts unnecessary stress on the State's technology infrastructure and an unnecessary burden on IT staff.
  • Changing how we use data. Open data can serve as a platform to change how we use, share, and consume our data externally and internally, transform data into services, and foster continuous improvement in decision making and the business of government. Ultimately, open data is about enabling the use of data to support a range of positive outcomes.