The What and Why of Open Data

The Open Knowledge Foundation (OKF) offers a standard definition of open data in both a short and a detailed form. Below is OKF's short definition, which we then refine to meet California standards.

Open data is data that can be freely used, shared, and built on by anyone, anywhere, for any purpose.

To accomplish this, open data:

  1. Is released in the Public Domain. Data must be in the Public Domain and provided at no cost to users.

  2. Is accessible and discoverable. Data must be published to an official State open data portal without restrictions. Any additional information necessary for attribution or citation must also accompany the data.

  3. Is published with timely updates. Data must be published in a manner that minimizes the time between the creation and the dissemination of the data.

  4. Is machine readable. Data must be provided in a form readily processable by a computer and where the individual elements of the work can be easily accessed and modified.

  5. Is in an open format. Data must be provided in an open format. An open format is one that places no restrictions, monetary or otherwise, on its use and can be fully processed with at least one free/libre/open-source software tool. For example, the most common and usable open formats for tabular data are CSV and JSON.
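To illustrate criteria 4 and 5, here is a minimal sketch showing that tabular data published as CSV can be parsed and re-published as JSON using only Python's free, open-source standard library. The station IDs, dates, and reservoir levels are a hypothetical sample invented for illustration; they are not a real State dataset.

```python
import csv
import io
import json

# Hypothetical sample dataset (invented for illustration) in CSV form.
SAMPLE_CSV = """station_id,date,reservoir_level_ft
ABC,2023-01-01,812.4
ABC,2023-01-02,813.1
"""


def csv_to_records(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


def records_to_json(records: list[dict[str, str]]) -> str:
    """Serialize the parsed records as a JSON array of objects."""
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    records = csv_to_records(SAMPLE_CSV)
    print(records_to_json(records))
```

Because CSV carries no type information, every value arrives as a string (for example, `"812.4"` rather than a number). Publishers may therefore want to document each column's type alongside the data so users can convert values reliably.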

The State shares data, and reports about data, in many ways. You can think of these as "data products," but they are not open data by the definition above.

The table below describes several data products that are sometimes confused with open data, the reasons they do not qualify, and ways to "upgrade" them to open data.

Why open data?

We do not publish open data for its own sake. It offers real benefits, including:

  • Stimulating new ideas and services. By releasing open data, State organizations may help to stimulate new and innovative ideas from Californians. Open data has great potential to act as fuel for new solutions, and even new businesses, that address common problems or challenges facing those who live in, work in, or travel to the State of California. For example, see the projects developed as part of the California Water Data Challenge.

  • Increasing cross-organizational data sharing. If data can be shared in the open, departments and agencies can use the open data portal as a common interface for exchanging data with each other and with external organizations. This can also avoid costly additional investments in data infrastructure. Combining information from different State departments and agencies can also provide valuable insights into important areas that many organizations touch, including health equity, climate change, and drought response, to name just a few.

  • Simplifying Public Records Act (PRA) Requests. Open data releases can be an effective way of responding to requests for data made under the Public Records Act. One open data release may address multiple requests for information that can be repetitive and costly to respond to if addressed on an individual basis.

  • Improving data quality. Having more eyes on data helps improve its quality over time. Open data publishing allows and encourages users to provide feedback on accuracy, consistency, and other quality measures, which can help departments get better results from their own internal uses of the data.

  • Reducing unwanted web traffic. Publishing open data can also help reduce unwanted web traffic on department and Agency websites, which is often the result of “data scraping” by individuals seeking to obtain data in bulk from the State through public applications. This scraping puts unnecessary stress on the State's technology infrastructure and an unneeded burden on IT staff.

  • Changing how we use data. Ultimately, open data can serve as a platform to change how we use, share, and consume our data externally and internally, transform data into services, and foster continuous improvement in decision making and the business of government. In short, open data is about enabling the use of data to support a range of positive outcomes.
