California Open Data Publisher's Handbook
3. Create Metadata and Data Dictionary


Last updated 1 year ago


Metadata is data about data. It describes a dataset's structure, data elements, creation, access, format, and content. A data dictionary is a type of metadata that focuses on the data elements.

Metadata is necessary to improve the discoverability of data within the open data portal and on external search engines. The more relevant information a search engine has about your data resources, the easier it will be for users to find them.

Without good metadata, datasets are prone to getting lost. Below we define minimum standards and best practices for:

  • Creating metadata
  • Creating a data dictionary

Resource reminder!

Use the metadata template you started in Step 1 to document your dataset according to this guide.

Create your metadata

Metadata checklist

  • Metadata fields: see the Metadata Field Definitions reference.
  • Dataset title: see best practices below.
  • Dataset description: see best practices below.

Best practices: dataset title content

Do's

✅ Keep titles concise and informative.

Don'ts

❌ Avoid using CA or California in the title if it does not meaningfully clarify the scope.

❌ Avoid jargon, and spell out acronyms.

❌ Avoid placing dates or years in your dataset title (e.g., 2016-2021). Instead, make sure your data includes relevant date information as fields, and describe any useful limitations on observed dates in your dataset description rather than in the title.

Best practices: dataset description content

Do's

✅ Create a summary paragraph that details the contents of your data table. The first few sentences are the most important.

✅ Include the purpose of the dataset, including the programs or policies the data supports.

✅ Include related legislation, if applicable (especially if it defines the method and/or attributes of collection).

✅ Include the data collection method and source: not the name of the database, but the process, people, or organizations the data comes from.

✅ Include relevant acronyms, but make sure to clearly define each one at least once.

✅ Highlight common questions or important notes about the dataset, such as limitations or missing periods of time.

✅ If your description is long, consider linking to a more detailed document and summarizing the key points in your description.

Don'ts

❌ Avoid using undefined acronyms in your first few sentences.

❌ Avoid naming just the database the data comes from. Instead, highlight the process and methods used to collect the data.

Create your data dictionary

A data dictionary is the information you provide that defines the fields in your data and how the data can be used.

Data dictionary checklist

  • For each field, document the field name, field label, data type, definition, and valid values (if applicable). See the detailed reference on these elements.
  • Write field definitions in user-friendly language. See best practices below.

Additional Resources: Refer to Data Dictionary: What to Include for further guidance on what to include in the data dictionary.

Best practices: field definitions

Do's

✅ Be precise, unambiguous, and concise.

✅ Include relevant acronyms, but make sure to clearly define each one at least once.

✅ If the value is a date, document the time zone in which it was recorded, e.g., PDT (Pacific Daylight Time).

✅ If the values are calculated, include the source of the raw data and the calculation method.

✅ Include units of measurement, if applicable.

✅ Include any known limitations of the data collected, e.g., groundwater levels were not measured in the month of January.

✅ If the field is a category, include the list of allowable values.

Don'ts

❌ Avoid writing these definitions from the perspective of an expert; write with the novice user in mind.
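Some publishers keep their data dictionary in a machine-readable form and check it before uploading. The sketch below illustrates that idea in Python. It is a hypothetical helper, not a feature of the portal or the handbook's template, and the class, function, and data type names (FieldEntry, check_entry, "category", "date") are assumptions made for illustration.

Each entry records the elements listed in the data dictionary checklist. The checks cover a few best practices that can be tested mechanically:

  • Required elements are filled in.
  • Category fields list their allowable values.
  • Date fields document a time zone.

Judgments such as "user-friendly language" still need a human reviewer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldEntry:
    """One row of a data dictionary (hypothetical structure, for illustration)."""
    name: str        # column name exactly as it appears in the data file
    label: str       # human-friendly field label
    data_type: str   # assumed vocabulary, e.g. "text", "number", "date", "category"
    definition: str  # plain-language definition written for a novice user
    valid_values: List[str] = field(default_factory=list)  # allowable values for category fields
    unit: Optional[str] = None       # unit of measurement, if applicable (not checked below)
    time_zone: Optional[str] = None  # time zone of recording for date fields, e.g. "PDT"


def check_entry(entry: FieldEntry) -> List[str]:
    """Return problems found in one entry; an empty list means no issues detected."""
    problems = []
    # Every entry needs a name, label, data type, and definition.
    for attr in ("name", "label", "data_type", "definition"):
        if not getattr(entry, attr).strip():
            problems.append(f"{entry.name or '<unnamed>'}: missing {attr}")
    # Category fields should list their allowable values.
    if entry.data_type == "category" and not entry.valid_values:
        problems.append(f"{entry.name}: category field has no valid values listed")
    # Date fields should document the time zone of the recording.
    if entry.data_type == "date" and not entry.time_zone:
        problems.append(f"{entry.name}: date field has no time zone documented")
    return problems


# Example usage with made-up fields.
dictionary = [
    FieldEntry("sample_date", "Sample Date", "date",
               "Date the groundwater level was measured.", time_zone="PDT"),
    FieldEntry("well_type", "Well Type", "category",
               "Primary use of the well."),
]
for entry in dictionary:
    for problem in check_entry(entry):
        print(problem)  # prints: well_type: category field has no valid values listed
```

A spreadsheet version of the same idea works equally well. The point is that each field has one row holding the same set of elements, so gaps are easy to spot.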