
3. Create Metadata and Data Dictionary

Metadata is data about data. It describes a dataset’s structure, data elements, creation, access, format, and content. A data dictionary is a type of metadata that focuses on the data elements.
Metadata is necessary to improve the discoverability of data within the open data portal and on external search engines. The more relevant information a search engine has about your data resources, the easier it will be for users to find them.
Without good metadata, datasets are prone to getting lost. Below we define minimum standards and best practices for creating your metadata and your data dictionary.
📂
Resource reminder!
Use the metadata template started in Step 1 to document according to this guide.

Create your metadata

Metadata checklist

📣
Want to provide feedback on future metadata guidance?
We want to develop additional human-centered practice guides for metadata. Fill out the form to indicate interest.

Best practices: dataset title content

Do's:
Keep titles concise and informative.

Don'ts:
Avoid using CA or California in the title if it does not meaningfully clarify the scope.
Avoid using jargon, and spell out acronyms.
Avoid placing dates or years in your dataset title (e.g. 2016-2021). Instead, make sure your data includes relevant date information as fields, and describe any useful limitations on observed dates in your dataset description rather than in the title.
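As a rough illustration of the title guidance above, the following Python sketch flags two of the patterns to avoid. The function name `title_warnings` and both regular expressions are assumptions made for this example and are not part of any portal tooling. The CA/California check can only raise a prompt for review, since whether the term clarifies scope is a human judgment.

```python
import re

# Hypothetical patterns for this sketch only.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")      # four-digit years such as 2016
STATE_PATTERN = re.compile(r"\b(?:CA|California)\b")  # standalone "CA" or "California"


def title_warnings(title: str) -> list:
    """Return review prompts for a proposed dataset title."""
    warnings = []
    if YEAR_PATTERN.search(title):
        warnings.append("contains a year; describe date coverage in the description")
    if STATE_PATTERN.search(title):
        warnings.append("mentions CA/California; keep only if it clarifies scope")
    return warnings


print(title_warnings("CA Groundwater Levels 2016-2021"))
print(title_warnings("Groundwater Level Measurements"))
```

The first title triggers both prompts and the second triggers none. A check like this only supports the review; it does not replace reading the title for jargon and clarity.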

Best practices: dataset description content

Do's:
Create a summary paragraph that details the contents of your data table. The first few sentences are the most important.
Include the purpose of the dataset, including the programs or policies the data supports.
Include related legislation if applicable (especially if it defines the method and/or attributes of collection).
Include the data collection method and source (not the name of the database, but the process, people, or organizations the data comes from).
Include relevant acronyms, but make sure to clearly define them at least once.
Highlight common questions or important notes about the dataset, such as limitations or missing periods of time.
If your description is long, consider linking to a more detailed document and summarizing the key points in your description.

Don'ts:
Avoid using acronyms in your first few sentences without defining them.
Avoid naming just the database the data comes from. Instead, highlight the process and methods for collecting the data.
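The acronym guidance above can be spot-checked with a simple heuristic. The Python sketch below assumes an acronym counts as defined when it appears at least once in parentheses, as in "Department of Water Resources (DWR)". That convention, the function name `undefined_acronyms`, and the sample text are assumptions for illustration. The check will miss acronyms defined in other ways and will flag capitalized words that are not acronyms.

```python
import re


def undefined_acronyms(description: str) -> list:
    """List all-caps tokens that never appear as '(ACRONYM)' in the text."""
    # Treat any run of two or more capital letters as a candidate acronym.
    candidates = set(re.findall(r"\b[A-Z]{2,}\b", description))
    # Assumed convention: a definition looks like "Full Name (ACRONYM)".
    return sorted(a for a in candidates if f"({a})" not in description)


sample = (
    "The Department of Water Resources (DWR) collects these measurements. "
    "DWR and SWRCB publish the results."
)
print(undefined_acronyms(sample))
```

For the sample text, only `SWRCB` is reported, because `DWR` is introduced with its full name in parentheses.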
Stuck on best practices or need advice?
Send a message on the CalData Communities Open Data Channel. If you need access, sign up here.

Create your data dictionary

A data dictionary is the information you provide that defines the fields in your data and how the data can be used.

Data dictionary checklist

📂
Additional Resources. Refer to Data Dictionary: What to Include for further guidance on what to include in the data dictionary.

📃
Best practices: field definitions

Do's:
Be precise, unambiguous, and concise.
Include relevant acronyms, but make sure to clearly define them at least once.
If the value is a date, document the time zone of the recording, e.g. PDT (Pacific Daylight Time).
If the values are calculated, include the source of the raw data and the calculation method.
Include units of measurement if applicable.
Include any known limitations of the data collected, e.g. groundwater levels were not measured in the month of January.
If the field is a category, include the list of allowable values.

Don'ts:
Avoid writing these definitions from the perspective of an expert; write with the novice user in mind.
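To make the field definition practices above concrete, here is a minimal Python sketch of one way a data dictionary entry could be represented and checked for gaps. The `FieldDefinition` class, its attribute names, the `kind` labels, and the `missing_documentation` function are all hypothetical and chosen for this example. They are not a required schema or the portal's template.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldDefinition:
    """One data dictionary entry (hypothetical structure for illustration)."""
    name: str
    definition: str
    kind: str                                # assumed labels: "date", "category", "number", "text"
    units: Optional[str] = None              # units of measurement, if applicable
    time_zone: Optional[str] = None          # e.g. "PDT (Pacific Daylight Time)"
    allowed_values: List[str] = field(default_factory=list)
    is_calculated: bool = False
    calculation: Optional[str] = None        # raw data source and calculation method
    limitations: Optional[str] = None        # known gaps in collection


def missing_documentation(entry: FieldDefinition) -> List[str]:
    """Return the attributes the practices above ask for but the entry lacks."""
    gaps = []
    if not entry.definition.strip():
        gaps.append("definition")
    if entry.kind == "date" and not entry.time_zone:
        gaps.append("time_zone")
    if entry.kind == "category" and not entry.allowed_values:
        gaps.append("allowed_values")
    if entry.is_calculated and not entry.calculation:
        gaps.append("calculation")
    return gaps


measured_at = FieldDefinition(
    name="measurement_date",
    definition="Date and time the groundwater level was recorded.",
    kind="date",
)
print(missing_documentation(measured_at))
```

The example entry is reported as missing `time_zone`, because it is a date field with no time zone documented. Checks for precision and plain language still need a human reader.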