# 3. Create Metadata and Data Dictionary

Metadata is data about data. It describes a dataset’s structure, data elements, creation, access, format, and content. A data dictionary is a type of metadata that focuses on the data elements.

Metadata is necessary to improve the discoverability of data within the open data portal and on external search engines. The more relevant information a search engine has about your data resources, the easier those resources are for users to find.

**Without good metadata, datasets are prone to getting lost.** Below we define minimum standards and best practices for:

1. [Creating metadata](#create-your-metadata)
2. [Creating a data dictionary](#create-your-data-dictionary)

{% hint style="success" %}
:open\_file\_folder: **Resource reminder!**

Use the metadata template [started in Step 1](https://docs.data.ca.gov/california-open-data-publishers-handbook/1.-review-the-pre-publishing-checklist) to document according to this guide.
{% endhint %}

## Create your metadata

### **Metadata checklist**

* [ ] Fill in the metadata fields relevant to your dataset - [see metadata field definition reference](https://docs.data.ca.gov/california-open-data-publishers-handbook/reference/metadata-field-definitions)
* [ ] Make sure your dataset title is accessible and user friendly - [see best practices below](#best-practices-dataset-title-content)
* [ ] Ensure your dataset description is accessible and user friendly - [see best practices below](#best-practices-dataset-description-content)

### **Best practices: dataset title content**

| **Do's**                                                  | **Don'ts** |
| --------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| :white\_check\_mark: Keep titles concise and informative. | <p><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span> Avoid using CA or California in the title if it does not meaningfully clarify the scope.</p><p></p><p><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span> Avoid jargon, and spell out acronyms.<br><br><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span> Avoid placing dates or years in your dataset title (e.g. 2016-2021). Instead, make sure your data includes relevant date information as fields. Describe any relevant limits on the observed dates in your dataset description rather than in the title.</p> |
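
Some of the title guidance above can be checked automatically before a human review. The sketch below is only an illustration: `review_title` and its patterns are hypothetical helpers, not part of the handbook or the open data portal, and the sample titles are made up. It flags years, mentions of CA or California, and all-caps words that may be undefined acronyms, then leaves the final judgment to the publisher.

```python
import re

# Hypothetical patterns for illustration; adjust to your own naming conventions.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")      # four-digit years such as 2016
STATE_PATTERN = re.compile(r"\b(?:CA|California)\b")  # state name or abbreviation
ACRONYM_PATTERN = re.compile(r"\b[A-Z]{2,}\b")        # runs of two or more capitals


def review_title(title: str) -> list[str]:
    """Return human-readable warnings for a draft dataset title."""
    warnings = []
    if YEAR_PATTERN.search(title):
        warnings.append("Contains a year; describe date limits in the description.")
    if STATE_PATTERN.search(title):
        warnings.append("Mentions CA or California; keep only if it clarifies scope.")
    # "CA" is already covered by the state check, so it is not reported twice.
    acronyms = [word for word in ACRONYM_PATTERN.findall(title) if word != "CA"]
    if acronyms:
        warnings.append("Possible acronyms to spell out: " + ", ".join(acronyms))
    return warnings


# Made-up example titles.
for draft in ("CA DWR Groundwater Levels 2016-2021",
              "Periodic Groundwater Level Measurements"):
    print(draft, "->", review_title(draft))
```

A warning here is a prompt to review, not a rule violation: a title may legitimately include California when the state name clarifies the scope.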

### **Best practices: dataset description content**

| **Do's** | **Don'ts** |
| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| <p><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> Create a summary paragraph that details the contents of your data table. The first few sentences are the most important.</p><p></p><p><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> Include the purpose of the dataset, including the programs or policies the data supports.</p><p></p><p><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> Include related legislation if applicable (especially if it defines the method and/or attributes of collection).</p><p></p><p><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> Include the data collection method and source (not the name of the database, but the process, people, or organizations the data comes from).</p><p></p><p><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> Include relevant acronyms, but make sure to clearly define them at least once.<br><br><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> Highlight common questions or important notes about the dataset, such as limitations or missing periods of time.<br><br><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> If your description is long, consider linking to a more detailed document and summarizing the key points in your description.</p> | <p><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span> Avoid using acronyms in your first few sentences without defining them.<br><br><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span> Avoid naming just the database the data comes from. Instead, highlight the process and methods for collecting the data.</p> |

## Create your data dictionary

A data dictionary is the information you provide that defines the fields in your data and how the data can be used.

### **Data dictionary checklist**

* [ ] For each field, document the field name, field label, data type, definition, and valid values (if applicable) - [see detailed reference on these elements](https://docs.data.ca.gov/california-open-data-publishers-handbook/reference/data-dictionary-what-to-include)
* [ ] Write field definitions in user friendly language - [see best practices below](#best-practices-field-definitions)

{% hint style="info" %}
:open\_file\_folder: **Additional Resources.** Refer to [Data Dictionary: What to Include](https://docs.data.ca.gov/california-open-data-publishers-handbook/reference/data-dictionary-what-to-include) for further guidance on what to include in the data dictionary.
{% endhint %}
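
The checklist elements above can also be captured in a small structure and checked for completeness before publishing. This is a minimal sketch under stated assumptions: the `FieldEntry` class, the `missing_elements` helper, the sample fields, and the data type names (`timestamp`, `text`) are all hypothetical and are not defined by the handbook or the portal.

```python
from dataclasses import dataclass, field


@dataclass
class FieldEntry:
    """One row of a data dictionary (illustrative structure, not an official schema)."""
    name: str
    label: str
    data_type: str
    definition: str
    # Valid values only apply to categorical fields, so they default to empty.
    valid_values: list[str] = field(default_factory=list)


def missing_elements(entry: FieldEntry) -> list[str]:
    """List the required elements that are blank for this entry."""
    required = {
        "field name": entry.name,
        "field label": entry.label,
        "data type": entry.data_type,
        "definition": entry.definition,
    }
    return [element for element, value in required.items() if not value.strip()]


# Made-up sample entries; the second one is deliberately missing its definition.
entries = [
    FieldEntry(
        name="measurement_date",
        label="Measurement Date",
        data_type="timestamp",
        definition="Date and time the groundwater level was measured, recorded in Pacific Time.",
    ),
    FieldEntry(
        name="well_use",
        label="Well Use",
        data_type="text",
        definition="",
        valid_values=["Irrigation", "Residential", "Observation"],
    ),
]

for entry in entries:
    print(entry.name, missing_elements(entry))
```

A check like this only confirms that each element is present. Whether a definition is precise and readable for a novice user still needs a human review.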

### :page\_with\_curl: Best practices: field definitions

| **Do's** | **Don'ts** |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| <p><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> Be precise, unambiguous, and concise.</p><p></p><p><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> Include relevant acronyms, but make sure to clearly define them at least once.</p><p></p><p><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> If the value is a date, document the time zone of the recording, e.g. PDT (Pacific Daylight Time).</p><p></p><p><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> If the values are calculated, the source of raw data and calculation method should be included.</p><p></p><p><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> Include units of measurement if applicable.</p><p></p><p><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> Include any known limitations of the data collected, e.g. groundwater levels were not measured in the month of January.</p><p></p><p><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span> If the field is a category, include the list of allowable values.</p> | :x: Avoid writing these definitions from the perspective of an expert; write with the novice user in mind. |

