
2. Prepare Data for Publishing

This section covers high-level checks to make when preparing your data. Treat it as the minimum: you will likely need additional quality checks that are specific to your data.
To help us build more detailed guides and references for data preparation and data quality, see the feedback prompt below.
​
📣
Want to provide feedback on future data prep and data quality guides?

Checklist

  • If your dataset contains Personally Identifiable Information (PII) or Personal Health Information (PHI), follow your department's de-identification guidance as you prepare data.
  • Review data preparation and formatting guidance and do your best to conform to standards
  • Create a document to capture your specific data preparation steps. It should include at least:
    • Names of relevant data sources
    • Relevant contact information for sources and update process
    • Repeatable steps for preparing the dataset
    • Link to example sources and output for reference
    • A change log table that records the date and a short description of each change you make to your documentation
  • If merging tables from multiple sources (like counties or regions):
    • Check that all expected fields are accounted for across data sources
    • Check that the number of rows in your merged dataset matches the combined number of rows in your individual tables
  • Check that data types are consistent within fields in your dataset. For example, if a field is supposed to be an integer, confirm that it only contains integers
  • Save your tabular data file as a delimited file such as a comma-separated values (CSV) file
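The merge, data type, and CSV checks in the list above can be partly automated. The following is a minimal sketch in Python using only the standard library, not an official tool. It assumes each source has already been loaded as a list of dictionaries and that tables are merged by stacking rows. The function names, field names, and sample rows are all hypothetical and only illustrate the idea.

```python
import csv

# Hypothetical field list; replace with the fields your dataset expects.
EXPECTED_FIELDS = ["county", "report_date", "case_count"]


def merge_sources(sources, expected_fields):
    """Stack rows from several sources, checking fields and row counts.

    `sources` maps a source name to a list of dict rows.
    """
    expected = set(expected_fields)
    merged = []
    for name, rows in sources.items():
        for row in rows:
            missing = expected - set(row)
            unexpected = set(row) - expected
            if missing or unexpected:
                raise ValueError(
                    f"{name}: missing={sorted(missing)} "
                    f"unexpected={sorted(unexpected)}"
                )
            merged.append(row)
    # For a plain append this always holds; it guards against rows being
    # dropped or duplicated if the merge logic later becomes more complex.
    expected_rows = sum(len(rows) for rows in sources.values())
    if len(merged) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(merged)}")
    return merged


def non_integer_values(rows, field):
    """Return (row_index, value) pairs where `field` is not an integer."""
    bad = []
    for index, row in enumerate(rows):
        try:
            int(str(row[field]).strip())
        except ValueError:
            bad.append((index, row[field]))
    return bad


def write_csv(rows, fields, path):
    """Save rows as a comma-separated values file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)


# Made-up sample rows, for illustration only.
sample_sources = {
    "county_a": [{"county": "A", "report_date": "2024-01-01", "case_count": "12"}],
    "county_b": [{"county": "B", "report_date": "2024-01-01", "case_count": "n/a"}],
}
sample_merged = merge_sources(sample_sources, EXPECTED_FIELDS)
print(non_integer_values(sample_merged, "case_count"))  # [(1, 'n/a')]
```

In this sketch, any value reported by `non_integer_values` would need to be corrected or documented before calling `write_csv`. Checks like these supplement, but do not replace, the quality checks that are specific to your data.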
Stuck on best practices or need advice?
If you are a State employee, send a message on the CalData Communities Open Data Channel. If you need access, sign up here.