California Open Data Publisher's Handbook

Text


Last updated 1 year ago


  • UTF-8 encoding should be used

    • This ensures that special characters (e.g. accented letters such as é or ñ) can be decoded and displayed correctly by users

  • No line breaks within cells

    • Embedded line breaks can disrupt parsing in software such as Excel, introducing data integrity issues

    • There are many ways to detect and remove line breaks; the right approach depends on how you extract the data
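A minimal Python sketch of the two rules above, assuming the data is already in memory as rows of values. The function names `has_line_break`, `remove_line_breaks`, and `write_clean_csv` are hypothetical, and replacing each line break with a single space is only one option (a visible separator such as "; " may suit some fields better).

```python
import csv
import re

# Any run of whitespace that contains a Windows (\r\n), classic Mac (\r),
# or Unix (\n) line break.
LINE_BREAK = re.compile(r"\s*(?:\r\n|\r|\n)\s*")


def has_line_break(value: str) -> bool:
    """Return True if the cell text contains any line break."""
    return LINE_BREAK.search(value) is not None


def remove_line_breaks(value: str) -> str:
    """Replace each line break (and surrounding whitespace) with one space."""
    return LINE_BREAK.sub(" ", value).strip()


def write_clean_csv(path: str, header: list[str], rows: list[list]) -> None:
    """Write a UTF-8 CSV in which no cell contains a line break."""
    # newline="" lets the csv module write its own row terminators.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([remove_line_breaks(str(h)) for h in header])
        for row in rows:
            writer.writerow([remove_line_breaks(str(cell)) for cell in row])
```

Opening the file with `newline=""` follows the recommendation in the Python `csv` module documentation, so row endings are not translated a second time on write.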

Character case

Where appropriate, present text in the format that is easiest to read and interpret.

Title case

  • Address strings

  • Categories, when the source system already presents them in title case or when they can be converted to title case consistently from the source
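Python's built-in `str.title()` is a tempting shortcut, but it capitalizes any letter that follows a non-letter, so `"1400 3rd ave".title()` returns `"1400 3Rd Ave"`. A minimal sketch that instead capitalizes only the first character of each whitespace-separated word (the function name `to_title_case` is hypothetical):

```python
def to_title_case(text: str) -> str:
    """Capitalize the first character of each whitespace-separated word.

    Unlike str.title(), letters that follow digits stay lower case,
    so "3rd" remains "3rd" rather than becoming "3Rd".
    """
    # str.capitalize() upper-cases the first character and lower-cases the rest.
    return " ".join(word.capitalize() for word in text.split())
```

This sketch collapses repeated spaces and lower-cases everything after each word's first letter, so a name such as O'Farrell becomes O'farrell and a state abbreviation such as CA becomes Ca; values like these need separate review.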

Upper case

  • Acronyms, e.g. PSA (Park Service Area)

  • State abbreviations, e.g. CA
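One way to apply the upper-case rules in Python, sketched under two assumptions: state values should be two-letter abbreviations, and you maintain your own list of acronyms. `normalize_state_code`, `uppercase_acronyms`, and the example acronym set `PARK_ACRONYMS` are hypothetical.

```python
import re


def normalize_state_code(value: str) -> str:
    """Return a two-letter state abbreviation in upper case, e.g. ' ca ' -> 'CA'."""
    code = value.strip().upper()
    # Reject anything that is not exactly two letters, such as "Calif."
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"not a two-letter state abbreviation: {value!r}")
    return code


def uppercase_acronyms(text: str, acronyms: set[str]) -> str:
    """Upper-case whole-word matches of any listed acronym, ignoring case."""
    if not acronyms:
        return text
    # \b anchors keep "psa" from matching inside a longer word like "psalm".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in acronyms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: m.group(0).upper(), text)


# Hypothetical acronym list for an example parks dataset.
PARK_ACRONYMS = {"PSA"}
```

Raising an error on unexpected state values, rather than silently upper-casing them, surfaces entries such as "Calif." for manual correction.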

Lower case

  • Categories, when the source system presents them in all caps and there is no reliable way to convert them to title case

  • Research suggests that lowercase text is easier for humans to read than uppercase, and it is just as useful to machines (note the exceptions above)
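A minimal Python sketch of the lower-case rule, assuming category values arrive in all caps and that you supply the acronyms to keep in upper case yourself. `lowercase_category` is a hypothetical helper, not part of any library.

```python
import re


def lowercase_category(value: str, keep_upper: frozenset[str] = frozenset()) -> str:
    """Lower-case an all-caps category while keeping listed acronyms in caps.

    Values that are not entirely upper case are returned unchanged, since
    they may already use deliberate casing from the source system.
    """
    if not value.isupper():
        return value

    def convert(match: re.Match) -> str:
        word = match.group(0)
        return word if word in keep_upper else word.lower()

    # \w+ matches each run of letters/digits; punctuation and spacing are kept.
    return re.sub(r"\w+", convert, value)
```

For example, `lowercase_category("PSA MAINTENANCE - NORTH", frozenset({"PSA"}))` keeps the acronym and lower-cases the remaining words.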