1. Review the Pre-Publishing Checklist
It takes a team to deliver high-quality open data. Three roles help keep publishing moving forward. They are pictured below and described in the checklist that follows. In some cases a single person may fill multiple roles, but you shouldn't be entirely on your own.
The Data Steward is the person most knowledgeable about the data, including its sources, collection methods, and limitations. They prepare data for publishing on the portal, work with Data Custodians on any system access needs, and work with the Data Coordinator on publishing approval.
The Data Coordinator acts as a liaison between internal Information Technology staff, organizational programs and leadership, and portal managers. They are best positioned to convey to the appropriate parties any specific needs of the open data portal and program. They are trusted partners in open data within their organization.
The Data Custodian is the person most knowledgeable about how the data is stored and protected, and has the technical knowledge to query and extract data. They advise and help with data access and navigate technical options for automation.
- Data Steward. Has knowledge about how data is collected, for what purpose, and any limitations of use.
- Data Coordinator. Acts as the liaison among teams for moving open data forward.
- Data Custodian. Understands how the data is stored, secured, and accessed.
Others may be needed as you move through the process
These three roles are core, but you may need to bring in others as you move through publishing and approval. Legal and communications staff, for example, may be needed in your process. You can rely on your data coordinator to advise on who else to involve.
- Send a link to this handbook to the people you've identified. Give them a heads-up that you'll be working on publishing open data; you can use this email template as a starting point.
- Ensure that each reviewer on your dataset has publishing rights on the State open data portal. If you aren't sure whether someone has rights, or you need someone added, email the open data team to verify.
Start identifying the fields you want to publish. This will help you and others clarify what you are planning to publish. Keep your documentation template (linked below) somewhere safe, as you'll return to it as you move toward publishing data.
- Download the data documentation template (Excel file) below. You can use this to begin your documentation, starting with the fields you're publishing.
California Open Data Metadata Template.xlsx
- Start by filling in the field names and labels in the Data Dictionary template tab. You will eventually need to provide definitions as well, but at the beginning you can keep things "good enough" for you and your team to create shared understanding of the data being published.
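If your source data is already available as a CSV extract, a short script can produce the starter rows (field name and a draft label) for you to paste into the Data Dictionary tab. The following is a minimal sketch, not part of the official template: the column headings `field_name`, `label`, and `definition`, and the sample fields, are assumptions made for illustration and may not match the template's actual layout.

```python
import csv
import io


def starter_data_dictionary(csv_text):
    """Build one starter row per column found in the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row holds the field names
    rows = []
    for name in header:
        name = name.strip()
        # Draft label: underscores become spaces, words are capitalized.
        label = name.replace("_", " ").title()
        # Definitions are left blank on purpose; they are written later.
        rows.append({"field_name": name, "label": label, "definition": ""})
    return rows


def to_csv(rows):
    """Serialize the starter rows so they can be pasted into a spreadsheet."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["field_name", "label", "definition"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample extract; replace with your own data.
sample = "facility_id,county_name,report_date\n1001,Alameda,2024-01-31\n"
rows = starter_data_dictionary(sample)
print(to_csv(rows))
```

The generated labels are only a first draft. Review them with your data steward before treating them as final.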
Hold on to your data documentation template; you'll need it later
By the time you publish, you should have meaningful definitions in place. Writing them is covered in step 3, Create Metadata and Data Dictionary, where you'll use this same template. You will eventually copy certain elements over when you upload your dataset on the portal in step 4, Upload the Dataset.
If your dataset needs to be published on a regular schedule, start planning for that now. Even if you don't need automation, thinking early about how others will publish the data will save you headaches later.
When data is published more frequently than quarterly, we highly recommend automation. Each organization will vary in its approach, but you can work with your IT department to investigate whether automation is possible.
If you're updating your dataset quarterly or more frequently:
- Discuss with your data custodian and other IT contacts what's possible. See this template email for a starting point.
- Work with them on an implementation plan. Remember, it'll take some time to prepare your data anyway, so starting this conversation early will help everyone be prepared.
- As you prepare your data, document your steps for producing the dataset. Working through the preparation process can help clarify the business rules and logic that your team will need to automate. Step 2. Prepare Data for Publishing has more on this.
Consider alternate publishing strategies like initially publishing manually and then following up with automation when resources are ready.
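One way to document your preparation steps so they can be handed to IT later is to write them as a small, repeatable script. The sketch below assumes a CSV extract and a hypothetical list of fields approved for publishing. It keeps only those fields, trims stray whitespace, and fails loudly if the source layout changes. It does not upload anything to the portal; how publishing is automated will depend on your organization's tools.

```python
import csv
import io

# Hypothetical list of fields approved for publishing.
PUBLISHED_FIELDS = ["facility_id", "county_name", "report_date"]


def prepare_for_publishing(raw_csv_text, published_fields):
    """Return CSV text containing only the fields approved for publishing."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    source_fields = reader.fieldnames or []
    # Stop early if the source extract no longer has the expected fields.
    missing = [f for f in published_fields if f not in source_fields]
    if missing:
        raise ValueError(f"Source is missing expected fields: {missing}")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=published_fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep approved fields only and trim surrounding whitespace.
        writer.writerow({f: (row[f] or "").strip() for f in published_fields})
    return out.getvalue()


# Hypothetical raw extract with an internal-only column.
raw = (
    "facility_id,county_name,internal_notes,report_date\n"
    "1001, Alameda ,do not publish,2024-01-31\n"
)
prepared = prepare_for_publishing(raw, PUBLISHED_FIELDS)
print(prepared)
```

Even if you start by publishing manually, running the same script each time keeps the business rules in one place and makes the later move to automation easier to scope.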
If automation is not possible, or this is a dataset that gets updated infrequently (like once a year), you may update data manually. You'll still want to make sure the approach to publishing is well documented so you can easily cross-train others on updates. Start with the following:
- As you prepare your data, document your steps for producing the dataset. Create a data update procedure document that is easy for you and others to understand. Step 2. Prepare Data for Publishing has more on this.