I recently attended a workshop from the Data Curation Network, a collaboration of institutions that have developed specific guidelines to help their researchers share research data. Though the workshop was aimed at librarians, the DCN’s process is useful to any researcher preparing data for sharing in a repository. If you are interested in making your research more reproducible, I encourage you to consider these simple steps.
Imagine that you have a dataset—a package of data files, documentation such as codebooks or READMEs, and perhaps analysis code—that you wish to (or are required to) deposit in a repository such as Figshare or OpenNeuro. The files you have probably require some cleanup before you share them with the world, but there may be other actions you can take that would have a big usability payoff for minimal investment. The steps below form the Data Curation Network’s “CURATE” model, paraphrased here but available in full online: Data Curation Network: A Cross-Institutional Staffing Model for Curating Research Data.
C for CHECK: Take a moment to look over the files you have packaged for deposit. Is everything that you meant to include there? Can you open every file, or is something corrupted or missing a dependency?
U for UNDERSTAND: Is there sufficient information for a colleague to use your dataset? Do you have a README file with your contact information tied to a persistent identifier like an ORCID iD?
R for REQUEST: Do you need to request missing information from anyone else who worked on the project?
A for AUGMENT: Could your code’s comments be clearer? Your filenames easier to understand? What about the metadata that accompanies your submission?
T for TRANSFORM: Many of the software tools and file formats we use aren’t universally accessible. When possible, consider saving files in an open format that opens without specialized software, such as a plain .txt file rather than a Microsoft Word .docx.
E for EVALUATE FOR FAIR-NESS: The FAIR guidelines aim to make data findable, accessible, interoperable, and reusable. How well does your dataset now measure up?
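If you are comfortable with a little scripting, part of the CHECK step can be automated. The Python sketch below is my own illustration, not a DCN tool: it lists every file in a folder, flags empty files, records a SHA-256 checksum for each file so you can later confirm nothing changed in transit, and notes whether a top-level README is present. The function name `check_package` and the rule that a README sits at the top level are assumptions made for this example. A script like this cannot tell you whether a file actually opens in its intended software, so it supplements a manual look rather than replacing it.

```python
import hashlib
import sys
from pathlib import Path


def check_package(folder):
    """Return (inventory, problems) for every file under `folder`.

    `check_package` is a hypothetical helper written for this post.
    """
    root = Path(folder)
    inventory = {}  # relative file path -> SHA-256 checksum
    problems = []   # plain-language notes on things worth a second look

    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if path.stat().st_size == 0:
            problems.append(f"{rel} is empty")
        # Reads the whole file into memory; fine for modest file sizes.
        inventory[rel] = hashlib.sha256(path.read_bytes()).hexdigest()

    # Assumption: the README lives at the top level of the package.
    has_readme = any(
        p.is_file() and p.name.lower().startswith("readme")
        for p in root.iterdir()
    )
    if not has_readme:
        problems.append("no README found at the top level")

    return inventory, problems


if __name__ == "__main__":
    # Pass the dataset folder as an argument, or default to the current folder.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    inventory, problems = check_package(target)
    for name, checksum in inventory.items():
        print(f"{checksum}  {name}")
    for note in problems:
        print(f"CHECK: {note}")
```

Saving the printed checksums in a text file alongside your deposit gives future users a simple way to verify that their downloaded copies match what you shared.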
If you are preparing data for deposit and would like to go through the steps above with a data librarian, the Data Services Team at the Health Sciences Library System is happy to help. Please contact us at hslsdata@pitt.edu with a description of your needs and we will set up a consultation at your convenience.
~Helenmary Sheridan