The University of Pittsburgh has licensed LabArchives, a cloud-based electronic research notebook, since 2016. LabArchives notebooks help researchers organize and manage laboratory data safely and conveniently across multiple platforms and devices. Whether you are managing a research lab as a principal investigator or reviewing students’ lab work as an instructor, LabArchives supports effective research data management plans and helps improve student learning. Pitt researchers seeking to make the transition from paper-based to electronic lab notebooks can watch YouTube videos, read our guide, or attend one of our training sessions.
LabArchives has expanded beyond its electronic research notebooks for Research and Education, and we are excited to announce that two new products are now available to researchers with a Pitt email address: Inventory and Scheduler.
LabArchives Inventory streamlines the organization, tracking, and ordering of lab inventory. Whether you need to order inventory from a vendor or manage materials created in your lab, LabArchives Inventory provides a simple and customizable solution for your physical inventory management needs. Use Inventory to customize your inventory types and storage locations, add and manage lab inventory items, and then use the ordering options to request and receive materials. Continue reading
The HSLS Data Services team is thrilled that Pitt has declared 2021-22 to be the Year of Data and Society, because for us, every day is a day for data. Whether you are embarking on your first research project or have dozens of completed studies under your belt, we are here to help you improve the efficiency and reliability of your data-handling workflows at every step in the research process. We offer consultations, classes, and customized trainings in the following areas:
Research data management
Organizing files, writing documentation, and safely storing datasets are key practices for working with data effectively. They are also required discussion items for data management plans, which will be mandated in all NIH grant applications after January 2023. (Read the official NIH notice.) We especially recommend our Introduction to Research Data Management workshops for new graduate students, to help them build good habits from the start, but in-depth consultations are available for any lab, research group, or individual. Continue reading
Do you work with human genetic variants? Have you sought out relevant publications, clinically significant evidence, and/or publicly available data? Are you ready to contribute to the scientific and patient-care community by sharing your own research output?
You likely already know about and use ClinVar, the go-to resource for the clinical genetics community that aggregates information about genomic variation and its relationship to human health. ClinVar recently reached the significant milestone of including 1 million unique variants in its database. Over 1,800 organizations from 82 countries have submitted almost 1.5 million records in ClinVar, including more than 11,000 curated variants from 14 expert panels.
Now it is easier than ever to reciprocate and be a supportive community member by submitting your human genetic variant data using the new ClinVar Submission API. The submission workflow is fast and automated, thanks to a RESTful API: an application programming interface (API) built in the REST architectural style, which allows two software programs to communicate with each other to access and use data. Continue reading
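As a sketch of what such an API call looks like, the snippet below prepares (but does not send) a JSON submission request in Python. The endpoint URL, the SP-API-KEY header, and the payload fields are illustrative assumptions modeled on NCBI's published documentation; consult the current ClinVar Submission API documentation for the authoritative schema.

```python
import json
import urllib.request

# Endpoint and header name below are assumptions based on NCBI's published
# submission API documentation; verify against the current ClinVar API docs.
API_URL = "https://submit.ncbi.nlm.nih.gov/api/v1/submissions/"
API_KEY = "YOUR-NCBI-API-KEY"  # issued per submitting organization

# A skeletal submission payload; a real record requires many more fields
# (assertion criteria, condition, clinical significance, and so on).
payload = {
    "actions": [
        {
            "type": "AddData",
            "targetDb": "clinvar",
            "data": {"content": {"submissionName": "example-batch-001"}},
        }
    ]
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "SP-API-KEY": API_KEY},
    method="POST",
)

# urllib.request.urlopen(request) would send the submission; it is not
# executed here because a valid API key and a complete record are required.
print(request.get_method(), request.full_url)
```

Because the request is built with the standard library only, the same pattern can be dropped into any lab's existing Python tooling.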
Typically, if a researcher is asked what they think of when they hear the word “publication,” a “traditional” research journal article likely comes to mind. However, if the entire research workflow is considered, there are many research outputs that could be published, including articles, preprints, protocols, datasets, and software. (We define “published” simply as “disseminated,” although terms such as “shared” or “posted” may be more appropriate depending on the output.)
The number of venues for publishing these outputs is growing and includes data repositories and preprint servers like DRYAD and medRxiv. New journals such as the Journal of Open Source Software (JOSS) and Scientific Data have been founded specifically to allow these research outputs to be recognized within the scholarly system. In addition, expanded publication types are now offered by established journals like PLOS ONE, which introduced Lab and Study Protocol types in early 2021.
This article provides options for publishing research protocols; however, the Where Should I Publish? Guide, linked on the left of the Scholarly Communication Guide, also compares options for other research outputs. Continue reading
Works in progress can become unruly. As a piece of research code grows, it often spawns new files that iterate on the original: this version fixes one bug but introduces another, or that version swaps two similar functions. The same is true for manuscript drafts, which pass among co-authors, accumulating new text (and usually new filenames) as they travel. It can be difficult to tell these versions apart, or to trace how one version evolved from another. Version control systems make this work easier.
Version control is defined as “a system that records changes to a file or set of files over time so that you can recall specific versions later” in Pro Git (second edition, 2014), an excellent open textbook by Scott Chacon and Ben Straub. Version control allows a user to see all changes made to a file, who made them, and when. It can let an author approve or reject edits made to a manuscript, or quickly determine which set of figures is the right one to submit to a journal. Continue reading
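The definition above can be made concrete with a toy sketch. A real system such as Git stores history far more efficiently, but the essential bookkeeping, recording who changed a file, when, and what it contained, so that any version can be recalled later, looks like this:

```python
import datetime

# A toy model of the core idea of version control; names and structure here
# are purely illustrative, not how Git actually stores data.
class TinyVersionControl:
    def __init__(self):
        # Each entry records (version number, author, timestamp, content).
        self.history = []

    def commit(self, author, content):
        """Record a new version of the file, noting who made it and when."""
        version = len(self.history) + 1
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        self.history.append((version, author, stamp, content))
        return version

    def checkout(self, version):
        """Recall a specific earlier version of the file."""
        return self.history[version - 1][3]

    def log(self):
        """Show all versions: who made each change and when."""
        return [(v, author, stamp) for v, author, stamp, _ in self.history]

vc = TinyVersionControl()
vc.commit("alice", "Draft 1: introduction only.")
vc.commit("bob", "Draft 2: introduction plus methods.")
print(vc.checkout(1))  # recalls the first draft exactly as committed
```

Every capability named in the definition maps to a method: `commit` records changes, `log` shows who made them and when, and `checkout` recalls a specific version.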
HSLS offers classes in a wide array of subjects—molecular biology, database searching, bibliographic management, and more! You can quickly view all Upcoming Classes and Events or sign up to receive the weekly Upcoming HSLS Classes and Workshops email.
This month’s featured workshop is Exploring and Cleaning Data with OpenRefine. The workshop will take place on Friday, June 11, 2021, from 10-11:30 a.m.
Register for this virtual workshop*
Exploring and Cleaning Data with OpenRefine is a workshop that introduces participants to the basics of working with OpenRefine to clean, organize, and transform messy datasets.
OpenRefine (formerly Google Refine) is a powerful, free, open-source tool for working with unorganized tabular data. Since OpenRefine works offline in a web browser, your private data is not uploaded to the cloud and stays on your local computer. Note that you always work on a copy of your data; your raw data files are kept in their original form. Another benefit of OpenRefine is that, while the program has a graphical interface, it documents each completed step, allowing for reproducible data cleaning. These steps can be saved as JSON scripts and reused to automate the cleaning of other, similar files. Continue reading
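As an illustration of those saved steps, an exported operation history is simply a JSON list of operations. The fragment below is a hypothetical single-step example (renaming a column); the column names are invented, and the exact fields for any given operation are defined by OpenRefine's own export.

```json
[
  {
    "op": "core/column-rename",
    "oldColumnName": "dob",
    "newColumnName": "date_of_birth",
    "description": "Rename column dob to date_of_birth"
  }
]
```

Pasting such a script into another project's Undo/Redo tab replays the same cleaning steps on a new file.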
Common Data Elements (CDEs) are definitions that allow data to be consistently captured and recorded across studies. Simply put, they allow researchers to ask the same questions in the same way across studies and receive standardized responses. For example, consider the following two questions about adolescent exercise, used on two different surveys.
Survey 1 Question:
In the past 7 days, how many days did your child exercise so much that he/she breathed hard? (Choose one)
- No days
- 1 day
- 2-3 days
- 4-5 days
- 6-7 days
Survey 2 Question:
In the past 7 days, how often did your child exercise or participate in sports activities that made them breathe hard for at least 20 minutes? (Fill in the blank)
The results from these two questions could not be combined: one provides fixed response options while the other allows write-in responses, and their definitions of exercise differ (only one specifies a 20-minute minimum). Continue reading
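The mismatch can be made concrete with a small sketch: a numeric fill-in-the-blank answer (as in Survey 2) can be binned into Survey 1's categories, but the reverse mapping is ambiguous, and even the forward mapping glosses over the surveys' differing definitions of exercise. The category labels below mirror Survey 1's options; the function itself is purely illustrative.

```python
# Bin a numeric days-per-week response (Survey 2 style) into Survey 1's
# fixed categories. The reverse is impossible: "4-5 days" could mean 4 or 5,
# so category data cannot be recovered as exact counts.
def to_survey1_category(days: int) -> str:
    if days == 0:
        return "No days"
    if days == 1:
        return "1 day"
    if days <= 3:
        return "2-3 days"
    if days <= 5:
        return "4-5 days"
    return "6-7 days"

print(to_survey1_category(4))  # -> "4-5 days"
```

A CDE avoids the problem entirely by having both studies ask the question the same way in the first place.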
Take a moment and consider your name. Do you have a name so unique that you are the only person with that name publishing in your field? I do—there are very few Helenmarys in the world to begin with. But if you cut my first name down to “Helen,” suddenly I could be one of a dozen authors working in my area. Reduce it further to “H,” and I’ve vanished among the crowd. Uniqueness is no match for the sheer ubiquity of names in the online scholarly publishing record.
What I need is a PID: a persistent identifier that refers to me and only me, and would still refer to me if I changed my name. For names, that’s easy: I have an ORCID iD, a sixteen-digit alphanumeric string that I can connect to my research output and take with me wherever I go. But what if I were not a person but a dataset, an article, or a piece of software? All of those can get PIDs too, as can far stranger objects, the breadth of which was the focus of January’s all-online, still-available, free PIDapalooza festival. Continue reading
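Those sixteen characters are not arbitrary: the final one is a check digit computed from the other fifteen with the ISO/IEC 7064 MOD 11-2 algorithm, which ORCID documents publicly, so a mistyped iD can be caught immediately. A short Python sketch:

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the final character of an ORCID iD from its first 15 digits,
    using the ISO/IEC 7064 MOD 11-2 algorithm ORCID documents publicly."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)

# The well-known example iD 0000-0002-1825-0097 ends in 7:
print(orcid_check_digit("000000021825009"))  # -> 7
```

This is why some ORCID iDs end in “X”: it stands in for a check value of ten.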
The week of February 8-12, 2021, is Love Data Week, an international event designed to raise awareness about research data management, sharing, preservation, and—most importantly—how we can help you. To celebrate, HSLS Data Services will be hosting a variety of workshops and giveaways, and engaging with the community via social media.
The HSLS classes offered during Love Data Week (online synchronous via Zoom) are listed below. Every class attendee will be entered into a raffle for a chance to win a gift card (mailed to winner). The more classes you attend, the more chances you have.
- Introduction to Research Data Management, February 8, 2–3 p.m.
- Data Management in R, February 9, 11 a.m.–12:30 p.m.
- Social Justice and Publicly Available Data, February 10, 10–11 a.m.
- Increase Your Data’s Discoverability with the Pitt Data Catalog, February 10, 1:30–2:30 p.m.
- Command Line Basics: Questions Hour, February 11, noon–1 p.m.
- Mapping Geographic Data with Tableau, February 12, 10–11 a.m.
Note: Zoom links will be sent upon registration (also available at the above class links). Continue reading
In October 2020, the NIH released its Final Policy for Data Management and Sharing, which requires NIH-funded researchers to plan proactively for how scientific data will be preserved and shared, through submission of a Data Management and Sharing Plan.
Additional supplementary information released in concert with the policy addresses:
The HSLS Update has published numerous articles about preprints over the years. Here we introduce another iteration of the preprint movement—Research Square, a multidisciplinary platform that helps researchers share their work early, gather feedback, and improve their manuscripts prior to (or in parallel with) journal submission.
So what differentiates Research Square from other preprint servers? The focus is on “added value” features such as:
A data management plan is a formal document outlining how you will handle your data both during your research and after the project is completed. While writing this plan, and especially while preparing your grant application, it is important to think through the long-term costs of managing and preserving data throughout its life-cycle, and the resources, both physical and personnel, needed to do so.
A new consensus study report from the National Academies of Sciences, Engineering, and Medicine titled “Life-Cycle Decisions for Biomedical Data: The Challenge of Forecasting Costs” may be useful to researchers trying to accomplish this task. The report provides a framework to “help researchers identify and think through the major decisions in forecasting life-cycle costs for preserving, archiving, and promoting access to biomedical data.”
In addition to the report there are many other valuable tools/guides linked under the “resources” tab on the National Academies Press page. Of particular interest are:
There are challenges with downloading genomic data. File sizes are large, and it can be time-consuming to retrieve multiple files. Sometimes downloads fail. A custom script may be required. Fortunately, a solution to all of these frustrations is now available: NCBI Datasets.
This experimental resource allows users to easily download eukaryotic genome sequence and annotation data by assembly accession, taxonomic name (scientific and common), or taxonomy ID. The web interface allows for browsing by organism, with the most common experimental species conveniently available from the main page. For example, try selecting the house mouse (Mus musculus), then select all 22 associated assemblies. Options for the type of data for the download include genomic, transcript, and protein sequences as well as annotation features. Continue reading
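For scripted retrieval, NCBI also exposes Datasets through a REST API. The sketch below only constructs a download URL for a single assembly accession (GCF_000001635.27, a Mus musculus reference assembly); the path and parameter names are assumptions modeled on the API's documentation at the time of writing, so check the current NCBI Datasets docs before relying on them.

```python
import urllib.parse

# Base URL and endpoint layout below are assumptions based on NCBI's
# Datasets REST API documentation; verify the current version prefix.
BASE = "https://api.ncbi.nlm.nih.gov/datasets/v1"

def genome_download_url(accession, annotation_types=("GENOME_FASTA",)):
    """Build a download URL for one assembly accession, requesting the
    given annotation types (genomic FASTA by default)."""
    query = urllib.parse.urlencode(
        [("include_annotation_type", t) for t in annotation_types]
    )
    return f"{BASE}/genome/accession/{accession}/download?{query}"

url = genome_download_url("GCF_000001635.27")
print(url)
```

Fetching that URL with any HTTP client would stream a zip archive of the requested data, which sidesteps the failed-download and custom-script frustrations described above.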
Across the diverse fields served by the Health Sciences Library System, one thing is universal: good science depends on good data. Whether you are embarking on your first research project or have dozens of completed studies under your belt, the HSLS Data Services team is here to help you improve the efficiency and reliability of your data-handling workflows at every step in the research process. We offer consultations, classes, and customized trainings on data topics including:
- Organizing and describing files and data—always an important practice, but especially critical at a time when many researchers are working in multiple locations, on distributed teams, or on multiple computers and file servers. These workshops are also recommended for new graduate students to set themselves up with good habits from the beginning.
- Writing a data management plan for funders and publishers, including pre- and post-submission review using DMPTool.