New Software and 3D Model Records Now in the Pitt Data Catalog

When HSLS launched the Pitt Data Catalog last spring, we wanted to provide researchers with flexible options for advertising and sharing their data. Now that the catalog has grown to describe more than 20 Pitt-created datasets, that flexibility has led our collection development in surprising and exciting directions. We have recently added our first records describing software code and 3D models, all created by Charles C. Horn, PhD.

Dr. Horn is an associate professor of medicine who studies gut-brain communication, particularly via the vagus nerve. His research makes use of several open-source software packages, which he demonstrates in his paper (with David M. Rosenberg), “Neurophysiological Analytics for All! Free Open-Source Software Tools for Documenting, Analyzing, Visualizing, and Sharing Using Electronic Notebooks.” Electrophysiological data used to demonstrate the software tools are available in the publication’s data supplements and on GitHub, where Dr. Horn has also uploaded scripts and a Docker image containing tools to make neurophysiological data analysis easier. Pitt Data Catalog records linking to those software/data packages include:

Dr. Horn has also designed several printable 3D models for experimental apparatuses in electrophysiology. The files shared through the NIH 3D Print Exchange include printable files in a variety of formats, photos, and assembly instructions. The 3D model records in the Pitt Data Catalog are:

We are pleased to host records describing these software packages and models, which are the first of their kind in the wider Data Catalog Collaboration Project.

If you have data, code, or models (printable or otherwise) that you would like to include in the Pitt Data Catalog, please contact us at HSLSDATA@pitt.edu or through the “Include your Dataset” button on the Pitt Data Catalog homepage. We are available to talk with you about publicizing your research products through the catalog. The process is quick, free, and tailored to your needs, especially regarding confidentiality and controlling access to your data.

~Helenmary Sheridan

Data Catalog Collaboration Project Wins a Distinguished Award

HSLS, along with academic health sciences libraries at NYU Langone Health, Duke University, University of North Carolina at Chapel Hill, Hofstra University, University of Maryland at Baltimore, University of Virginia, and Wayne State University, participates in the Data Catalog Collaboration Project (DCCP). The DCCP recently received an award from the Clinical and Translational Science Awards (CTSA) Great Team Science Contest. “One of the goals of the CTSAs is to promote team science through establishing mechanisms by which biomedical researchers can collaborate, be trained in why team science is important, and develop evaluation measures to assess teamwork in biomedical research contexts.” “One hundred seventy applications were submitted, and the DCCP received the highest score for the Top Importance category.”

As a participant in the DCCP, HSLS developed the Pitt Data Catalog, a tool that provides Pitt researchers with an easy way to make their datasets discoverable as well as to identify other usable data.

Congratulations to the DCCP!

Inclusion of Pitt Data Catalog Datasets in Google’s New Dataset Search

Researchers across disciplines are sharing their data more and more, whether because of journal or funder mandates, or simply because they prefer openness as a way to increase the discoverability and reuse of their data. This sharing has resulted in millions of datasets described or deposited in various locations across the web, including general or discipline-specific data repositories, publisher sites, data journals, authors’ home pages, and institutional data catalogs such as the Pitt Data Catalog (for more information see the catalog’s about page).

In early September 2018, Google launched a beta dataset search to enable users to find datasets, no matter their location, through a familiar interface and simple keyword search.

Because the Pitt Data Catalog uses structured data to describe the data included, records from the catalog are retrieved in Google’s search (as shown below), increasing the visibility of research and potentially the number of views and citations of associated publications.

[Figures: Pitt Data Catalog record for “Eye Movements”; Google Dataset Search results showing “Eye Movements”]
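For readers curious about what "structured data" can look like in practice, here is a minimal sketch. It assumes the schema.org Dataset vocabulary expressed as JSON-LD, which is the markup Google's dataset search is designed to read. The catalog's actual markup is not shown in this article, so the function names, field choices, and example values below are hypothetical illustrations only.

```python
import json


def dataset_jsonld(name, description, url, creators, keywords=()):
    """Build a schema.org Dataset description as a plain dictionary.

    All values passed in are illustrative; a real record would use the
    dataset's own title, summary, landing page, and authors.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "creator": [{"@type": "Person", "name": c} for c in creators],
    }
    if keywords:
        record["keywords"] = list(keywords)
    return record


def as_script_tag(record):
    """Wrap the record in the script tag used to embed JSON-LD in a web page."""
    body = json.dumps(record, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical example record (not an actual catalog entry).
example = dataset_jsonld(
    name="Example eye movement recordings",
    description="Placeholder description of an example dataset.",
    url="https://example.org/dataset/1",
    creators=["Example Researcher"],
    keywords=["eye movements"],
)
print(as_script_tag(example))
```

Embedding a block like this in a record's web page is what lets a crawler recognize the page as describing a dataset rather than an ordinary article.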

If you have datasets you would like to have described in the Pitt Data Catalog, please contact the HSLS Data Services team at HSLSDATA@pitt.edu or through our dataset inclusion form.

~Melissa Ratajeski

Your Input Needed: Proposed Provisions for a Future Draft NIH Data Management and Sharing Policy

The National Institutes of Health (NIH) is implementing measures to update its 2003 Data Sharing Policy, issuing a Request for Information (RFI) to solicit public input on proposed key provisions that could serve as the foundation for a future NIH policy for data management and sharing.

These provisions include:

  • Definitions related to data management and sharing;
  • A stated purpose to manage, preserve, and make scientific data accessible in a timely manner for appropriate use by the research community and the broader public;
  • The scope and requirements for all intramural and extramural research, funded or supported in whole or in part by NIH, that results in scientific data, regardless of NIH funding level or mechanism;
  • Proposed elements to be addressed in a data management and sharing plan: data types, related tools, software and/or code, data standards, data preservation and access (including timelines), terms for re-use and redistribution, limitations on access, and responsible personnel for data management oversight; and
  • An NIH compliance and enforcement plan that would include review at least annually, with non-compliance taken into account in future funding or support decisions.

Comments on the proposed key provisions will be accepted electronically through December 10, 2018.

New HSLS Program—Spotlight Series: Software Developed @ Pitt

The HSLS MolBio Information Service and Data Services have collaborated in the creation of a new HSLS program—Spotlight Series: Software Developed @ Pitt—that focuses on software developed by Pitt health sciences researchers.  Sessions will begin with a 30-minute presentation of tool development and use cases, followed by instruction on software access/installation, discussion of parameters, and hands-on practice.

The first session in this series will be:

FRED: A Versatile Framework for Modeling Infectious Diseases and Other Health Conditions

Thursday, September 20, 2018, 2:00 p.m. to 4:00 p.m.

Instructor: David Sinclair, PhD, Postdoctoral Researcher, Public Health Dynamics Lab

Location: Scaife Hall, Falk Library, Upper Floor Study Area

Please register and bring your own laptop.

If you would like to present your software or suggest software that we should spotlight, please contact HSLSDATA@pitt.edu.

~Melissa Ratajeski

Data Sharing Statement Policy for Clinical Trials Enacted July 2018

The International Committee of Medical Journal Editors (ICMJE) is a working group of medical journal editors that makes recommendations for the conduct, reporting, editing, and publication of scholarly work in medical journals. Journals that state they follow the ICMJE recommendations include: Academic Medicine, American Journal of Epidemiology, Cancer Nursing, Chest, Circulation, Immunology & Cell Biology, Journal of Dental Hygiene, and Radiology. 

These recommendations cover a range of topics, including:

  • defining the roles of authors
  • conflicts of interest
  • corrections, retractions, republications and version control, and copyrights
  • advertising
  • clinical trials

As of July 1, 2018, manuscripts submitted to ICMJE journals reporting on the results of clinical trials must include a data sharing statement. Data sharing statements must indicate the following:

  • if individual de-identified participant data will be shared;
  • details of the data that will be shared (inclusion of data dictionaries, study protocol, statistical analysis plan, etc.);
  • when the data will become available and for how long; and
  • by what access criteria data will be shared (including with whom, for what types of analyses, and by what mechanism).

Examples of such statements are available on the ICMJE Website. As noted by Pitt’s Research Conduct and Compliance Office:

“If you have provided information in the Individual Participant Data (IPD) Sharing Statement module of your ClinicalTrials.gov study record, you should ensure that this information matches the data sharing statement submitted with the manuscript. Questions should be directed to the journal to which you are submitting.”

Clinical trials that begin enrolling participants on or after January 1, 2019, must include a data sharing plan in the trial’s registration.

Members of HSLS Data Services are available for consultation when you are writing your data sharing statement.

~Melissa Ratajeski

Tracking down Datasets Using PubMed and PMC

PubMed and PubMed Central (PMC) now offer filters to limit a search to only those articles or citations that include related data links, supplemental material, data citations, or a data availability or data accessibility statement.

The filters, detailed below, can be combined with any search by simply adding the Boolean operator “AND” and the specific filter into the search box (see the screenshots below for example syntax; the filters are highlighted in yellow).

PubMed

[Screenshot: data[filter] in the PubMed search box]

Use data[filter] to find citations with related data links in either the Secondary Source ID field or the LinkOut – more resources field (both located below the abstract).
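The same search can also be assembled in a script. The sketch below only builds the query string and a request URL; it does not contact any server. It assumes NCBI's E-utilities ESearch endpoint for running PubMed searches programmatically, and the helper names and the example topic are hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def with_data_filter(query, data_filter="data[filter]"):
    """Combine a search with a filter using the Boolean operator AND."""
    return f"({query}) AND {data_filter}"


def esearch_url(term, db="pubmed", retmax=20):
    """Build (but do not send) an ESearch request URL for the given term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


# Hypothetical example topic.
term = with_data_filter("vagus nerve")
print(term)
print(esearch_url(term))
```

Wrapping the original search in parentheses keeps the filter applied to the whole search, even when the search itself contains OR.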

Introducing the Pitt Data Catalog for Dataset Sharing and Discovery

Sharing research data can bring many benefits, including greater visibility for data creators, a more transparent research process, and opportunities to identify potential collaborators. But what about datasets that are stored on a lab server instead of in a data repository, or that should only be shared with vetted researchers? The Pitt Data Catalog is a new platform at HSLS designed to help Pitt health sciences researchers share and discover their otherwise hard-to-find datasets, while keeping ultimate control over the data in researchers’ hands.

“The Pitt data catalog has the potential to improve research collaborations and accelerate the impact of research being conducted in the schools of the health sciences. I strongly encourage each researcher to work with HSLS to make your datasets discoverable through the catalog in accordance with the FAIR Data Principles: Making Data Findable, Accessible, Interoperable and Reusable.” Dr. Arthur Levine, Senior Vice Chancellor for the Health Sciences

Unlike data repositories such as Dryad or Zenodo, the Pitt Data Catalog does not host any data files. Instead, each dataset included in the catalog is described in a metadata record that includes information about the dataset’s authors, subject domain, and data creation process, as well as instructions for accessing the dataset itself and links to associated publications. Some data catalog entries describe publicly available datasets, so their records link directly to the data in a repository. Other entries that describe privately held datasets may direct a visitor to e-mail the corresponding author, or link to a data-access application form. Each record is created in collaboration with the researcher to ensure accurate and comprehensive information.
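To make the idea of a metadata-only record concrete, here is a rough sketch of how such a record could be modeled. It is not the catalog's actual schema: the class name, field names, and example values are hypothetical, chosen only to mirror the kinds of information described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CatalogRecord:
    """A description of a dataset; the data files themselves live elsewhere."""

    title: str
    authors: List[str]
    subject_domain: str
    creation_process: str
    publications: List[str] = field(default_factory=list)
    # At most one of these access routes is typically needed.
    repository_url: Optional[str] = None        # publicly available data
    application_form_url: Optional[str] = None  # access by application
    contact_email: Optional[str] = None         # access by request

    def access_instructions(self) -> str:
        """Return a short instruction for reaching the dataset itself."""
        if self.repository_url:
            return f"Download from {self.repository_url}"
        if self.application_form_url:
            return f"Apply for access at {self.application_form_url}"
        if self.contact_email:
            return f"E-mail the corresponding author at {self.contact_email}"
        return "No access route recorded"


# Hypothetical privately held dataset.
record = CatalogRecord(
    title="Example lab recordings",
    authors=["Example Researcher"],
    subject_domain="Neurophysiology",
    creation_process="Placeholder description of how the data were collected.",
    contact_email="author@example.org",
)
print(record.access_instructions())
```

Because the record stores only a pointer to the data, the researcher keeps control over who receives the files.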

If you have datasets you would like to have described in the Pitt Data Catalog, please contact the HSLS Data Services team at HSLSDATA@pitt.edu or through our dataset inclusion form. We’ll schedule an in-person or phone consultation to learn more about your datasets and discuss the most appropriate terminology to describe your data. After we create a draft of your dataset’s record, we’ll send it to you for final approval. If you have updates after the record is published, just contact us to make changes; we may also contact you to make sure our information is still current.

HSLS Data Services staff are happy to give demonstrations for individual health sciences researchers, departments, or labs. If you would like to investigate whether the Pitt Data Catalog would be a good match for your datasets, please reach out and we will gladly explore its possibilities with you.

The University of Pittsburgh, Health Sciences Library System, is a member of the Data Catalog Collaboration Project and has customized this data discovery tool in part with Federal funds from the National Library of Medicine, National Institutes of Health, Department of Health and Human Services, under cooperative agreement number UG4LM012342 with the University of Pittsburgh, Health Sciences Library System. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

~Helenmary Sheridan

Expand Your Data Analysis Universe with Galaxy

The life sciences are erupting with data. Thanks to advancements in DNA sequencing technologies and the speed and capacity of computational algorithms, the generation of vast quantities of genomic and proteomic data is now commonplace and expected. However, analysis of this data is not keeping pace with its acquisition (storage space is yet another issue…). One limiting factor is that many biomedical scientists do not yet know how to access, much less use, the available analytical resources. This article describes a platform for multi-omic data analysis that is accessible, reproducible, and transparent, and recommends resources on how to use it.

Galaxy is a community-supported platform that provides access to over 5,500 tools for a multitude of analytical needs, in categories such as variant analysis, imaging, and statistics. Its components include the Galaxy Software Framework and the Public Galaxy Service. The software framework is an open-source, web-based application that functions as an intermediary between researchers without informatics expertise and the computational infrastructure that runs and stores the analyses. The public service includes the main instance, which is an installation of the Galaxy software combined with many tools and data, as well as over 80 public servers. Some of these servers are domain-specific (ImmPort Galaxy, focusing on flow cytometry analysis) or tool-publishing (MBAC Metabiome Server, simplifying the control, usage, access, and analysis of microbiome, metabolome, and immunome data). Local institutional instances are also possible; the University of Pittsburgh has a Galaxy server hosted by the Center for Research Computing.

The scale of Galaxy is initially a bit daunting. Fortunately, there are numerous resources to help researchers navigate the analytical possibilities. Everything to get you started is at galaxyproject.org, including Galaxy 101, dataset collections, interactive tours, and a growing collection of tutorials developed and maintained by the worldwide Galaxy community and Galaxy Training Network.

The HSLS Molecular Biology Information Service can also assist you with using Galaxy for your research. During the spring 2018 semester we are introducing two hands-on workshops that will teach the basics of Galaxy including (1) interface navigation and interaction and (2) how to create, modify, and extract workflows.

To learn more, read the bioRxiv article on “Community-Driven Data Analysis Training for Biology” or contact the HSLS Molecular Biology Information Service.

~Carrie Iwema

NEW Data Class Offerings

In our continuous effort to support your research needs, HSLS is offering four new classes this spring covering: (1) introduction to mapping, (2) Python through Jupyter, (3) beginning command line for bioinformatics, and (4) options for bioinformatics analysis. Class descriptions and registration links are listed below.

(1) Data 101: Introduction to Mapping 

Thursday, February 15, 2018, 11 a.m. – 1 p.m.; Registration required

Mapping is a great way to visualize and analyze information, and to tell stories. In this introductory workshop, you’ll learn the principles of mapmaking, understand how computers are used to plot addresses on a map, conduct basic spatial analysis, and update records in a database based on location. Along with a deeper appreciation for computers, this class will provide you with a solid foundation of mapping concepts and processes, and prepare you to take your first computer-based mapping class. No computers will be used in this class.