The life sciences are erupting with data. Thanks to advancements in DNA sequencing technologies and the speed and capacity of computational algorithms, the generation of vast quantities of genomic and proteomic data is now commonplace and expected. However, analysis of this data is not keeping pace with its acquisition (storage space is yet another issue…). One limiting factor is that many biomedical scientists do not yet know how to access, much less use, the available analytical resources. This article describes a platform for multi-omic data analysis that is accessible, reproducible, and transparent, and recommends resources on how to use it.
Galaxy is a community-supported platform that provides access to over 5,500 tools for a multitude of analytical needs, in categories such as variant analysis, imaging, and statistics. Its components include the Galaxy Software Framework and the Public Galaxy Service. The software framework is an open-source, web-based application that functions as an intermediary between researchers without informatics expertise and the computational infrastructure that runs and stores the analyses. The public service includes the main instance, which is an installation of the Galaxy software combined with many tools and data, as well as over 80 public servers. Some of these servers are domain-specific (e.g., ImmPort Galaxy, which focuses on flow cytometry analysis) or tool-publishing (e.g., MBAC Metabiome Server, which simplifies the control, usage, access, and analysis of microbiome, metabolome, and immunome data). Local institutional instances are also possible; the University of Pittsburgh has a Galaxy server hosted by the Center for Research Computing.
The scale of Galaxy can be a bit daunting at first. Fortunately, there are numerous resources to help researchers navigate the analytical possibilities. Everything you need to get started is at galaxyproject.org, including Galaxy 101, dataset collections, interactive tours, and a growing collection of tutorials developed and maintained by the worldwide Galaxy community and the Galaxy Training Network.
The HSLS Molecular Biology Information Service can also assist you with using Galaxy for your research. During the spring 2018 semester, we are introducing two hands-on workshops that will teach the basics of Galaxy, including (1) interface navigation and interaction and (2) how to create, modify, and extract workflows:
- ChIP-Seq & Galaxy (April 13)
- RNA-Seq & Galaxy (April 27)
To learn more, read the bioRxiv article on “Community-Driven Data Analysis Training for Biology” or contact the HSLS Molecular Biology Information Service.
~Carrie Iwema