This information is over 2 years old. Information was current at time of publication.

Make File Management Simpler with Version Control

Works in progress can become unruly. As a piece of research code grows, it often spawns new files that iterate on the original: one version fixes a bug but introduces another, while the next swaps two similar functions. The same is true of manuscript drafts, which pass among co-authors and accumulate new text (and usually new filenames) as they travel. It can be difficult to tell these versions apart or to trace how one evolved from another. Version control systems make this work easier.

Pro Git (second edition, 2014), an excellent open textbook by Scott Chacon and Ben Straub, defines version control as “a system that records changes to a file or set of files over time so that you can recall specific versions later.” Version control allows a user to see all changes made to a file, who made them, and when they were made. It can let an author approve or reject edits made to a manuscript, or quickly determine which set of figures is the right one to submit to a journal.
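
To make that definition concrete, here is a minimal sketch in Python of a history for a single file. The names VersionedFile, save, recall, and log are hypothetical and chosen only for illustration; real tools such as Git store and organize history quite differently. The sketch only shows the core idea: each save records the content, who saved it, and when, so that any earlier version can be recalled later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Version:
    number: int          # 1 for the first saved version, 2 for the next, ...
    author: str          # who made the change
    timestamp: datetime  # when the change was recorded
    content: str         # full text of the file at that point


@dataclass
class VersionedFile:
    """Hypothetical toy history for one file (illustration only)."""
    versions: list = field(default_factory=list)

    def save(self, content: str, author: str) -> int:
        """Record a new version and return its number."""
        version = Version(
            number=len(self.versions) + 1,
            author=author,
            timestamp=datetime.now(timezone.utc),
            content=content,
        )
        self.versions.append(version)
        return version.number

    def recall(self, number: int) -> str:
        """Return the content exactly as it was saved in the given version."""
        if not 1 <= number <= len(self.versions):
            raise IndexError(f"no version {number}")
        return self.versions[number - 1].content

    def log(self) -> list:
        """List (version number, author) pairs, oldest first."""
        return [(v.number, v.author) for v in self.versions]


# Two co-authors edit the same draft in turn.
draft = VersionedFile()
draft.save("Methods: we sampled 10 sites.", author="alice")
draft.save("Methods: we sampled 12 sites.", author="bob")

print(draft.log())      # who changed the file, in order
print(draft.recall(1))  # the first version is still available
```

Nothing is overwritten in this sketch: the second save adds to the history rather than replacing the first, which is why a single filename is enough.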

Simple forms of version control are built into many common text and spreadsheet editors. Google Drive’s products, including Google Docs, Google Sheets, and Google Slides, record changes to a file as long as those changes are saved to the cloud. (Google’s instructions for accessing the version history of a Google Doc apply to its other programs as well.) Microsoft Word and Excel allow true versioning if files are saved to a OneDrive account, and Word’s Track Changes feature works much like version control within a single document.

Version control is most robust in tools designed for managing code projects with multiple interacting files. Git is a free and open-source version control system that runs on a local filesystem (no internet connection required) and is operated primarily from the command line. It’s quick and lightweight, and although it is optimized for code, it can detect and record changes to any kind of file; its line-by-line comparisons are most informative for plain text. GitHub is a cloud-based service that adds a graphical user interface and project management tools on top of Git. It has become a popular venue to store, share, and collaborate on research code.
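
One reason Git can handle any kind of file is that it identifies a file’s content by a hash of its bytes, so even a one-character edit produces a different identifier. The Python sketch below reproduces that calculation for Git’s default SHA-1 object format (repositories created with the newer SHA-256 format would give different values). The function name git_blob_id is hypothetical; the sketch illustrates the idea and is not a substitute for Git itself.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the ID Git's SHA-1 object format assigns to file content.

    Git hashes a short header ("blob", the size in bytes, and a NUL byte)
    followed by the raw content.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


original = b"figure_width = 10\n"
edited = b"figure_width = 12\n"  # a single character differs

print(git_blob_id(original))
print(git_blob_id(edited))
print(git_blob_id(original) == git_blob_id(edited))  # False: the edit is detected

# Sanity check against a well-known value: the ID of an empty file.
print(git_blob_id(b"") == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")  # True
```

Because the calculation works on raw bytes, the same approach applies to images, spreadsheets, or PDFs as well as to code.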

To get help choosing a version control system or finding examples in your field, contact HSLS Data Services and set up a personal consultation or group training.

~Helenmary Sheridan