Wednesday, August 1 • 1:30pm - 4:30pm
WT11 Structural Biology: A Prototypical Case for Publishing Big Data

Course Chair: Gustavo Durand, Technical Lead/Architect, Dataverse, The Institute for Quantitative Social Science, Harvard University
Instructors: Gustavo Durand, Technical Lead/Architect, Dataverse, The Institute for Quantitative Social Science, Harvard University; Pete Meyers, PhD, Research Computing Specialist, SBGrid Consortium, Harvard Medical School
Course Syllabus
Description: The general public, funding agencies and researchers increasingly recognize the importance of publishing biomedical research data. Publishing data can improve research efficiency and is essential for reproducing and validating scientific results.
However, the characteristics of these datasets can make it difficult to publish them in a form that is easily usable, citable and verifiable. In particular, biomedical datasets can require large amounts of storage, can comprise several hundred to several thousand files, and need to be accessible to automated validation pipelines at multiple storage locations. Because researchers work under time and effort constraints, depositing, publishing and verifying datasets must be as efficient as possible.
This course will explore some existing platforms. These range from domain-specific structural biology repositories, and how they handle data deposition and validation, to general-purpose repositories and why they have proven a poor fit. We will discuss the students' own experiences and examine whether other domains share these problems.
The course will then explore recent enhancements to Dataverse, a framework for sharing and publishing research data, that support these processes. The enhancements include depositing large datasets of tens to hundreds of gigabytes, ensuring their integrity with checksum algorithms, and replicating datasets close to compute resources.
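The checksum-based integrity check mentioned above can be illustrated with a short sketch. This is not Dataverse's implementation. It assumes the dataset's files sit in a local directory and uses SHA-256 from Python's `hashlib`; the algorithm Dataverse actually uses is not stated on this page. The names `build_manifest` and `verify_manifest` are hypothetical. The approach records one checksum per file at deposit time, then recomputes the checksums later, for example after replication to another storage location, and reports any file that is missing or has changed.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks so large files need little memory."""
    h = hashlib.new(algorithm)  # algorithm choice is an assumption, not Dataverse's setting
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path, algorithm: str = "sha256") -> dict[str, str]:
    """Hypothetical helper: map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_checksum(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str], algorithm: str = "sha256") -> list[str]:
    """Hypothetical helper: list files that are missing or whose checksum no longer matches."""
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or file_checksum(p, algorithm) != expected:
            problems.append(rel)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "images").mkdir()
        (root / "images" / "frame_0001.img").write_bytes(b"diffraction data")
        manifest = build_manifest(root)  # recorded at deposit time
        print(verify_manifest(root, manifest))  # []
        (root / "images" / "frame_0001.img").write_bytes(b"corrupted")
        print(verify_manifest(root, manifest))  # ['images/frame_0001.img']
```

Reading each file in fixed-size chunks keeps memory use constant, which matters for datasets of hundreds of gigabytes. A per-file manifest also identifies exactly which of the possibly thousands of files failed verification.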
The instructors will also address how design decisions made during this process could impact use in other domains. Lastly, the class will discuss what big-data repositories of the future should look like.
The course will combine lectures and hands-on work, walking participants through depositing and publishing datasets with this framework. By the end of the course, each participant should clearly understand the benefits and pitfalls of working with big data and have direct experience with Dataverse's approach to addressing these challenges.

Instructors
Gustavo Durand

Dataverse Technical Lead, Harvard University

Pete Meyers, PhD

Research Computing Specialist, Harvard Medical School - SBGrid Consortium

Wednesday August 1, 2018 1:30pm - 4:30pm PDT