Integration of EGA secure data access into Galaxy

Youri Hoogstrate, Chao Zhang, Alexander Senf, J. Bijlard, Saskia Hiltemann, David van Enckevort, Susanna Repo, J. Heringa, Guido Jenster, Remond J.A. Fijneman, Jan-Willem Boiten, Gerrit A. Meijer, Andrew Stubbs, Jordi Rambla, Dylan Spalding, Sanne Abeln

Research output: Contribution to Journal > Article > Academic > peer-review

Abstract

High-throughput molecular profiling techniques routinely generate vast amounts of data for translational medicine studies. Secure, access-controlled systems are needed to manage, store, transfer and distribute these data because of their personally identifiable nature. The European Genome-phenome Archive (EGA) was created to facilitate access to and management of the long-term archival of bio-molecular data. Each data provider is responsible for ensuring that a Data Access Committee is in place to grant access to data stored in the EGA. Moreover, the transfer of data during upload and download is encrypted. ELIXIR, a European research infrastructure for life-science data, initiated a project (2016 Human Data Implementation Study) to understand and document the ELIXIR requirements for secure management of controlled-access data. As part of this project, a full ecosystem was designed to connect archived raw experimental molecular profiling data with interpreted data and the computational workflows, using the CTMM Translational Research IT (CTMM-TraIT) infrastructure (http://www.ctmm-trait.nl) as an example. Here we present the first outcomes of this project: a framework to enable the download of EGA data to a Galaxy server in a secure way. Galaxy provides an intuitive user interface for molecular biologists and bioinformaticians to run and design data analysis workflows. More specifically, we developed a tool, ega_download_streamer, that can download data securely from EGA into a Galaxy server, where the data can subsequently be processed further. This tool will allow a user to run, within the browser, an entire analysis containing sensitive data from EGA, and to make this analysis available to other researchers in a reproducible manner, as shown with a proof-of-concept study. The tool ega_download_streamer is available in the Galaxy Tool Shed: https://toolshed.g2.bx.psu.edu/view/yhoogstrate/ega_download_streamer.
Original language: English
Article number: 2841
Journal: F1000Research
Volume: 5
DOI: https://doi.org/10.12688/f1000research.10221.1
Publication status: Published - 2016

Keywords

  • Bioinformatics
  • Data management
  • EGA
  • Galaxy
  • Translational research
  • Workflows

Cite this

Hoogstrate, Y., Zhang, C., Senf, A., Bijlard, J., Hiltemann, S., van Enckevort, D., ... Abeln, S. (2016). Integration of EGA secure data access into Galaxy. F1000Research, 5, [2841]. https://doi.org/10.12688/f1000research.10221.1
Hoogstrate, Youri ; Zhang, Chao ; Senf, Alexander ; Bijlard, J. ; Hiltemann, Saskia ; van Enckevort, David ; Repo, Susanna ; Heringa, J. ; Jenster, Guido ; Fijneman, Remond J.A. ; Boiten, Jan-Willem ; Meijer, Gerrit A. ; Stubbs, Andrew ; Rambla, Jordi ; Spalding, Dylan ; Abeln, Sanne. / Integration of EGA secure data access into Galaxy. In: F1000Research. 2016 ; Vol. 5.
@article{5f7ff65e785e4181918ed7df158542a1,
title = "Integration of EGA secure data access into Galaxy",
abstract = "High-throughput molecular profiling techniques routinely generate vast amounts of data for translational medicine studies. Secure, access-controlled systems are needed to manage, store, transfer and distribute these data because of their personally identifiable nature. The European Genome-phenome Archive (EGA) was created to facilitate access to and management of the long-term archival of bio-molecular data. Each data provider is responsible for ensuring that a Data Access Committee is in place to grant access to data stored in the EGA. Moreover, the transfer of data during upload and download is encrypted. ELIXIR, a European research infrastructure for life-science data, initiated a project (2016 Human Data Implementation Study) to understand and document the ELIXIR requirements for secure management of controlled-access data. As part of this project, a full ecosystem was designed to connect archived raw experimental molecular profiling data with interpreted data and the computational workflows, using the CTMM Translational Research IT (CTMM-TraIT) infrastructure (http://www.ctmm-trait.nl) as an example. Here we present the first outcomes of this project: a framework to enable the download of EGA data to a Galaxy server in a secure way. Galaxy provides an intuitive user interface for molecular biologists and bioinformaticians to run and design data analysis workflows. More specifically, we developed a tool, ega_download_streamer, that can download data securely from EGA into a Galaxy server, where the data can subsequently be processed further. This tool will allow a user to run, within the browser, an entire analysis containing sensitive data from EGA, and to make this analysis available to other researchers in a reproducible manner, as shown with a proof-of-concept study. The tool ega_download_streamer is available in the Galaxy Tool Shed: https://toolshed.g2.bx.psu.edu/view/yhoogstrate/ega_download_streamer.",
keywords = "Bioinformatics, Data management, EGA, Galaxy, Translational research, Workflows",
author = "Youri Hoogstrate and Chao Zhang and Alexander Senf and J. Bijlard and Saskia Hiltemann and {van Enckevort}, David and Susanna Repo and J. Heringa and Guido Jenster and Fijneman, {Remond J.A.} and Jan-Willem Boiten and Meijer, {Gerrit A.} and Andrew Stubbs and Jordi Rambla and Dylan Spalding and Sanne Abeln",
year = "2016",
doi = "10.12688/f1000research.10221.1",
language = "English",
volume = "5",
journal = "F1000Research",
issn = "2046-1402",
publisher = "F1000 Research Ltd.",

}

Hoogstrate, Y, Zhang, C, Senf, A, Bijlard, J, Hiltemann, S, van Enckevort, D, Repo, S, Heringa, J, Jenster, G, Fijneman, RJA, Boiten, J-W, Meijer, GA, Stubbs, A, Rambla, J, Spalding, D & Abeln, S 2016, 'Integration of EGA secure data access into Galaxy', F1000Research, vol. 5, 2841. https://doi.org/10.12688/f1000research.10221.1

Integration of EGA secure data access into Galaxy. / Hoogstrate, Youri; Zhang, Chao; Senf, Alexander; Bijlard, J.; Hiltemann, Saskia; van Enckevort, David; Repo, Susanna; Heringa, J.; Jenster, Guido; Fijneman, Remond J.A.; Boiten, Jan-Willem; Meijer, Gerrit A.; Stubbs, Andrew; Rambla, Jordi; Spalding, Dylan; Abeln, Sanne.

In: F1000Research, Vol. 5, 2841, 2016.

TY - JOUR

T1 - Integration of EGA secure data access into Galaxy

AU - Hoogstrate, Youri

AU - Zhang, Chao

AU - Senf, Alexander

AU - Bijlard, J.

AU - Hiltemann, Saskia

AU - van Enckevort, David

AU - Repo, Susanna

AU - Heringa, J.

AU - Jenster, Guido

AU - Fijneman, Remond J.A.

AU - Boiten, Jan-Willem

AU - Meijer, Gerrit A.

AU - Stubbs, Andrew

AU - Rambla, Jordi

AU - Spalding, Dylan

AU - Abeln, Sanne

PY - 2016

Y1 - 2016

AB - High-throughput molecular profiling techniques routinely generate vast amounts of data for translational medicine studies. Secure, access-controlled systems are needed to manage, store, transfer and distribute these data because of their personally identifiable nature. The European Genome-phenome Archive (EGA) was created to facilitate access to and management of the long-term archival of bio-molecular data. Each data provider is responsible for ensuring that a Data Access Committee is in place to grant access to data stored in the EGA. Moreover, the transfer of data during upload and download is encrypted. ELIXIR, a European research infrastructure for life-science data, initiated a project (2016 Human Data Implementation Study) to understand and document the ELIXIR requirements for secure management of controlled-access data. As part of this project, a full ecosystem was designed to connect archived raw experimental molecular profiling data with interpreted data and the computational workflows, using the CTMM Translational Research IT (CTMM-TraIT) infrastructure (http://www.ctmm-trait.nl) as an example. Here we present the first outcomes of this project: a framework to enable the download of EGA data to a Galaxy server in a secure way. Galaxy provides an intuitive user interface for molecular biologists and bioinformaticians to run and design data analysis workflows. More specifically, we developed a tool, ega_download_streamer, that can download data securely from EGA into a Galaxy server, where the data can subsequently be processed further. This tool will allow a user to run, within the browser, an entire analysis containing sensitive data from EGA, and to make this analysis available to other researchers in a reproducible manner, as shown with a proof-of-concept study. The tool ega_download_streamer is available in the Galaxy Tool Shed: https://toolshed.g2.bx.psu.edu/view/yhoogstrate/ega_download_streamer.

KW - Bioinformatics

KW - Data management

KW - EGA

KW - Galaxy

KW - Translational research

KW - Workflows

UR - http://www.scopus.com/inward/record.url?scp=85013820978&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85013820978&partnerID=8YFLogxK

U2 - 10.12688/f1000research.10221.1

DO - 10.12688/f1000research.10221.1

M3 - Article

VL - 5

JO - F1000Research

JF - F1000Research

SN - 2046-1402

M1 - 2841

ER -

Hoogstrate Y, Zhang C, Senf A, Bijlard J, Hiltemann S, van Enckevort D et al. Integration of EGA secure data access into Galaxy. F1000Research. 2016;5:2841. https://doi.org/10.12688/f1000research.10221.1