We are seeking a genomic data architect with interdisciplinary experience, including a track record of supporting innovative, high-quality research by managing and modelling large volumes of genetic and genomic data and results in a distributed database and analytical environment.
GSK aims to increase the number of successful late-stage clinical trials for innovative medicines by identifying and advancing drug targets that have strong evidence of a causal role in disease biology. The Human Genetics team leverages major scientific and technological advances, including investment in biobanks linked to large-scale human health databases, cutting-edge informatics platforms, breakthrough understanding of biological pathways, functional genomics capabilities built upon rapid progress in gene editing, and leading industry-academia partnerships, to identify the best targets and to continue evaluating those targets throughout their life in the pipeline.
The successful candidate will work in a multidisciplinary, collaborative and scientifically driven environment, interacting with GSK scientists and key academic collaborators to advance drug discovery and clinical development in multiple disease areas. This research will leverage industry-leading data and compute resources to address important drug discovery and development challenges, to directly impact GSK’s R&D pipeline, and to publish in top scientific journals.
The selected Genomic Data Architect will:
Develop and maintain a robust data platform to efficiently house and represent critical human genetic, genomic, and clinical/phenotypic data, to inform target selection and validation decisions across a range of disease areas.
Leverage the opportunities and efficiencies afforded by access to a hybrid, cloud-based, distributed ecosystem of database technologies.
Collaborate closely with GSK genetics analysts and data scientists, recommending data structures to enable novel genetic analyses, leveraging large volumes of data on modern storage and computational platforms.
Be responsible for monitoring data integrity and security and addressing database issues that arise.
Multiple Locations: Collegeville, Pennsylvania; Stevenage, United Kingdom
We are looking for professionals with these required skills to achieve our goals:
Master's degree or equivalent experience in genetics, bioinformatics, or related life sciences applications.
Experience with genetic and genomic data types, including public genetic databases (e.g. UK Biobank, gnomAD) and results data from high-throughput genetic assays.
5+ years of experience in complex data architecture design, and familiarity with applications of FAIR principles.
Hands-on maintenance of database systems and data manipulation using SQL, working within a POSIX CLI environment.
Experience with distributed database technologies (e.g. the Apache Hadoop/Hive ecosystem).
Experience with management of large volumes of a variety of genetic/genomic data and results.
If you have the following characteristics, it would be a plus:
Ability to discuss genomic data types and analyses used in early target discovery.
Experience with data and/or results specifically from large-scale genetic association studies (e.g. GWAS or PheWAS) or large-scale functional genomic data.
Experience with complex longitudinal human clinical/phenotype data, e.g. from electronic health records, epidemiological cohorts, or clinical trials.