GWAS Pipeline

Accelerate genomic analysis, target discovery, and drug discovery by efficiently interrogating whole-exome sequencing (WES) datasets at biobank scale with a proven Genome-Wide Association Study (GWAS) pipeline platform.

Context

GWAS analyses take considerable time and effort, compounded by scalability and visualization issues

Example - Manhattan plot: the variants (SNPs) with the strongest associations have the smallest p-values and therefore the largest -log10(p) values, so they tower over the background of unassociated SNPs.
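A Manhattan plot places each variant at its genomic position on the x-axis and at -log10(p) on the y-axis, so the smallest p-values produce the tallest points. The following is a minimal plain-Python sketch of that transform. It assumes association results arrive as hypothetical `(chromosome, position, p_value)` tuples with integer chromosome numbers, and it flags hits against the conventional genome-wide significance threshold of 5e-8:

```python
import math

# Conventional genome-wide significance threshold used in GWAS.
GENOME_WIDE_SIG = 5e-8


def manhattan_points(results):
    """Convert (chrom, pos, p_value) tuples into Manhattan-plot points.

    Returns (chrom, pos, neg_log10_p, is_significant) tuples, sorted by
    chromosome and then position, i.e. in x-axis order.
    """
    points = []
    for chrom, pos, p in sorted(results):
        if not 0.0 < p <= 1.0:
            raise ValueError(f"invalid p-value {p} at {chrom}:{pos}")
        # Smaller p-values map to taller points on the y-axis.
        points.append((chrom, pos, -math.log10(p), p < GENOME_WIDE_SIG))
    return points
```

For example, `manhattan_points([(1, 900, 3e-9), (2, 400, 0.01)])` places the first variant at a height of about 8.5 and marks it as genome-wide significant. The second variant sits at a height of 2 and is not flagged.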

Genome-wide association studies (GWAS) are both a science and an art: no two studies are the same, and cohorts need to be carefully designed. However, streamlining GWAS with a platform-based approach can improve efficiency and help avoid recurring challenges, including:

  • Standing up the required infrastructure and technology stacks: even in the age of cloud computing, IT teams still take months to get the setup right, because the required expertise cuts across many disciplines
  • Scaling the tech stack to handle ever-expanding datasets: the UK Biobank WES dataset currently provides 300k sequences and continues to grow
  • Consistently applying best-practice checks for Sample and Variant QC, and screening for relatedness and other sampling biases
  • Joining with reference databases such as gnomAD for allele frequencies and ClinVar for known clinical significance and impact
  • Interactive visualization of sequence data, phenotype data, and study results: this is challenging given the size of the sequence data and the variety of phenotype data
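Variant QC typically starts with per-variant metrics such as call rate and allele frequency. Hail's `hl.variant_qc` computes these metrics on real data. The plain-Python sketch below only illustrates the arithmetic for a single biallelic variant. It codes each genotype as an alternate-allele count (0, 1, or 2) and uses `None` for a missing call. The thresholds in `passes_qc` (95% call rate, 1% minor allele frequency) are hypothetical defaults, not values taken from the platform:

```python
from typing import Optional, Sequence, Tuple


def variant_qc(genotypes: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (call_rate, minor_allele_frequency) for one biallelic variant.

    genotypes holds one entry per sample: the alternate-allele count
    (0, 1 or 2), or None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0
    # Each diploid sample contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    return call_rate, min(alt_freq, 1.0 - alt_freq)


def passes_qc(genotypes, min_call_rate=0.95, min_maf=0.01) -> bool:
    """Hypothetical filter: keep variants that are well called and not too rare."""
    call_rate, maf = variant_qc(genotypes)
    return call_rate >= min_call_rate and maf >= min_maf
```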
Our Solution

AI-powered drug discovery platform for genomic analysis

Aganitha provides an in silico GWAS platform that is:

  • Available on demand, powered by an Infrastructure as Code approach, and supporting both in-house HPC and cloud-based clusters
  • Cost-effective, without dependence on expensive proprietary big data stacks and services
  • Vertically integrated to provide industry-leading performance
  • Comprehensive, spanning all activities from data ingestion to QC, cohort selection, regression and visualization
  • Interactive, enabling scientists to inspect and apply Sample and Variant QC and to carefully design study cohorts
  • Pre-integrated with commonly needed reference datasets, annotation and visualization tools
  • Complemented by a complete portfolio of service offerings from Aganitha which seamlessly integrate all the genomics and technology expertise needed
Given a population of individuals with and without the disease of interest, the steps are: whole-exome/genome sequencing; then, within Hail, variant analysis, quality control, linkage disequilibrium (LD) pruning, and statistical analysis; followed by downstream analysis.
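LD pruning removes variants that are highly correlated with a nearby variant already retained. The goal is that downstream steps such as PCA are not dominated by a few linked regions. Hail provides its own `hl.ld_prune` for this step. The sketch below is a deliberately simplified greedy stand-in written in plain Python, not Hail's algorithm. It assumes one chromosome at a time, complete dosage vectors (no missing calls), and hypothetical `r2_max` and `bp_window` settings:

```python
from typing import List, Sequence, Tuple


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: correlation undefined, treat as unlinked
    return sxy * sxy / (sxx * syy)


def greedy_ld_prune(variants, r2_max=0.2, bp_window=1_000_000) -> List[int]:
    """Walk variants in position order, keeping one only if its r^2 with every
    already-kept variant within bp_window base pairs is at most r2_max.

    variants: list of (position, dosages) on a single chromosome.
    Returns the positions of the retained variants.
    """
    kept: List[Tuple[int, Sequence[float]]] = []
    for pos, dos in sorted(variants, key=lambda v: v[0]):
        if all(pos - kpos > bp_window or r_squared(dos, kdos) <= r2_max
               for kpos, kdos in kept):
            kept.append((pos, dos))
    return [pos for pos, _ in kept]
```

A greedy pass is easy to follow, but which variants it keeps depends on scan order. Production tools such as Hail use more careful selection to decide which member of a correlated pair to drop.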
Genome-Wide Association Pipeline
Highlights

Key components & strengths

Kubernetes

Easily deployable on any cloud that supports Kubernetes, including AWS, GCP, and Azure

HPC deployment with SLURM/SGE/equivalent scheduler

Deployable on internal HPC clusters using leading schedulers such as SLURM or SGE

Hail on Spark

Leverages Hail, a leading GWAS library from the Broad Institute, which in turn leverages the distributed data-processing capabilities of Apache Spark

Jupyter notebooks

Supports interactive use by scientists via Jupyter notebooks

Pre-Integrated

Comes pre-integrated with leading open-source libraries and tools such as the Ensembl Variant Effect Predictor (VEP)

Services

Complemented by a complete portfolio of service offerings which seamlessly integrate all the genomics and technology expertise needed
Outcomes

Accelerated drug discovery through faster, more efficient genomic analysis, GWAS, and target discovery

Careful cohort selection

Identify Mendelian violations in trios; prune variants in linkage disequilibrium (LD); analyze genetic similarity between samples; and compute sample scores and variant loadings using principal component analysis (PCA)
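A Mendelian violation occurs when a child's genotype cannot be assembled from one allele inherited from each parent. Hail's `hl.mendel_errors` checks this on real data against a pedigree. The plain-Python sketch below shows only the underlying rule. It assumes autosomal biallelic sites, with genotypes coded as alternate-allele counts (0, 1, or 2). It ignores sex chromosomes, missing calls, and genotyping error models:

```python
from itertools import product
from typing import Set


def transmissible(g: int) -> Set[int]:
    """Alleles (0 = ref, 1 = alt) that a parent with alt-allele count g can pass on."""
    return {a for a in (0, 1) if (a == 1 and g >= 1) or (a == 0 and g <= 1)}


def is_mendel_violation(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype cannot be formed from one allele of each parent."""
    possible = {f + m for f, m in product(transmissible(father), transmissible(mother))}
    return child not in possible
```

For example, two homozygous-reference parents (0, 0) cannot produce a heterozygous child (1), so `is_mendel_violation(0, 0, 1)` is `True`.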

Rich association studies

Perform variant, gene-burden, and eQTL association analyses using linear, logistic, and linear mixed regression; estimate heritability
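A single-variant linear association test regresses the phenotype on genotype dosage and tests whether the slope differs from zero. At scale, Hail runs this per variant with `hl.linear_regression_rows`. The plain-Python sketch below shows the arithmetic for one variant with no covariates. Its two-sided p-value uses a normal approximation to the t distribution, which is reasonable only for large sample sizes. Logistic and linear mixed models are not shown:

```python
import math
from statistics import NormalDist, fmean


def linear_assoc(dosages, phenotype):
    """Single-variant OLS test: phenotype ~ intercept + beta * dosage.

    Returns (beta, standard_error, t_stat, p_value). The p-value uses a
    normal approximation to the t distribution (large-n assumption).
    """
    n = len(dosages)
    mx, my = fmean(dosages), fmean(phenotype)
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx
    alpha = my - beta * mx
    # Residual sum of squares with n - 2 degrees of freedom (intercept + slope).
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = beta / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return beta, se, t, p
```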

Instant ignition

Rapid pipeline setup in days, saving weeks of time

Scalable

Elastically scale your deployment as your WES datasets grow in size
