
VarBase: A Platform for the Storage and Clinical Interpretation of Next-Generation Sequencing Data
Wade L. Schulz, MD, PhD, John G. Howe, PhD, Karl Hager, PhD, Henry M. Rinder, MD
Yale University, Department of Laboratory Medicine

SLIDE 0

The Problem
A next-generation sequencing panel for leukemia and myelodysplastic syndrome was brought online and required interpretive and data management software
Interpretation efficiency
Patient safety
Turnaround time

SLIDE 1

Project Goals

Integrate annotation information from disparate data silos
Provide an interpretation interface
Generate digital and printed reports
Provide robust research tools for ongoing clinical and laboratory studies

SLIDE 2

The VarBase Platform

[Architecture diagram with a cloud tier (Kepler: VarBase - Kepler) and a local tier (Galileo); components shown: Web Interface, Galileo Services, IonReporter, Research Applications, Internal Clients]

SLIDE 3

Relations vs. Documents: SQL vs. NoSQL

SQL/Relational
ACID compliant
Column-level encryption
Well-known deployment

NoSQL/Document
Dynamic schema
Improved read/write speeds
Easy scaling and redundancy

SLIDE 4

Tool Selection: Public Data Repository (Kepler)


Elasticsearch
Web service wrapper
Administrative import interface

Annotation sources: COSMIC, ClinVar, OMIM, dbSNP
Index updated every 3 months
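The slide does not describe how source files are loaded into Elasticsearch. A minimal sketch, assuming the administrative import interface converts each source record into the newline-delimited JSON body accepted by Elasticsearch's `_bulk` endpoint; the index name `kepler-annotations`, the record field names, and the `source|chrom:pos:ref>alt` document ID scheme are hypothetical.

```python
import json


def variant_key(source: str, chrom: str, pos: int, ref: str, alt: str) -> str:
    """Hypothetical document ID: one document per variant per source."""
    return f"{source}|{chrom}:{pos}:{ref}>{alt}"


def build_bulk_body(index: str, annotations: list) -> str:
    """Serialize annotation records into the NDJSON body used by _bulk.

    Each record produces two lines: an "index" action naming the target
    index and document _id, then the document source itself. The bulk
    body must end with a trailing newline.
    """
    lines = []
    for record in annotations:
        doc_id = variant_key(record["source"], record["chromosome"],
                             record["position"], record["refAllele"],
                             record["altAllele"])
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


# Illustrative record only: "demo" is a placeholder source, not real annotation content.
example = [{"source": "demo", "chromosome": "chr7", "position": 148506396,
            "refAllele": "A", "altAllele": "C"}]
body = build_bulk_body("kepler-annotations", example)
print(body, end="")
```

Because the `index` action overwrites any existing document with the same `_id`, re-running the import each quarter refreshes records in place. Records withdrawn upstream would persist under this scheme, so building a fresh index and switching an alias to it is an alternative design.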

SLIDE 5

Tool Selection: Private Data Warehouse (Galileo)


MSSQL + Elasticsearch
Patient data encrypted in SQL
Panel information stored in SQL
Non-demographic information in SQL + Elasticsearch
Variants in Elasticsearch + disk

Document database caveat: most document databases are not ACID compliant and should not be used as the primary data store
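One way to read this split is that identifying data stays in the relational system of record, while only de-identified documents reach Elasticsearch. A minimal sketch under that assumption: the field names, the `specimenKey` surrogate key, and the `patient` table are hypothetical, and Python's built-in `sqlite3` stands in for MSSQL, so the column-level encryption is not shown.

```python
import sqlite3
import uuid
from typing import Tuple

# Hypothetical split: which fields count as identifying is an assumption.
IDENTIFYING_FIELDS = ("mrn", "name", "birthDate")


def split_record(record: dict) -> Tuple[dict, dict]:
    """Split a specimen record into an identifying part (relational store)
    and a de-identified part (document store) sharing a surrogate key."""
    key = str(uuid.uuid4())
    identifying = {f: record[f] for f in IDENTIFYING_FIELDS}
    document = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    identifying["specimenKey"] = key
    document["specimenKey"] = key
    return identifying, document


def save_identifying(conn: sqlite3.Connection, identifying: dict) -> None:
    """Write identifying fields to the relational store in a transaction.
    sqlite3 stands in for MSSQL and has no column-level encryption."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "CREATE TABLE IF NOT EXISTS patient ("
            "specimenKey TEXT PRIMARY KEY, mrn TEXT, name TEXT, birthDate TEXT)"
        )
        conn.execute(
            "INSERT INTO patient (specimenKey, mrn, name, birthDate) "
            "VALUES (:specimenKey, :mrn, :name, :birthDate)",
            identifying,
        )


conn = sqlite3.connect(":memory:")
record = {"mrn": "000000", "name": "Test Patient", "birthDate": "1900-01-01",
          "gene": "EZH2", "vaf": 48.749}
identifying, document = split_record(record)
save_identifying(conn, identifying)
# `document` holds only non-identifying fields plus specimenKey; it would be
# indexed into Elasticsearch only after the SQL write has committed.
print(document)
```

Committing to the relational store first keeps SQL as the authoritative copy, which matches the caveat that a document database should not be the primary store.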

SLIDE 6

Tool Selection: Authentication/Authorization


Institutional Active Directory
Role-based authorization
Web application restrictions

Roles:
Laboratory personnel
Trainee (Resident/Fellow)
Attending

Web interface restrictions:
Patient-based restriction
Data-type restriction
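The slide lists the authorization layers but not the rules themselves. A minimal sketch of how role-based authorization could combine with the patient-based and data-type restrictions; the permission table, the data-type names such as `sign_out`, and the `User` and `can_access` names are hypothetical, and in practice the roles would be resolved from Active Directory group membership.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Role names follow the slide; the data types each role may view are
# illustrative assumptions, not VarBase's actual policy.
ROLE_DATA_TYPES = {
    "laboratory_personnel": {"variants", "qc"},
    "trainee": {"variants", "qc", "interpretation"},
    "attending": {"variants", "qc", "interpretation", "sign_out"},
}


@dataclass
class User:
    username: str
    roles: Set[str]  # e.g. resolved from Active Directory group membership
    # Patient-based restriction: None means no patient-level limit.
    allowed_patients: Optional[Set[str]] = None


def can_access(user: User, patient_id: str, data_type: str) -> bool:
    """Allow access only if some role grants the data type (data-type
    restriction) and the patient is within the user's scope
    (patient-based restriction)."""
    granted: Set[str] = set()
    for role in user.roles:
        granted |= ROLE_DATA_TYPES.get(role, set())
    if data_type not in granted:
        return False
    if user.allowed_patients is not None and patient_id not in user.allowed_patients:
        return False
    return True
```

With this hypothetical table, a trainee can view interpretations for patients in scope but cannot sign out a case, which requires the attending role.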

SLIDE 7

Variants in JSON
{
"chromosome": "chr7",
"position": 148506396,
"type": "snv",
"refAllele": "A",
"altAllele": "C",
"totalReads": 1998,
"forwardReads": 1038,
"forwardRefReads": 524,
"forwardAltReads": 514,
"reverseReads": 960,
"reverseRefReads": 500,
"reverseAltReads": 460,
"refReads": 1024,
"altReads": 974,
"vaf": 48.749,
"variantRegion": "intronic",
"variantEffect": "",
"snvEffect": "A>C",
"gene": "EZH2"
}

Variant location in genome
Nucleotide change
Sequencing statistics
Variant allele frequency
Variant coding/protein effects
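The aggregate fields in this document follow from the four strand-level counts: 524 + 514 = 1038 forward reads, 500 + 460 = 960 reverse reads, 514 + 460 = 974 alternate reads out of 1998 total, and VAF = 100 * 974 / 1998 ≈ 48.749. A minimal sketch that derives them, usable as a consistency check on imported variants; the function name is hypothetical, and it assumes no reads support a third allele.

```python
def summarize_reads(forward_ref: int, forward_alt: int,
                    reverse_ref: int, reverse_alt: int) -> dict:
    """Derive the aggregate read counts and VAF from per-strand counts.

    Assumes every read supports either the reference or the alternate
    allele, so totals are simple sums. VAF is alt / total as a percentage,
    rounded to three decimals to match the example document.
    """
    forward = forward_ref + forward_alt
    reverse = reverse_ref + reverse_alt
    total = forward + reverse
    alt = forward_alt + reverse_alt
    return {
        "totalReads": total,
        "forwardReads": forward,
        "reverseReads": reverse,
        "refReads": forward_ref + reverse_ref,
        "altReads": alt,
        "vaf": round(100 * alt / total, 3) if total else 0.0,
    }


# Strand counts taken from the example variant document.
stats = summarize_reads(524, 514, 500, 460)
print(stats)
```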

SLIDE 8

Research Integration: Data Visualization (Kibana)
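Kibana visualizations are built on Elasticsearch aggregations. As a sketch, assuming variant documents shaped like the JSON example with `gene` and `vaf` fields, a count of variants per gene (one plausible research view) corresponds roughly to the search body below; the 5% VAF threshold, the `gene.keyword` field, and the function name are illustrative assumptions.

```python
import json


def gene_count_query(min_vaf: float, top_n: int = 20) -> dict:
    """Build an Elasticsearch search body that counts variants per gene.

    size=0 skips returning individual hits; the range filter keeps only
    variants at or above min_vaf (percent); the terms aggregation buckets
    the remaining documents by gene. "gene.keyword" assumes Elasticsearch's
    default dynamic mapping, which adds a keyword sub-field to string fields.
    """
    return {
        "size": 0,
        "query": {"range": {"vaf": {"gte": min_vaf}}},
        "aggs": {
            "variants_per_gene": {
                "terms": {"field": "gene.keyword", "size": top_n}
            }
        },
    }


print(json.dumps(gene_count_query(min_vaf=5.0), indent=2))
```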

SLIDE 9

System Statistics
Cloud-based Elasticsearch cluster
60 million variant annotations
10 million oncology annotations

Local Elasticsearch + MSSQL instance
>80 specimens plus validation specimens

Turnaround time: 1-2 weeks

SLIDE 10

Conclusions
A hybrid data store can efficiently and securely store complex data types
Cloud-based variant annotation can be integrated into multiple services and provides auditable interpretation information
Technology-agnostic web service interfaces provide easily accessible data interchange

SLIDE 11

Acknowledgements

Henry Rinder, MD
Alexa Siddon, MD
Richard Torres, MD
Christopher Tormey, MD
Thomas Durant, MD

Molecular Pathology Laboratory
John G. Howe, PhD
Karl Hager, PhD

Laboratory Informatics
Rodion Rathbone, MD
Nathan Price

SLIDE 12
