Tutorial

This document is an end-to-end tutorial for impatient users who want to quickly set up SvAnna and prioritize structural variants.

Prerequisites

SvAnna is written in Java 11 and needs Java 11+ to be present in the runtime environment. Please verify that you are using Java 11+ by running:

$ java -version

If java is present on your $PATH, then the command above will print a message similar to this one:

openjdk version "11" 2018-09-25
OpenJDK Runtime Environment 18.9 (build 11+28)
OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)
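
For scripted setups, the version check can be automated. The following is a minimal sketch and not part of SvAnna: the helper name java_major is hypothetical, and the parsing assumes a version line shaped like the one shown above (or the legacy 1.x scheme used by Java 8 and older).

```shell
#!/bin/sh
# Hypothetical helper (not part of SvAnna): print the major Java version
# parsed from a `java -version` line such as: openjdk version "11" 2018-09-25
java_major() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) ver=${ver#1.}; printf '%s\n' "${ver%%.*}" ;;  # legacy scheme: 1.8.0_292 -> 8
    *)   printf '%s\n' "${ver%%[.-]*}" ;;              # current scheme: 11.0.2 -> 11
  esac
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, so redirect it before parsing.
  line=$(java -version 2>&1 | sed -n '/version "/{p;q;}')
  major=$(java_major "$line")
  case "$major" in
    ''|*[!0-9]*) echo "Could not parse the Java version from: $line" ;;
    *) if [ "$major" -ge 11 ]; then
         echo "Java $major found: OK"
       else
         echo "Java 11+ is required, found Java $major"
       fi ;;
  esac
else
  echo "java was not found on PATH"
fi
```

The script only reports what it finds; it does not change the environment.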

Setup

SvAnna is installed by running the following three steps.

1. Download SvAnna distribution ZIP

Download the SvAnna distribution ZIP archive from GitHub releases and extract it. Choose the latest stable version, or a release candidate (RC), expand its Assets menu, and download svanna-cli-${project.version}-distribution.zip.

After unzipping the distribution archive, run the following command to display the help message:

$ java -jar svanna-cli-${project.version}.jar --help

Note

If things went well, the command above will print the following help message:

Structural variant prioritization
Usage: svanna-cli.jar [-hV] [COMMAND]
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
Commands:
  setup-phenotype  Setup gene-phenotype resources.
  prioritize       Prioritize the variants.
See the full documentation at `https://monarch-initiative.github.io/SvAnna/stable`
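
Because the JAR file name contains the release version, a script may prefer to locate the file instead of hard-coding its name. A minimal sketch, assuming the distribution archive was extracted somewhere below the current directory; find_svanna_jar is a hypothetical helper, and the svanna-cli-*.jar pattern is inferred from the file name used in the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SvAnna): print the path of the first
# svanna-cli-*.jar found under the given directory (default: current directory).
find_svanna_jar() {
  find "${1:-.}" -type f -name 'svanna-cli-*.jar' | sort | head -n 1
}

jar=$(find_svanna_jar .)
if [ -n "$jar" ]; then
  echo "Found SvAnna JAR: $jar"
  # To show the help message, run: java -jar "$jar" --help
else
  echo "No svanna-cli-*.jar found below the current directory"
fi
```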

2. Download SvAnna database files

SvAnna database files are available for download in the Downloads section.

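After the download, unzip the content of the archive(s) into a folder of your choice and note down the path. Quoting the pattern lets unzip expand it itself, so the command also works when several archives are present:

$ unzip -d svanna-data '*.svanna.zip'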

The command extracts the archive content into a new folder called svanna-data. We will need the data folder path in the next steps.

3. Set up the genotype-phenotype resources

SvAnna needs additional data files from the Human Phenotype Ontology (HPO) project to support the gene-phenotype matching. These files can be downloaded with the setup-phenotype command:

$ java -jar svanna-cli-${project.version}.jar setup-phenotype -d svanna-data

The command takes the path to the SvAnna data directory (defined in the previous step) via the -d option. It downloads the files, precomputes information content for HPO term pairs, and stores the files in the phenotype subfolder (e.g. svanna-data/phenotype).
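
Before moving on, it can help to confirm that the data directory and its phenotype subfolder exist and are not empty. This is a minimal sketch: dir_has_content is a hypothetical helper, and it checks only for the presence of files, not their names or integrity, since the expected file list is not given here.

```shell
#!/bin/sh
# Hypothetical helper (not part of SvAnna): succeed if the directory exists
# and contains at least one entry.
dir_has_content() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

data_dir=svanna-data   # the folder created in step 2
for d in "$data_dir" "$data_dir/phenotype"; do
  if dir_has_content "$d"; then
    echo "OK: $d"
  else
    echo "Missing or empty: $d"
  fi
done
```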

Prioritize structural variants in a VCF file

Now, let’s annotate a toy VCF file containing eight SVs reported in the SvAnna manuscript. First, let’s download the VCF file from the SvAnna source code repository:

$ wget https://raw.githubusercontent.com/monarch-initiative/SvAnna/master/svanna-cli/src/examples/example.vcf

The variants were sourced from published clinical case reports, and each variant causes a Mendelian disease.

For the purpose of this test run, let’s assume that the VCF file contains SVs identified in a short- or long-read sequencing run of a patient presenting with the following clinical symptoms:

HP:0011890 - Prolonged bleeding following procedure
HP:0000978 - Bruising susceptibility
HP:0012147 - Reduced quantity of Von Willebrand factor

Now, let’s prioritize the variants:

$ java -jar svanna-cli-${project.version}.jar prioritize \
  -d svanna-data \
  --output-format html,csv,vcf \
  --vcf example.vcf \
  --phenotype-term HP:0011890 \
  --phenotype-term HP:0000978 \
  --phenotype-term HP:0012147
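
When a patient has many HPO terms, writing one --phenotype-term option per term by hand gets tedious. The sketch below assembles the same command from a list of terms. The helper is_hpo_id and the terms variable are hypothetical, and the check assumes the usual HPO identifier shape of HP: followed by seven digits.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument looks like an HPO term ID.
is_hpo_id() {
  case "$1" in
    HP:[0-9][0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

terms="HP:0011890 HP:0000978 HP:0012147"

# Start from the fixed part of the command, then append one option per term.
set -- prioritize -d svanna-data --output-format html,csv,vcf --vcf example.vcf
for term in $terms; do
  if is_hpo_id "$term"; then
    set -- "$@" --phenotype-term "$term"
  else
    echo "Skipping malformed HPO ID: $term" >&2
  fi
done

# Print the assembled command for review instead of running it.
# The version placeholder in the JAR name is kept literally.
echo java -jar 'svanna-cli-${project.version}.jar' "$@"
```

Printing the command first makes it easy to inspect the arguments before replacing echo with a real invocation.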

The variant Othman-2010-20696945-VWF-index-FigS7 disrupts a promoter of the von Willebrand factor (VWF) gene (Othman et al., 2010). The variant receives the highest PSV score of 47.26 and is ranked first.

SvAnna stores prioritization results in HTML, CSV, and VCF output formats in the current working directory.
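
The output file names are not listed here, so one way to find the results is to look for report files that are newer than the input VCF. A minimal sketch: list_new_results is a hypothetical helper, and the patterns also cover gzip-compressed variants in case the CSV and VCF outputs are written compressed (an assumption).

```shell
#!/bin/sh
# Hypothetical helper: list HTML, CSV, and VCF files (optionally gzipped) in a
# directory that were modified more recently than the given reference file.
list_new_results() {
  find "$1" -maxdepth 1 -type f -newer "$2" \
    \( -name '*.html' -o -name '*.csv' -o -name '*.vcf' \
       -o -name '*.csv.gz' -o -name '*.vcf.gz' \) | sort
}

if [ -f example.vcf ]; then
  list_new_results . example.vcf
else
  echo "example.vcf not found in the current directory"
fi
```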