1. Introduction
The SVGenotyper walker traverses a VCF file to compute genotypes for
structural variations. This walker is the main component of the
SVGenotyper pipeline.
Currently, only genotyping of deletions relative to the reference is
implemented.
2. Inputs / Arguments
-I <bam-file>
: The set of input BAM files.
-runDirectory <directory>
: The directory where auxiliary output files
will be written (default is the current directory).
-md <directory>
: The metadata directory containing metadata about the
input data set. See SVPreprocess.
-R <fasta-file>
: Reference sequence. An indexed fasta file containing
the reference sequence that the input BAM files were aligned against. The
fasta file must be indexed with 'samtools faidx' or the equivalent.
-genomeMaskFile <mask-file>
: Mask file that describes the alignability of
the reference sequence. See Genome Mask Files.
-configFile <configuration-file>
: This file contains values for
specialized settings that do not normally need to be changed. A default
configuration file is provided in conf/genstrip_parameters.txt.
-sample <sample-ID>
: The sample to genotype (or list of samples if
multiple arguments are supplied). By default, genotypes are computed for all
samples present in the input BAM files.
-sampleList <file>
: A file containing the list of samples to genotype
(one sample ID per line).
-altAlleleAlignments <bam-file>
: A BAM file containing alignments to the
alternate alleles of events present in the input VCF file. These alternate
alignments should be computed by the SVAltAlign pipeline.
-partitionName <string>
: This specifies the name of the partition being
computed during parallel runs. The output files will be prefixed with the
name of the partition.
-partition <partition-spec>
: Describes the subset of the VCF file to
process. The format is "records:N-M", where N and M are the 1-based
indexes of a range of records from the input VCF file that will be processed.
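For parallel runs, partition specs in the "records:N-M" format can be generated rather than typed by hand. The following is a minimal shell sketch and not part of the toolkit: the function name make_partitions and the chunk size are hypothetical, and it assumes the range is inclusive at both ends, which this page does not state explicitly.

```shell
# Print one "records:N-M" partition spec per line.
# $1 = total number of records in the input VCF
# $2 = maximum number of records per partition
# Assumption: N and M are both inclusive, 1-based record indexes.
make_partitions() {
    total=$1
    size=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        # The last partition may be shorter than the others.
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "records:${start}-${end}"
        start=$((end + 1))
    done
}

# Example: 250 records split into chunks of at most 100.
make_partitions 250 100
```

The record count could be obtained with something like grep -vc '^#' input.sites.vcf, which counts the non-header lines of the VCF file. Each printed spec would then be passed as -partition, together with a distinct -partitionName.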
3. Outputs
-O <vcf-file>
: The main output is a VCF file containing genotypes for
structural variation sites from the input VCF file.
Depending on settings in the configuration file, this walker will also produce
a number of auxiliary output files. These files are mostly useful for
debugging. The content and format of these files are subject to change.
4. Running
Currently, this walker needs to be invoked through a special wrapper around
the GATK command line interface. This wrapper accepts all of the standard GATK
command line options. An example is shown below.
The input VCF file should be passed as a GATK ROD (reference ordered datum)
file. This walker also requires the -BTI argument to be passed to the GATK
engine.
java -Xmx4g -cp SVToolkit.jar:GenomeAnalysisTK.jar \
org.broadinstitute.sv.main.SVGenotyper \
-T SVGenotyper \
-configFile conf/genstrip_parameters.txt \
-md metadata \
-R Homo_sapiens_assembly18.fasta \
-genomeMaskFile Homo_sapiens_assembly18.mask.36.fasta \
-altAlleleAlignments alt_allele_alignments.bam \
-B:input,VCF input.sites.vcf \
-BTI input \
-I input1.bam -I input2.bam \
-O output.genotypes.vcf \
-runDirectory run1
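Because the walker depends on several input files, including an index for the reference fasta, it may help to check for them before starting a long run. The following is a minimal shell sketch and not part of the toolkit: the helper name check_inputs is hypothetical, the file names are taken from the example above, and the .fai name assumes the index was created with 'samtools faidx', which writes it next to the fasta file.

```shell
# Report each missing file on stderr.
# Returns 0 if all files exist, 1 if at least one is missing.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

ref=Homo_sapiens_assembly18.fasta

# 'samtools faidx <fasta>' writes its index as <fasta>.fai
check_inputs "$ref" "$ref.fai" input.sites.vcf input1.bam input2.bam \
    || echo "fix the inputs listed above before running SVGenotyper" >&2
```

If the .fai file is reported missing, running samtools faidx on the reference fasta should create it.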
5. Dependencies
The SV Genotyping code uses some R scripts. R needs to be installed and the
Rscript executable needs to be on your path to run this walker.
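One quick way to confirm this dependency before launching the walker is to look up Rscript on the PATH. This is a minimal shell sketch; the helper name have_tool is hypothetical and not part of the toolkit.

```shell
# Succeeds if the named executable can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool Rscript; then
    echo "Rscript found: $(command -v Rscript)"
else
    echo "Rscript not found on PATH; install R before running SVGenotyper" >&2
fi
```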