Channel: Recent Discussions — GATK-Forum

SVGenotyper walker

1. Introduction

The SVGenotyper walker traverses the sites in an input VCF file and computes
genotypes for each structural variant. This walker is the main component of
the SVGenotyper pipeline.

Currently, only genotyping of deletions relative to the reference is
implemented.

2. Inputs / Arguments

  • -I <bam-file> : The set of input BAM files.

  • -runDirectory <directory> : The directory where auxiliary output files
    will be written (default is the current directory).

  • -md <directory> : The metadata directory containing metadata about the
    input data set. See SVPreprocess.

  • -R <fasta-file> : Reference sequence. An indexed fasta file containing
    the reference sequence that the input BAM files were aligned against. The
    fasta file must be indexed with 'samtools faidx' or the equivalent.

  • -genomeMaskFile <mask-file> : Mask file that describes the alignability of
    the reference sequence. See Genome Mask Files.

  • -configFile <configuration-file> : A file of specialized settings that do
    not normally need to be changed. A default configuration file is provided
    in conf/genstrip_parameters.txt.

  • -sample <sample-ID> : The sample to genotype. This argument can be given
    multiple times to genotype a list of samples. By default, genotypes are
    computed for all samples present in the input BAM files.

  • -sampleList <file> : A file containing the list of samples to genotype
    (one sample ID per line).

  • -altAlleleAlignments <bam-file> : A BAM file containing alignments to the
    alternate alleles of events present in the input VCF file. These alternate
    alignments should be computed by the SVAltAlign pipeline.

  • -partitionName <string> : This specifies the name of the partition being
    computed during parallel runs. The output files will be prefixed with the
    name of the partition.

  • -partition <partition-spec> : Describes the subset of the VCF file to
    process. The format is "records:N-M", where N and M are the 1-based
    indexes of a range of records from the input VCF file that will be processed.
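Because genotypes are computed for every sample in the input BAM files by default, a -sampleList file is mainly useful for restricting a run to a subset of samples. One way to build a starting list is to collect the SM (sample) tags from the @RG lines of the BAM headers, which samtools view -H prints. The bash helper below is a minimal sketch that assumes, as elsewhere in the GATK, that sample IDs come from those SM tags; the function name sample_ids and the file name samples.list are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Genome STRiP or the GATK): read SAM header
# text on stdin and print each distinct sample ID found in an SM: tag of an
# @RG line, one ID per line.
sample_ids() {
    awk -F '\t' '
        $1 == "@RG" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) print substr($i, 4)   # drop the "SM:" prefix
        }' | sort -u
}
```

For example, { samtools view -H input1.bam; samtools view -H input2.bam; } | sample_ids > samples.list writes one sample ID per line; the file can then be trimmed by hand and passed as -sampleList samples.list.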
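For parallel runs, the records of the input VCF file can be divided into contiguous ranges, one per partition. The bash sketch below prints one -partitionName/-partition pair for each chunk of a chosen size. It assumes that "records:N-M" is an inclusive range, that every non-header line of the VCF counts as one record, and that partitions are named P0001, P0002, and so on; these are illustrative choices, not documented SVGenotyper behavior, and the function name make_partitions is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split the records of a VCF into chunks of at most CHUNK records and
# print the partition arguments for each chunk.
# Assumptions (not from the SVGenotyper documentation): "records:N-M" is an
# inclusive 1-based range, every line not starting with '#' is one record,
# and the names P0001, P0002, ... are a hypothetical naming scheme.
make_partitions() {
    local vcf="$1" chunk="$2"
    local total start=1 idx=1 end
    # Count data records; grep exits 1 when the count is zero, so ignore that.
    total=$(grep -vc '^#' "$vcf" || true)
    while (( start <= total )); do
        end=$(( start + chunk - 1 ))
        if (( end > total )); then
            end=$total
        fi
        printf -- '-partitionName P%04d -partition records:%d-%d\n' "$idx" "$start" "$end"
        start=$(( end + 1 ))
        idx=$(( idx + 1 ))
    done
}
```

For example, make_partitions input.sites.vcf 500 prints lines such as "-partitionName P0001 -partition records:1-500", and each line would be appended to a separate SVGenotyper invocation.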

3. Outputs

  • -O <vcf-file> : The main output is a VCF file containing genotypes for
    structural variation sites from the input VCF file.

Depending on settings in the configuration file, this walker will also produce
a number of auxiliary output files. These files are mostly useful for
debugging. The content and format of these files are subject to change.

4. Running

Currently, this walker needs to be invoked through a special wrapper around
the GATK command line interface. This wrapper accepts all of the standard GATK
command line options. An example is shown below.

The input VCF file should be passed to the GATK engine as a ROD
(reference-ordered data) track; in the example, this is done with
-B:input,VCF input.sites.vcf. This walker also requires the -BTI argument to
be passed to the GATK engine.

java -Xmx4g -cp SVToolkit.jar:GenomeAnalysisTK.jar \
    org.broadinstitute.sv.main.SVGenotyper \
    -T SVGenotyper \
    -configFile conf/genstrip_parameters.txt \
    -md metadata \
    -R Homo_sapiens_assembly18.fasta \
    -genomeMaskFile Homo_sapiens_assembly18.mask.36.fasta \
    -altAlignments alt_allele_alignments.bam \
    -B:input,VCF input.sites.vcf \
    -BTI \
    -I input1.bam -I input2.bam \
    -O output.genotypes.vcf \
    -runDirectory run1

5. Dependencies

The SV genotyping code relies on R scripts, so R must be installed and the
Rscript executable must be on your PATH to run this walker.
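A simple pre-flight check can catch a missing dependency before a long run starts. The bash sketch below checks that each named command can be resolved; checking java alongside Rscript is an assumption based on the example command line above, and the function name require_cmds is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report every listed command that
# "command -v" cannot resolve (for external programs such as Rscript, this
# means it is not on the PATH) and return non-zero if any is missing.
require_cmds() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "error: '$cmd' not found on PATH" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: require_cmds java Rscript || exit 1
```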
