Channel: Recent Discussions — GATK-Forum

GenerateAltAlleleFasta

1. Introduction

The GenerateAltAlleleFasta utility processes a VCF file to extract the
sequences of the alternate alleles.

For each structural variation record in the VCF, this utility generates one
output sequence in fasta format for each alternate allele that has precise
breakpoints. The identifier for each alternate allele is
variantID_alleleNumber, where alleleNumber is the position of the allele in
the ALT column of the VCF file (the first ALT allele is allele 1).
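
Because variant IDs can themselves contain underscores (P2_M_061510_20_81 in the example header below), a program that consumes these identifiers should split on the last underscore only. The following Python sketch builds and parses identifiers under that convention. The function names are hypothetical and are not part of the tool.

```python
from typing import List, Tuple


def allele_id(variant_id: str, allele_number: int) -> str:
    """Build the fasta ID for an ALT allele; allele_number is 1-based."""
    if allele_number < 1:
        raise ValueError("ALT alleles are numbered from 1")
    return f"{variant_id}_{allele_number}"


def split_allele_id(fasta_id: str) -> Tuple[str, int]:
    """Recover (variantID, alleleNumber) from a fasta ID.

    Split on the LAST underscore only, since the variant ID may contain
    underscores of its own.
    """
    variant_id, sep, number = fasta_id.rpartition("_")
    if not sep or not variant_id or not number.isdigit():
        raise ValueError(f"not a variantID_alleleNumber identifier: {fasta_id!r}")
    return variant_id, int(number)


def ids_for_alt_column(variant_id: str, alt_column: str) -> List[str]:
    """One ID per comma-separated ALT allele, numbered by ALT-column position.

    Numbering follows the position in the ALT column even though the tool
    only emits sequences for alleles with precise breakpoints.
    """
    alleles = alt_column.split(",")
    return [allele_id(variant_id, i) for i in range(1, len(alleles) + 1)]


if __name__ == "__main__":
    print(split_allele_id("P2_M_061510_20_81_1"))  # ('P2_M_061510_20_81', 1)
```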

The remainder of each fasta header line after the ID contains an encoded
description of how the allele sequence maps back to the reference genome. The
naming convention for the fasta sequences and the format of the rest of the
header line are understood by other programs that use the alternate allele
fasta file as input.

Here is an example of a generated fasta header:

>P2_M_061510_20_81_1 L:chr20:51913435-51913634;1-200|R:chr20:51913736-51913935;202-401|LENGTH:401

This example is for the first alternate allele of a variant with ID
P2_M_061510_20_81. The generated fasta sequence is 401 bases long.
Bases 1-200 of the alternate allele sequence align to chr20:51913435-51913634
of the reference sequence, and bases 202-401 align to chr20:51913736-51913935.
Thus, this event represents a deletion of 101 bp of the reference
(chr20:51913635-51913735), with one base of non-template sequence (base 201)
present in the alternate allele.
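
The arithmetic above can be checked mechanically. This Python sketch parses a header into its segments and, for the simple case of one L and one R segment on the same chromosome in forward orientation, derives the deleted reference interval and the number of non-template bases. Reading L and R as the left and right flanking segments is an assumption inferred from the example, not a documented specification of the format, and the sketch does not handle other segment layouts. All names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    side: str       # segment label as written in the header, e.g. "L" or "R"
    chrom: str
    ref_start: int  # reference coordinates, 1-based inclusive
    ref_end: int
    seq_start: int  # coordinates within the allele sequence, 1-based inclusive
    seq_end: int


# label:chrom:refStart-refEnd;seqStart-seqEnd (greedy chrom group tolerates ':' in contig names)
SEGMENT_RE = re.compile(r"^([A-Za-z]+):(.+):(\d+)-(\d+);(\d+)-(\d+)$")


def parse_header(header):
    """Split a fasta header into (allele_id, [Segment, ...], length)."""
    allele_id, _, description = header.lstrip(">").strip().partition(" ")
    segments, length = [], None
    for field in description.split("|"):
        if field.startswith("LENGTH:"):
            length = int(field[len("LENGTH:"):])
            continue
        m = SEGMENT_RE.match(field)
        if m is None:
            raise ValueError(f"unrecognized header field: {field!r}")
        side, chrom, *coords = m.groups()
        segments.append(Segment(side, chrom, *map(int, coords)))
    return allele_id, segments, length


def summarize_simple_event(segments, length):
    """Interpret an L/R pair as a deletion plus optional non-template bases."""
    if len(segments) != 2:
        raise ValueError("expected exactly one L and one R segment")
    left, right = segments
    if (left.side, right.side) != ("L", "R") or left.chrom != right.chrom:
        raise ValueError("expected L then R on the same chromosome")
    return {
        # reference bases between the two aligned flanks
        "deleted_ref": (left.chrom, left.ref_end + 1, right.ref_start - 1),
        "deleted_bp": right.ref_start - left.ref_end - 1,
        # allele bases between the two aligned flanks
        "non_template_bp": right.seq_start - left.seq_end - 1,
        "length_ok": length == right.seq_end,
    }


if __name__ == "__main__":
    example = (">P2_M_061510_20_81_1 L:chr20:51913435-51913634;1-200"
               "|R:chr20:51913736-51913935;202-401|LENGTH:401")
    allele_id, segments, length = parse_header(example)
    # -> deleted_ref ('chr20', 51913635, 51913735), deleted_bp 101, non_template_bp 1
    print(allele_id, summarize_simple_event(segments, length))
```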

2. Inputs / Arguments

  • -I <vcf-file> : The input VCF file.

  • -R <fasta-file> : The reference sequence, as a fasta file. The fasta file
    must be indexed with 'samtools faidx' or an equivalent tool.

  • -flankLength <N> : The number of reference bases to include on each side
    of each alternate allele (default 200). The flank length is counted
    outside any micro-homology around the breakpoints.
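
The example header is consistent with flank windows of exactly 200 bases (the default -flankLength) on either side of the deleted interval. The sketch below computes such windows in 1-based inclusive coordinates. How the tool handles micro-homology internally is not described here, so the left_homology and right_homology parameters are hypothetical: they model one reading of the option description, in which each window starts just outside the homologous bases.

```python
from typing import Tuple

Window = Tuple[int, int]  # (start, end), 1-based inclusive


def flank_windows(del_start: int, del_end: int, flank_length: int = 200,
                  left_homology: int = 0, right_homology: int = 0) -> Tuple[Window, Window]:
    """Return (left_window, right_window) around a deleted interval.

    del_start/del_end are the first and last deleted reference bases.
    ASSUMPTION: micro-homology pushes each window outward by the homology
    length on its side, so the flank itself is always flank_length bases.
    The left window is clipped at position 1; the right window is not
    clipped, because contig lengths (e.g. from the .fai index) are not known here.
    """
    left_end = del_start - 1 - left_homology
    left_start = max(1, left_end - flank_length + 1)
    right_start = del_end + 1 + right_homology
    right_end = right_start + flank_length - 1
    return (left_start, left_end), (right_start, right_end)


if __name__ == "__main__":
    # Deletion from the example header: chr20:51913635-51913735
    print(flank_windows(51913635, 51913735))
    # ((51913435, 51913634), (51913736, 51913935))
```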

3. Outputs

  • -O <fasta-file> : An output fasta file containing one entry for each
    alternate structural allele. If -O is not given, output is written to
    stdout.
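
Because the default output goes to stdout, a downstream script may need to read records from either a file or a pipe. The following Python sketch is a minimal fasta reader that separates the allele ID from the encoded description on each header line. It assumes the ID and description are separated by a single space, as in the example header, and the function names are hypothetical.

```python
import sys
from typing import Iterable, Iterator, Tuple


def _split_header(header: str) -> Tuple[str, str]:
    """Separate the allele ID from the encoded description (hypothetical helper)."""
    allele_id, _, description = header.partition(" ")
    return allele_id, description


def read_alt_allele_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (allele_id, description, sequence) for each fasta record."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield (*_split_header(header), "".join(chunks))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield (*_split_header(header), "".join(chunks))


if __name__ == "__main__":
    # Read a file named on the command line, or stdin when output was piped.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            records = list(read_alt_allele_fasta(fh))
    else:
        records = list(read_alt_allele_fasta(sys.stdin))
    for allele_id, description, seq in records:
        print(allele_id, len(seq), description, sep="\t")
```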
