Channel: Recent Discussions — GATK-Forum

Description and examples of the steps in the ACNV case workflow


Once you have run GATK CNV, you can run ACNV to obtain revised segments based on both the target-coverage profile and the ref/alt counts at heterozygous SNPs. For each segment, ACNV reports posterior summary statistics for the copy ratio and the minor-allele fraction.

The ACNV case workflow (description and examples)

Requirements

  1. Java 1.8
  2. A functioning GATK4-protected jar (hellbender-protected.jar or gatk-protected.jar)
  3. Reference genome (fasta files) with fai and dict files. This can be downloaded as part of the GATK resource bundle: http://www.broadinstitute.org/gatk/guide/article?id=1213
  4. Samples must be paired. You will need both a case sample (typically, a tumor) and a control sample (typically, a blood normal). We are working on alleviating this requirement.
  5. A list of common heterozygous SNP sites. Currently, this needs to be in the Picard interval-list format. See http://gatkforums.broadinstitute.org/gatk/discussion/7812/creating-a-list-of-common-snps-for-use-with-getbayesianhetcoverage
  6. A completed run of GATK CNV for the case sample.
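Before starting, it can help to confirm that the reference from requirement 3 has its companion index and dictionary. The sketch below is a minimal check, assuming the usual naming convention (`ref.fasta.fai` next to `ref.fasta`, and `ref.dict` with the `.fasta` or `.fa` extension replaced); the function name `check_reference` is hypothetical and not part of GATK.

```shell
# Hypothetical helper: check that a FASTA reference has a non-empty .fai index
# and .dict sequence dictionary next to it. Assumes the conventional names
# <name>.fasta.fai and <name>.dict (the .fasta or .fa extension replaced by .dict).
check_reference() {
    ref_base=${1%.fasta}
    ref_base=${ref_base%.fa}
    ref_status=0
    for f in "$1" "$1.fai" "$ref_base.dict"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            ref_status=1
        fi
    done
    return "$ref_status"
}
```

For example, `check_reference /path/to/ref.fasta || exit 1` stops a script early if either companion file is absent.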
Overview of steps
  1. Identify heterozygous SNPs in the normal sample, then aggregate the ref/alt read counts at these sites in the tumor.
  2. Segment the case sample (based on both the read counts from step 1 and input from GATK CNV) and estimate copy ratio and minor-allele fraction in each segment.
  3. Call copy-neutral loss-of-heterozygosity and balanced segments. This step will also create files that can be used as input for ABSOLUTE (Broad-internal versions only) and TITAN.
Step 1. Het Pulldown

** These instructions describe one method for Het Pulldown for matched samples. For more options, including tumor-only, please see: http://gatkforums.broadinstitute.org/gatk/discussion/7719/overview-of-getbayesianhetcoverage-for-heterozygous-snp-calling **

Inputs
  • control_bam -- BAM file for control sample (normal).

  • case_bam -- BAM file for case sample (tumor).

  • reference_sequence -- FASTA file for b37 reference.
  • snp_file -- Picard interval list of common SNP sites at which to test for heterozygosity in the control sample.
Outputs
  • normal_het_pulldown -- TSV file with M entries containing ref/alt counts, ref/alt bases, etc., where M is the number of hets called in the control sample.

  • tumor_het_pulldown -- TSV file with M entries containing ref/alt counts, ref/alt bases, etc. for sites in the case sample that were called as het in the control sample, where M is the number of hets called in the control sample.

Format for both output files:

CONTIG  POSITION        REF_COUNT       ALT_COUNT       REF_NUCLEOTIDE  ALT_NUCLEOTIDE  READ_DEPTH
1       809876  5       16      A       G       21
1       881627  23      12      G       A       35
1       882033  9       10      G       A       19
1       900505  26      24      G       C       50
....snip....
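Each row gives the ref and alt read counts at one het site. As a rough sanity check only (this is not the ACNV model, which fits a probabilistic model across all sites in a segment), the naive per-site minor-allele fraction is min(REF_COUNT, ALT_COUNT) / (REF_COUNT + ALT_COUNT); in the example rows this denominator equals READ_DEPTH. The awk sketch below assumes a single header line naming the columns, as shown above; the function name `naive_maf` is hypothetical.

```shell
# Hypothetical helper: print CONTIG, POSITION and the naive per-site
# minor-allele fraction min(REF_COUNT, ALT_COUNT) / (REF_COUNT + ALT_COUNT)
# for a het pulldown TSV. Columns are located by name from the header line.
naive_maf() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i   # map column name -> index
            next
        }
        {
            ref = $(col["REF_COUNT"]) + 0           # force numeric comparison
            alt = $(col["ALT_COUNT"]) + 0
            total = ref + alt
            if (total == 0) next                    # skip sites with no reads
            minor = (ref < alt) ? ref : alt
            printf "%s\t%s\t%.3f\n", $(col["CONTIG"]), $(col["POSITION"]), minor / total
        }
    ' "$1"
}
```

For the first example row (5 ref, 16 alt reads) this prints a fraction of 0.238; values near 0.5 indicate balanced alleles at that site.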
Invocation
java -jar <path_to_gatk_protected_jar> GetBayesianHetCoverage --reference <reference_sequence> \
    --snpIntervals <snp_file> --tumor <case_bam> --tumorHets <tumor_het_pulldown> --normal <control_bam> \
    --normalHets <normal_het_pulldown> --hetCallingStringency 30
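Since both outputs are described as containing one row per het site called in the control sample, a quick consistency check is to compare the sites they list. A minimal sketch, assuming one header line, CONTIG and POSITION as the first two columns (as in the format example), and no other comment lines; `list_sites` and `same_sites` are hypothetical names.

```shell
# Print "CONTIG POSITION" for every data row (header skipped), sorted so that
# the comparison does not depend on row order.
list_sites() {
    awk 'NR > 1 { print $1, $2 }' "$1" | sort
}

# Succeed if the normal and tumor pulldowns list exactly the same sites.
same_sites() {
    [ "$(list_sites "$1")" = "$(list_sites "$2")" ]
}
```

Usage: `same_sites <normal_het_pulldown> <tumor_het_pulldown> || echo "site lists differ" >&2`.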
Step 2. Allelic CNV
Inputs
  • tumor_het_pulldown -- Generated in step 1.

  • coverage_profile -- Tangent-normalized coverage TSV file obtained in the GATK CNV case workflow.

  • called_segments -- Called-segments TSV file obtained in the GATK CNV case workflow.
  • output_prefix -- Path and file prefix for creating the output files. For example, /home/lichtens/my_acnv_output/sample1
Outputs
  • acnv_segments -- TSV file with name ending with -sim-final.seg containing posterior summary statistics for log_2 copy ratio and minor-allele fraction in each segment. With the example output_prefix above, this is /home/lichtens/my_acnv_output/sample1-sim-final.seg

  • acnv_cr_parameters -- TSV file with name ending with -sim-final.cr.param containing posterior summary statistics for global parameters of the copy-ratio model. With the example output_prefix above, this is /home/lichtens/my_acnv_output/sample1-sim-final.cr.param

  • acnv_af_parameters -- TSV file with name ending with -sim-final.af.param containing posterior summary statistics for global parameters of the allele-fraction model. With the example output_prefix above, this is /home/lichtens/my_acnv_output/sample1-sim-final.af.param

Other files containing intermediate results of the calculation are also generated.
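Because the three main outputs are named by appending fixed suffixes to output_prefix, a script can check that the run produced them before moving on to step 3. A minimal sketch; the function name `check_acnv_outputs` is hypothetical, and it only tests that each file exists and is non-empty.

```shell
# Hypothetical helper: report any of the three main ACNV outputs that are
# missing or empty for a given --outputPrefix value.
check_acnv_outputs() {
    acnv_status=0
    for suffix in -sim-final.seg -sim-final.cr.param -sim-final.af.param; do
        if [ ! -s "$1$suffix" ]; then
            echo "missing or empty: $1$suffix" >&2
            acnv_status=1
        fi
    done
    return "$acnv_status"
}
```

Usage: `check_acnv_outputs /home/lichtens/my_acnv_output/sample1`.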

Invocation
java -Xmx8g -jar <path_to_gatk_protected_jar> AllelicCNV --tumorHets <tumor_het_pulldown> \
    --tangentNormalized <coverage_profile> --segments <called_segments> --outputPrefix <output_prefix>
Step 3. Call CNLoH and Balanced Segments

** WARNING: This tool is experimental and exists primarily for internal Broad use. **

Inputs
  • tumor_het_pulldown -- Generated in step 1.

  • acnv_segments -- Generated in step 2 (*-sim-final.seg).

  • coverage_profile -- Tangent-normalized coverage TSV file obtained in the GATK CNV case workflow.
  • output_dir -- Directory for creating the output files. For example, /home/lichtens/my_acnv_cnlohcalls_output/
Outputs
  • GATK-CNV-formatted seg file -- TSV file ending with -sim-final.cnv.seg. This file is formatted identically to the output of GATK CNV. Note that this means the minor-allele fraction estimates are not included in this file.

  • AllelicCapSeg-formatted seg file -- TSV file ending with -sim-final.acs.seg. This file is formatted identically to the output of Broad CGA AllelicCapSeg. Note that this file can be used as input to Broad-internal versions of ABSOLUTE.

  • TITAN-compatible het file -- TSV file ending with -sim-final.titan.het.tsv. This file can be used as the input to TITAN for the het read counts.
  • TITAN-compatible copy-ratio file -- TSV file ending with -sim-final.titan.tn.tsv. This file can be used as the input to TITAN for the per-target copy-ratio estimates.
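The part of each output file name before these suffixes is not specified here, so a check of the output directory has to match on suffix alone. A minimal sketch, assuming all four files are written directly into output_dir; the function name `check_cnloh_outputs` is hypothetical.

```shell
# Hypothetical helper: confirm that output_dir contains at least one file for
# each of the four suffixes listed above. A glob stands in for the unknown
# part of the file name before the suffix.
check_cnloh_outputs() {
    cnloh_status=0
    for suffix in -sim-final.cnv.seg -sim-final.acs.seg \
                  -sim-final.titan.het.tsv -sim-final.titan.tn.tsv; do
        found=0
        for f in "$1"/*"$suffix"; do
            [ -e "$f" ] && found=1   # an unmatched glob stays literal and fails -e
        done
        if [ "$found" -eq 0 ]; then
            echo "no *$suffix file in $1" >&2
            cnloh_status=1
        fi
    done
    return "$cnloh_status"
}
```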
Invocation
java -Xmx8g -jar <path_to_gatk_protected_jar> CallCNLoHAndSplits --tumorHets <tumor_het_pulldown> \
    --segments <acnv_segments> --tangentNormalized <coverage_profile> --outputDir <output_dir> \
    --rhoThreshold 0.2 --numIterations 10 --sparkMaster 'local[*]'
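The three invocations can be chained in one script, with step 3 reading the -sim-final.seg file that step 2 writes under output_prefix. The sketch below uses only the flags shown in this post; every default path is a hypothetical placeholder to be overridden through environment variables, and DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical placeholder paths; override via environment variables.
: "${GATK_JAR:=gatk-protected.jar}"
: "${REF:=ref.fasta}"
: "${SNPS:=common_snps.interval_list}"
: "${CASE_BAM:=tumor.bam}"
: "${CONTROL_BAM:=normal.bam}"
: "${TUMOR_HETS:=tumor.hets.tsv}"
: "${NORMAL_HETS:=normal.hets.tsv}"
: "${COVERAGE:=tumor.tn.tsv}"
: "${CALLED_SEGMENTS:=tumor.called.seg}"
: "${PREFIX:=acnv_output/sample1}"
: "${OUT_DIR:=acnv_cnloh_output}"

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 1-3 in order; stop at the first failing step.
acnv_workflow() {
    run java -jar "$GATK_JAR" GetBayesianHetCoverage --reference "$REF" \
        --snpIntervals "$SNPS" --tumor "$CASE_BAM" --tumorHets "$TUMOR_HETS" \
        --normal "$CONTROL_BAM" --normalHets "$NORMAL_HETS" \
        --hetCallingStringency 30 || return 1
    run java -Xmx8g -jar "$GATK_JAR" AllelicCNV --tumorHets "$TUMOR_HETS" \
        --tangentNormalized "$COVERAGE" --segments "$CALLED_SEGMENTS" \
        --outputPrefix "$PREFIX" || return 1
    run java -Xmx8g -jar "$GATK_JAR" CallCNLoHAndSplits --tumorHets "$TUMOR_HETS" \
        --segments "${PREFIX}-sim-final.seg" --tangentNormalized "$COVERAGE" \
        --outputDir "$OUT_DIR" --rhoThreshold 0.2 --numIterations 10 \
        --sparkMaster 'local[*]'
}
```

Running `DRY_RUN=1 acnv_workflow` first is a cheap way to review the exact commands before committing to the full run.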
