Is it necessary to process 1000 genome data for exome variant calling training?
I have independently sequenced human exomes with 100x coverage. I would like to call variants using the GATK Best Practices guidelines, and have been following the guide to do so. However, I am...
GATK-HC results generate records with read depth of zero?
Dear all, I ran GATK 3.6 and noticed that a few records from HaplotypeCaller contain a read depth of zero in the FORMAT field. Below is one instance: chr2 208630795 . T TA,TAA 0.39 ....
GATK runtime error (READ_MAX_LENGTH must be > 0 but got 0) with 1000g bam
Hi, I'm trying to build a pon with GATK 3.7-0 to use with mutect2. For that, I've downloaded 80 exome bam files from the 1000g project (GBR, TSI, IBS and CEU populations). For most of them, when I try...
MuTect2: I need some clarification on the output
Hello everyone, I have recently produced results from MuTect2. This is my command running on default parameters: java -jar WES/programs/GATK-3.7.jar -T MuTect2 -I:normal...
Bimodal MQ distribution
Hi GATK-team, I am working with a dataset of WG human samples (Illumina paired-end, ~30X) comprising about 120 samples, all from Africa (from ~25 different populations). For 2/3 of the samples the data...
A dog, a ship and an algorithm that measures relatedness
What is Beagle? A beagle is a type of dog known for its even temper and intelligence. It is also the name of the ship on which Darwin sailed to the Galapagos (the H.M.S. Beagle), where he developed his...
(How to) Mark duplicates with MarkDuplicates or MarkDuplicatesWithMateCigar
This tutorial updates Tutorial#2799. Here we discuss two tools, MarkDuplicates and MarkDuplicatesWithMateCigar, that flag duplicates. We provide example data and example commands for you to follow...
NullPointerException in PhaseByTransmission
Hi all, Has anyone else gotten the following: java.lang.NullPointerException at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:242) at...
about pairHMM transition probabilities calculate
I have a question about how the pairHMM transition probabilities are calculated in common_data_structure.h (line 167). As far as I know, the quality value is less than 128 and MAX_QUAL is 254, so why does your code judge...
HC step 3 : Evaluating the evidence for haplotypes and variant alleles
This document describes the procedure used by HaplotypeCaller to evaluate the evidence for variant alleles based on candidate haplotypes determined in the previous step for a given ActiveRegion. For...
how necessary is to do VariantFiltration after VQSR?
Hi everyone, I have ~200 whole genomes sequenced at 30x and am currently having an issue after HaplotypeCaller: on chromosome Y, most of the GT of the samples are not supported by many reads...
Three allele calls for a diploid organism
Hello Everyone, Not sure what to make of this one: I have a diploid fish in which I'm examining an interesting variant. I generated a "bamout" file using HaplotypeCaller for one particular individual...
Small GATK examples
Besides the Best Practices, which involve parsing a lot of data, is there a "Hello World" example that uses small data (like half a chromosome) to generate results, and basically...
Optimizing Mutect2 runs on whole genomes?
Dear GATK, Given the most current optimized way to run Mutect2 on whole genomes of about 40-60X coverage (~300 G), how long can I expect it to run on one whole genome sequence? Particularly, what...
FastqToSam error
Hi, I am trying to generate bam files from reads (originally in fasta format) with an ‘artificial Q-score’ generated with the attached script ‘fasta_to_fastq’. First step: I created a fastq file from source...
GATK 3.7 HaplotypeCaller NullPointerException in...
HaplotypeCaller in GATK 3.7 (3.7-0-g56f2c1a) is throwing a NullPointerException in some cases. See below for log output from a failing run. It looks to me like the call to .get() in the...
(How to) Call somatic copy number variants using GATK4 CNV
Presented tools are in BETA. Document is in BETA. It may be incomplete and/or inaccurate. Post suggestions to the Comments section and be sure to read about updates also within the Comments section....
bcl2fastq and small test data
I am preparing a Docker image for running bcl2fastq 2.19. I would like to create an integration test using this Docker container with some small BCL data. I've tried...
GATK DepthOfCoverage
We have a problem with GATK DepthOfCoverage: it is running too slowly. With default parameters, it took 12 hours on an Amazon c4.2xlarge instance for a BAM of 125MB. We are currently running another...
VQSR Resources for Indels
Hi, I noticed that on the page for setting the right arguments for VQSR you mentioned that Mills and dbSNP should be used as resources for INDEL variant recalibration. However, at the bottom of the...