VariantRecalibrator SNP and INDEL failure rate
Hi I've been struggling with some issues we have been having with the VariantRecalibrator. Here's the story. We run VariantRecalibrator on our new sample in combination with the previous ones (gVCF...
GATK SelectVariants on vcf
I'm using GATK (v3.3.) SelectVariants on the .vcf file of the ExAC data (downloaded from ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.3.1/). I get the following error: java -Xmx45g...
Can FastaAlternateReferenceMaker put N instead of the reference base when data...
Hi, I did SNP calling on a dataset of 200 whole bacterial genomes. I then used the FastaAlternateReferenceMaker to receive my consensus sequences and do phylogenetic analyses with them. When I looked...
Missing FOXOG annotation for specific SNP variants in VCFs
Hi, We've run Mutect2 with OxoGReadCounts walker to obtain FOXOG annotation for somatic variants in VCFs. I know FOXOG annotation is only calculated for SNPs and when denominator (ALT_F1R2 + ALT_F2R1)...
Multiple vcf input to SelectVariants
I am trying to use the GATK (3.6) SelectVariants tool, and I want to input several vcf files. Preferably all as --variant (-V). The key here is that these have to be separate files, not one multi...
What is a VCF and how should I interpret it?
This document describes "regular" VCF files produced for GERMLINE calls. For information on the special kind of VCF called gVCF, produced by HaplotypeCaller in -ERC GVCF mode, please see this companion...
DP4 flag as output option in UnifiedGenotyper or HaplotypeCaller
Hi GATK team, the default output of UnifiedGenotyper organizes the format column as GT:AD:DP:GQ:PL and the HaplotypeCaller as: GT:AD:GQ:PL. Is it also possible to get in addition the DP4 flag, that lists the...
How to combine gVCFs from different chromosomes
Hi, I have gVCFs generated by HaplotypeCaller (GATK 3.5). 3 sets, 2 of them have same chromosome different samples, 200 each, and one has different chromosomes different samples. What would be the best...
Running GenomeStrip from installtest
Hello, I'm trying to run GenomeStrip using the installtest folder files and I'm getting the following error: "java.lang.UnsupportedClassVersionError: org/broadinstitute/sv/apps/SVToolkitInfo :...
GenotypeGVCFs Hangs at the start
Hi team, thanks for your work. I'm currently running GenotypeGVCFs on 78 WG individuals using the -L option to run each chromosome separately. java -jar -Xmx20g -XX:ParallelGCThreads=10...
VQSR failed with "No data found" with whole genome variant calls, but...
Hi, I ran VQSR on the SNP calls from a WGS sample mapped to b37+decoy reference, it failed at the training step with error message "No data found". I then removed the SNP calls on the small contigs...
What is Map/Reduce and why are GATK tools called "walkers"?
Overview One of the key challenges of working with next-gen sequence data is that input files are usually very large. We can’t just make the program open the files, load all the data into memory and...
(howto) Recalibrate variant quality scores = run VQSR
Objective Recalibrate variant quality scores and produce a callset filtered for the desired levels of sensitivity and specificity. Prerequisites TBD Caveats This document provides a typical usage...
Running HaplotypeCaller with benchmark dataset
I tried to call variants with WGS benchmark dataset using ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/working/20101201_cg_NA12878/NA12878.hiseq.wgs.bwa.recal.bam . First, I run the...
Empty INFO entries for "RankSums"
After running the GATK pipeline, about one third of my variants have no entries for BaseQRankSum, ClippingRankSum, or MQRankSum, which also causes FS=0. This makes hard filtering challenging, because I...
What is the best way to make an in-house database with allele frequency (like...
Hi, I am currently working on making a pilot in-house database of around 35 exomes data we have in our laboratory. I am following the GATK best practices guidelines for data analysis. The following...
DepthOfCoverage multiple samples
Dear all, I have about 300 bam files (whole-genome sequence) and I'm trying to get the DepthOfCoverage per chromosome. I'm using a script to submit a job for each chromosome, like that: java -jar...
Does MuTect2 actually support GVCF output?
MuTect2 docs since 3.5-0 listed the possibility of GVCF output by using -ERC GVCF. However, it seems to me that for both the 3.5-0 and 3.6-0 versions, adding that flag would still give out a standard VCF...
SplitNCigarReads Unsupported Cigar Operator: =
Hi, I got this error when trying to run SplitNCigarReads on a BAM file. I checked the source that generates this exception, and it appears that (as may be expected from the name), there is no handling...
Single-Cell population join calling
Hello folks, I have a question regarding the pipeline I should use to do some QC on my different samples of single-cell sequencing I have. Here's mainly the design of our experiment. We have different...