MarkDuplicates: avoid excessive duplicate set size?
Hi, I am facing an issue with MarkDuplicates that I do not understand. I have 2 BAM files produced from mRNA .fastq files with the same protocol (GATK Best Practices): file A is 20GB (HiSeq 2500);...
Where do I find population allele frequency files for ContEst?
Hello. I was looking over some of the previous Forum discussion regarding the population allele frequency .vcf files used in the ContEst tool and I am curious if the previous problems with formatting...
Downsampling Experiment
Hello! Trying to downsample in an orderly fashion in the name of experimentation, and in doing so would like to specify just one chromosome for the experiment - so I picked chromosome 17 with -L and a...
Re: Automating SelectVariants in a script using bash parameter substitution
I have found SelectVariants really useful to cull out homozygous calls from my vcf file, especially using the variant context JEXL expressions. However, since I have many VCF files, I like to automate...
What is uBAM and why is it better than FASTQ for storing unmapped sequence data?
Most sequencing providers generate FASTQ files with the raw unmapped read sequences, so that is the most common form in which the data is input into the mapping step of the pre-processing pipeline....
Per-base alignment qualities (BAQ) in the GATK
This article is out of date and no longer applicable. BAQs are no longer used in GATK. 1. Introduction The GATK provides an implementation of the Per-Base Alignment Qualities (BAQ) developed by Heng Li...
Where can I download the ContEst source code?
Hi, I want to recompile the ContEst module, but I cannot find any independent ContEst module code. I checked the GATK GitHub page, but the ContEst code is not standalone; it's just part of GATK. So...
Error running CollectAlignmentSummaryMetrics on a BAM generated from a .maf file
Hello, recently I ran an alignment with the LAST tool (http://last.cbrc.jp/, a FASTA aligner for long-read alignment). It produces a .maf file, which I then converted to SAM (with...
Why doesn't HaplotypeCaller call my SNV mutation?
Hi guys, I am using HaplotypeCaller to call mutations in some patient samples. I know that one of the samples has a point mutation in MSH2 intron 5, near the splice site. However, this is also a...
VCF file malformed
Hi all. I am new to bioinformatics. I am trying to do base recalibration, which is why I need a dbsnp.vcf with the known variants of the genome I am working on. I downloaded the dbsnp file from...
Understanding the -dcov option
Hi, Here is an excerpt of the docs of -dcov option: "For read-based traversals (ReadWalkers like BaseRecalibrator), it controls the maximum number of reads sharing the same alignment start position."...
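For readers following this thread, here is a minimal Python sketch of the behavior the quoted docs describe for read-based traversals: no more than N reads are kept per alignment start position. It assumes reads are reduced to hypothetical `(name, contig, start)` tuples and that excess reads are dropped uniformly at random using reservoir sampling. This is an illustration under those assumptions, not GATK's own downsampler.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for a SAM/BAM record: (read_name, contig, alignment_start)
Read = Tuple[str, str, int]


def cap_reads_per_start(reads: List[Read], max_per_start: int, seed: int = 0) -> List[Read]:
    """Keep at most `max_per_start` reads for each (contig, alignment_start).

    Reservoir sampling (Algorithm R) gives every read at an over-covered
    start the same chance of being kept. Assumed behavior, not GATK code.
    """
    rng = random.Random(seed)
    reservoirs: Dict[Tuple[str, int], List[Read]] = defaultdict(list)
    seen: Dict[Tuple[str, int], int] = defaultdict(int)

    for read in reads:
        key = (read[1], read[2])
        seen[key] += 1
        bucket = reservoirs[key]
        if len(bucket) < max_per_start:
            # Still under the cap at this start position: keep the read
            bucket.append(read)
        else:
            # Replace a kept read with probability max_per_start / seen[key]
            j = rng.randrange(seen[key])
            if j < max_per_start:
                bucket[j] = read

    kept = [r for bucket in reservoirs.values() for r in bucket]
    # Sort by (contig, start, name) so the output order is deterministic
    kept.sort(key=lambda r: (r[1], r[2], r[0]))
    return kept


if __name__ == "__main__":
    reads = [(f"r{i}", "chr17", 100) for i in range(10)] + [("r10", "chr17", 200)]
    kept = cap_reads_per_start(reads, max_per_start=3)
    print(len(kept))  # 3 reads kept at start 100 + 1 at start 200 = 4
```

Note that start positions below the cap are untouched, so on this reading `-dcov` limits pile-ups of reads that begin at the same position rather than the total depth at a locus.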
GATK HaplotypeCaller QUAL field changes when running per chromosome
Dear GATK Team, I ran GATK HaplotypeCaller (3.4) on one sample in two ways. 1. In the normal procedure, the whole BAM file was submitted to HaplotypeCaller. 2. In the other, the BAM was split by chromosome (using the -L option)...
Which training sets / arguments should I use for running VQSR?
This document describes the resource datasets and arguments that we recommend for use in the two steps of VQSR (i.e. the successive application of VariantRecalibrator and ApplyRecalibration), based on...
IlluminaBasecallsToSam fails; trying to figure out why
Hello, I ran ExtractIlluminaBarcodes and came out with the following metrics file: METRICS CLASS picard.illumina.ExtractIlluminaBarcodes$BarcodeMetric BARCODE BARCODE_NAME LIBRARY_NAME READS PF_READS...
Why am I getting high GQ scores for doubtful calls?
Actually, there are several aspects to this. I am looking for de novos in trios and basically I have far too many, indicating incorrect calls with HaplotypeCaller. One example which highlights a number...
Which population should I choose when using ContEst?
Hi, which population should I choose when running ContEst? I am in China. I see that the ContEst default value is "CEU", but I do not know what CEU means. Can you give me an introduction to that?...
How do I find the code for "Genotype"?
Hi, I read this code, "htsjdk.variant.variantcontext.Genotype", but when I checked the GATK source code, I could not find a "Genotype" module. So could you please provide me with the code for...
How VQSR deals with multiallelic SNPs and indels
Hi, may I know how VQSR deals with multiallelic SNPs and indels? How does it classify them as pass or fail?
FilterSamReads: java.lang.OutOfMemoryError: Java heap space
How can I estimate the memory needed (-Xmx?G) when using FilterSamReads? I always get this error: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at...
A question about using GenomeAnalysisTK to deal with ancient DNA data
Hi, because my data is ancient DNA, the read depth is almost one. When I used HaplotypeCaller to make a GVCF, one read is as follows: ''' REF:...