Hi,
I ran HaplotypeCaller on a set of samples using the following commands:
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -drf DuplicateRead -R hg19.fa -I SAMPLE.bam -o SAMPLE.g.vcf -L target_region.bed -ERC GVCF
java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R hg19.fa -V SAMPLE.g.vcf -o SAMPLE.hc.vcf
For many variants, the reported DP is one read lower than the actual depth. I loaded the BAM file in IGV and counted the reads manually (the same count appears when I hover over the coverage bar plot). The DepthOfCoverage tool also reports the correct depth:
java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -drf DuplicateRead -R hg19.fa -I SAMPLE.bam --omitDepthOutputAtEachBase -o SAMPLE.coverage
Here is an example from hc.vcf:
chrX 49119876 . T C 531.77 . AC=2;AF=1.00;AN=2;DP=19;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.99;SOR=1.022 GT:AD:DP:GQ:PL 1/1:0,19:19:57:560,57,0
and from DepthOfCoverage:
chrX:49119876 20 20.00 20 20.00 21 21 21 100.0
So the depth agrees between DepthOfCoverage and the BAM file viewed in IGV (20 reads), but it is one lower in hc.vcf (DP=19).
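For anyone who wants to check the same thing on their own data, here is a minimal Python sketch of the comparison. It assumes a single-sample VCF record and assumes that the second column of the DepthOfCoverage line is the total depth for that position; `vcf_depths` is just a helper name I made up, not part of GATK.

```python
def vcf_depths(record):
    """Return (INFO DP, FORMAT DP, sum of AD) for one single-sample VCF line."""
    fields = record.split()
    # INFO column: semicolon-separated key=value pairs
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    # FORMAT keys paired with the sample's values
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    ad_sum = sum(int(x) for x in sample["AD"].split(","))
    return int(info["DP"]), int(sample["DP"]), ad_sum


# The two lines quoted in this post
vcf_line = ("chrX 49119876 . T C 531.77 . "
            "AC=2;AF=1.00;AN=2;DP=19;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.99;SOR=1.022 "
            "GT:AD:DP:GQ:PL 1/1:0,19:19:57:560,57,0")
doc_line = "chrX:49119876 20 20.00 20 20.00 21 21 21 100.0"

info_dp, format_dp, ad_sum = vcf_depths(vcf_line)
# Assumption: column 2 of the DepthOfCoverage line is the total depth
doc_depth = int(doc_line.split()[1])

print("INFO DP:", info_dp, "FORMAT DP:", format_dp, "sum(AD):", ad_sum)
print("DepthOfCoverage:", doc_depth, "difference:", doc_depth - info_dp)
```

For this record, all three VCF values are 19 and the DepthOfCoverage value is 20, so the difference is one read.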
Has anybody else seen this issue, or does anyone know how to fix it? It is causing problems for variants that fall right at my threshold.
Thanks a lot in advance!