Hi, I have noticed some inconsistencies between the read depth reported by the DepthOfCoverage tool and the DP fields in the VCF. I am wondering if anyone can help me understand how this works. I am running GATK version 3.6 and following the Best Practices guidelines.
I ran the DepthOfCoverage tool on the raw BAM file (right after alignment, before any of the pre-processing steps). Here's the read depth reported for a few loci:
chr4:980932 161 161.00 161
chr11:71146691 1383 1383.00 1383
chr17:3397702 453 453.00 453
chrX:31496350 561 561.00 561
chrX:31983162 325 325.00 325
I ran DepthOfCoverage again on the BAM file after removing duplicate reads, indel realignment, and base recalibration. Here's the read depth for the same set of loci:
chr4:980932 141 141.00 141
chr11:71146691 793 793.00 793
chr17:3397702 310 310.00 310
chrX:31496350 364 364.00 364
chrX:31983162 201 201.00 201
This all makes sense so far, as the read depth decreases after the pre-processing steps. However, when I run HaplotypeCaller on the processed BAM file above, the read depth increases for some of the loci:
chr4 980932 DP=139; GT:AD:DP:GQ:PL 0/1:63,76:139:99:2358,0,1748
chr11 71146691 DP=796; GT:AD:DP:GQ:PL 1/1:0,795:795:99:27795,2395,0
chr17 3397702 DP=309; GT:AD:DP:GQ:PL 0/1:174,133:307:99:3879,0,5360
chrX 31496350 DP=370; GT:AD:DP:GQ:PL 1/1:0,369:369:99:13554,1108,0
chrX 31983162 DP=214;GT:AD:DP:GQ:PL 0/1:113,101:214:99:3052,0,3074
I expect the second DP in the VCF (the sample-level FORMAT field) to be no higher than the first DP (the INFO field), since some reads are filtered out based on the MQ threshold or bad mates. However, it is confusing to me that the first DP is inconsistent with the depth observed in the input BAM file: it is sometimes higher and sometimes lower. Can someone explain how HaplotypeCaller "adds" or "removes" reads at a locus?
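To make the discrepancy concrete, here is a minimal Python sketch that lines up the numbers quoted above. The values are copied by hand from the DepthOfCoverage output and the VCF records in this post; the `LOCI` table and the `compare` function are names made up for illustration and are not part of GATK.

```python
# Depths copied from this post, one tuple per locus:
# (locus, raw BAM depth, processed BAM depth, VCF INFO DP, VCF FORMAT DP)
LOCI = [
    ("chr4:980932", 161, 141, 139, 139),
    ("chr11:71146691", 1383, 793, 796, 795),
    ("chr17:3397702", 453, 310, 309, 307),
    ("chrX:31496350", 561, 364, 370, 369),
    ("chrX:31983162", 325, 201, 214, 214),
]


def compare(loci):
    """Return per-locus differences between the VCF depths and the processed BAM depth."""
    rows = []
    for locus, _raw, processed, info_dp, format_dp in loci:
        rows.append({
            "locus": locus,
            # Positive: HaplotypeCaller reports more reads than DepthOfCoverage.
            "info_minus_bam": info_dp - processed,
            # Zero or negative: reads dropped between INFO DP and FORMAT DP.
            "format_minus_info": format_dp - info_dp,
        })
    return rows


if __name__ == "__main__":
    for row in compare(LOCI):
        print(f'{row["locus"]:>16}  INFO DP - BAM: {row["info_minus_bam"]:+d}  '
              f'FORMAT DP - INFO DP: {row["format_minus_info"]:+d}')
```

Laid out this way, the FORMAT DP is never above the INFO DP, but the INFO DP differs from the processed BAM depth in both directions, which is the part I cannot explain.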
Thanks!!
Chelsea