
I get very different MQ values when using GVCF vs BP_RESOLUTION

Hello! I have a question about the difference between using HaplotypeCaller's --emitRefConfidence GVCF and BP_RESOLUTION. Maybe the answer is obvious or already somewhere in the forum, but I couldn't find it...

First, some context: I'm working with GATK v3.5.0 in a haploid organism. I have 34 samples, of which 5 are very similar to the reference (they are backcrosses), while the rest are strains from a wild population. Originally I used --emitRefConfidence GVCF followed by GenotypeGVCFs. While checking the output VCF file, I realized that the five backcrosses had a much lower DP on average than the other samples. This cannot be explained by differences in read numbers or anything like that, since all samples were run in the same lane, etc. I assume this happened because those samples have long tracts without any variant compared to the reference, so the GVCF blocks end up assigning a lower depth to a large number of sites in those samples than in the much more polymorphic ones. In any case, I figured I could just emit all sites using BP_RESOLUTION to obtain the "true" DP value per site. However, when I tried that, the resulting VCF file had very low MQ values! Can you explain why this happened?
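To make the per-sample DP comparison concrete, here is a rough sketch of how the mean DP per sample can be tallied from a multi-sample VCF. `mean_sample_dp` is a hypothetical helper written for illustration, not part of GATK or bcftools. It assumes an uncompressed VCF that still has its `#CHROM` header line and carries DP in the FORMAT column; sites where a sample has no DP are skipped rather than counted as zero.

```shell
# Hypothetical helper: print the mean FORMAT/DP per sample.
# Assumes an uncompressed VCF with a #CHROM header line.
mean_sample_dp() {
    awk -F'\t' '
        /^##/ { next }                       # skip meta-information lines
        /^#CHROM/ {                          # remember sample names by column
            for (i = 10; i <= NF; i++) name[i] = $i
            ncol = NF
            next
        }
        {
            # Locate DP within the FORMAT column (field 9) of this record.
            dp = 0
            n = split($9, fmt, ":")
            for (j = 1; j <= n; j++) if (fmt[j] == "DP") dp = j
            if (!dp) next

            for (i = 10; i <= NF; i++) {
                split($i, f, ":")
                # Ignore missing values instead of treating them as depth 0.
                if (f[dp] != "" && f[dp] != ".") { sum[i] += f[dp]; cnt[i]++ }
            }
        }
        END {
            for (i = 10; i <= ncol; i++)
                if (cnt[i]) printf "%s\t%.2f\n", name[i], sum[i] / cnt[i]
        }' "$1"
}

# Example call (file name taken from the listing further down):
# mean_sample_dp 34snps.vcf
```

Running it on both the GVCF-derived and the BP_RESOLUTION-derived VCF would show whether the lower DP of the backcrosses persists once every site is emitted.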

This is the original file with --emitRefConfidence GVCF:

$ bcftools view -H 34snps.vcf | head -n3 | cut -f1-8
chromosome_1    57  .   A   G   309.4   .   AC=4;AF=0.235;AN=17;DP=582;FS=0;MLEAC=4;MLEAF=0.235;MQ=40;QD=34.24;SOR=2.303
chromosome_1    81  .   G   A   84.49   .   AC=2;AF=0.065;AN=31;DP=603;FS=0;MLEAC=2;MLEAF=0.065;MQ=44.44;QD=30.63;SOR=2.833
chromosome_1    88  .   T   C   190.75  .   AC=1;AF=0.091;AN=11;BaseQRankSum=-0.762;ClippingRankSum=0.762;DP=660;FS=7.782;MLEAC=1;MLEAF=0.091;MQ=29.53;MQRankSum=-1.179;QD=21.19;ReadPosRankSum=-1.666;SOR=1.414

And this is with --emitRefConfidence BP_RESOLUTION:

$ bcftools view -H 34allgenome_snps.vcf | head -n3 | cut -f1-8
chromosome_1    57  .   A   G   307.28  .   AC=4;AF=0.211;AN=19;DP=602;FS=0;MLEAC=4;MLEAF=0.211;MQ=8.23;QD=34.24;SOR=2.204
chromosome_1    81  .   G   A   84.49   .   AC=2;AF=0.065;AN=31;DP=750;FS=0;MLEAC=2;MLEAF=0.065;MQ=5.53;QD=30.63;SOR=2.833
chromosome_1    88  .   T   C   190.75  .   AC=1;AF=0.091;AN=11;BaseQRankSum=-1.179;ClippingRankSum=0.762;DP=796;FS=7.782;MLEAC=1;MLEAF=0.091;MQ=4.8;MQRankSum=-1.179;QD=21.19;ReadPosRankSum=-1.666;SOR=1.414

I find this particularly strange because the mapping quality of the backcrosses should in fact be slightly better on average (around 59 in the original BAM files) than that of the other, more polymorphic samples (around 58)...
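The two listings above can also be lined up site by site instead of compared by eye. The following is only a sketch: `compare_info` is a hypothetical helper (not a bcftools or GATK command) that assumes both files are uncompressed VCFs and reports only the sites present in both.

```shell
# Hypothetical helper: for sites present in both VCFs, print
# CHROM, POS, the INFO value from file A, and the INFO value from file B.
# Usage: compare_info KEY fileA.vcf fileB.vcf
compare_info() {
    key="$1"; a="$2"; b="$3"
    awk -F'\t' -v key="$key" '
        # Return the value of "key=" in a semicolon-separated INFO string,
        # or "." if absent. Matching at position 1 keeps MQ from also
        # matching MQRankSum, and AC from matching MLEAC.
        function get(info,   n, kv, i) {
            n = split(info, kv, ";")
            for (i = 1; i <= n; i++)
                if (index(kv[i], key "=") == 1)
                    return substr(kv[i], length(key) + 2)
            return "."
        }
        /^#/ { next }                                   # skip header lines
        NR == FNR { first[$1 FS $2] = get($8); next }   # first file: store
        ($1 FS $2) in first {                           # second file: report
            print $1 "\t" $2 "\t" first[$1 FS $2] "\t" get($8)
        }
    ' "$a" "$b"
}

# Example calls (file names as in the listings above):
# compare_info MQ 34snps.vcf 34allgenome_snps.vcf | head -n3
# compare_info DP 34snps.vcf 34allgenome_snps.vcf | head -n3
```

The same call with `DP` as the key shows how the site-level depth changes between the two modes alongside the drop in MQ.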

Thank you very much!

