In my final VCF (produced by following the gVCF workflow), quite a number of positions are reported as missing (1.2M out of ~24M bases). This isn't unexpected in itself; however, on closer inspection I cannot see why HC (HaplotypeCaller) would lead to missing genotypes at some of these positions. Take, for instance, this 11bp extract from a gVCF:
Supercontig_1.1 613355 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:78,0:78:18:0,18,270
Supercontig_1.1 613356 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:77,0:77:9:0,9,135
Supercontig_1.1 613357 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:78,0:78:0:0,0,0
Supercontig_1.1 613358 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:79,0:79:0:0,0,0
Supercontig_1.1 613359 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:79,0:79:0:0,0,0
Supercontig_1.1 613360 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:78,0:78:0:0,0,0
Supercontig_1.1 613361 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:77,0:77:0:0,0,0
Supercontig_1.1 613362 . C CT,<NON_REF> 3242.73 . DP=82;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1,0;RAW_MQ=295200 GT:AD:DP:GQ:PGT:PID:PL:SB 1/1:0,74,0:74:99:0|1:613362_C_CT:3280,223,0,3280,223,3280:0,0,44,30
Supercontig_1.1 613363 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:4,72:76:0:0,0,0
Supercontig_1.1 613364 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:76,0:76:99:0,120,1800
Supercontig_1.1 613365 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:76,0:76:99:0,120,1800
Here is the same 11bp stretch after running GenotypeGVCFs on the gVCF:
Supercontig_1.1 613355 . A . . PASS AN=2;DP=78;VariantType=NO_VARIATION GT:AD:DP:RGQ 0/0:78:78:18
Supercontig_1.1 613356 . G . . PASS AN=2;DP=77;VariantType=NO_VARIATION GT:AD:DP:RGQ 0/0:77:77:9
Supercontig_1.1 613357 . G . . PASS DP=78;VariantType=NO_VARIATION GT:AD:DP:RGQ ./.:78:78:0
Supercontig_1.1 613358 . T . . PASS DP=79;VariantType=NO_VARIATION GT:AD:DP:RGQ ./.:79:79:0
Supercontig_1.1 613359 . T . . PASS DP=79;VariantType=NO_VARIATION GT:AD:DP:RGQ ./.:79:79:0
Supercontig_1.1 613360 . T . . PASS DP=78;VariantType=NO_VARIATION GT:AD:DP:RGQ ./.:78:78:0
Supercontig_1.1 613361 . G . . PASS DP=77;VariantType=NO_VARIATION GT:AD:DP:RGQ ./.:77:77:0
Supercontig_1.1 613362 . C CT 3242.73 PASS AC=2;AF=1;AN=2;DP=82;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=30.85;SOR=1.134;VariantType=INSERTION.NumRepetitions_3.EventLength_1.RepeatExpansion_T GT:AD:DP:GQ:PGT:PID:PL 1/1:0,74:74:99:1|1:613362_C_CT:3280,223,0
Supercontig_1.1 613363 . T . . PASS DP=76;VariantType=NO_VARIATION GT:AD:DP:RGQ ./.:4:76:0
Supercontig_1.1 613364 . T . . PASS AN=2;DP=76;VariantType=NO_VARIATION GT:AD:DP:RGQ 0/0:76:76:99
Supercontig_1.1 613365 . T . . PASS AN=2;DP=76;VariantType=NO_VARIATION GT:AD:DP:RGQ 0/0:76:76:99
Positions 613357-613361 are all assigned a missing genotype (which I assume is because HC reports all three genotype likelihoods at these positions as equal, i.e. PL=0,0,0 and GQ=0). However, examining the raw BAM, I can see that ALL the reads covering these positions are 100% hom-ref, and this is also the case when examining the BAMOUT from HC. All the mapping qualities are very high, as are the base qualities. Could anyone explain why I get these no-calls, which appear to me to be erroneous?
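In case it helps anyone reproduce this, here is a minimal Python sketch of how such sites can be flagged in a gVCF. It rests on my assumption above that the no-calls correspond to records whose PL values are all identical; the function name `flat_pl_sites` is just an illustrative helper of mine, not part of GATK, and it assumes a single-sample file with whitespace-separated columns as pasted here.

```python
def flat_pl_sites(vcf_lines):
    """Yield (chrom, pos, gt, pls) for records whose PL values are all identical.

    Assumes one sample column and whitespace-separated fields (hypothetical
    helper for illustration; not a GATK tool).
    """
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 10:
            continue  # no FORMAT/sample columns to inspect
        chrom, pos = fields[0], int(fields[1])
        # Pair FORMAT keys with the sample's values, e.g. GT -> 0/0.
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        pl = sample.get("PL")
        if pl is None or pl == ".":
            continue
        pls = pl.split(",")
        # Identical PLs mean no genotype is favoured over another.
        if len(set(pls)) == 1:
            yield chrom, pos, sample.get("GT"), pls


records = [
    "Supercontig_1.1 613356 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:77,0:77:9:0,9,135",
    "Supercontig_1.1 613357 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:78,0:78:0:0,0,0",
]
flagged = list(flat_pl_sites(records))
for chrom, pos, gt, pls in flagged:
    print(chrom, pos, gt, ",".join(pls))
```

Running this over the full gVCF (e.g. passing an open file handle instead of `records`) should list the same positions that come out as `./.` after GenotypeGVCFs, if my assumption about the cause is right.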