Channel: Recent Discussions — GATK-Forum

Genotype set to missing with lots of hom ref reads


In my final VCF (produced by following the gVCF workflow), quite a number of positions are reported as missing (1.2M out of ~24M bases). Some missing calls are to be expected, but on closer inspection I cannot see why HaplotypeCaller (HC) would assign a missing genotype at some of these positions. Take, for instance, this 11bp extract from the gVCF:

Supercontig_1.1 613355  .   A   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:78,0:78:18:0,18,270
Supercontig_1.1 613356  .   G   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:77,0:77:9:0,9,135
Supercontig_1.1 613357  .   G   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:78,0:78:0:0,0,0
Supercontig_1.1 613358  .   T   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:79,0:79:0:0,0,0
Supercontig_1.1 613359  .   T   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:79,0:79:0:0,0,0
Supercontig_1.1 613360  .   T   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:78,0:78:0:0,0,0
Supercontig_1.1 613361  .   G   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:77,0:77:0:0,0,0
Supercontig_1.1 613362  .   C   CT,<NON_REF>    3242.73 .   DP=82;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1,0;RAW_MQ=295200    GT:AD:DP:GQ:PGT:PID:PL:SB   1/1:0,74,0:74:99:0|1:613362_C_CT:3280,223,0,3280,223,3280:0,0,44,30
Supercontig_1.1 613363  .   T   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:4,72:76:0:0,0,0
Supercontig_1.1 613364  .   T   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:76,0:76:99:0,120,1800
Supercontig_1.1 613365  .   T   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:76,0:76:99:0,120,1800

Here is the same 11bp stretch after running GenotypeGVCFs on the gVCF:

Supercontig_1.1 613355  .   A   .   .   PASS    AN=2;DP=78;VariantType=NO_VARIATION GT:AD:DP:RGQ    0/0:78:78:18
Supercontig_1.1 613356  .   G   .   .   PASS    AN=2;DP=77;VariantType=NO_VARIATION GT:AD:DP:RGQ    0/0:77:77:9
Supercontig_1.1 613357  .   G   .   .   PASS    DP=78;VariantType=NO_VARIATION  GT:AD:DP:RGQ    ./.:78:78:0
Supercontig_1.1 613358  .   T   .   .   PASS    DP=79;VariantType=NO_VARIATION  GT:AD:DP:RGQ    ./.:79:79:0
Supercontig_1.1 613359  .   T   .   .   PASS    DP=79;VariantType=NO_VARIATION  GT:AD:DP:RGQ    ./.:79:79:0
Supercontig_1.1 613360  .   T   .   .   PASS    DP=78;VariantType=NO_VARIATION  GT:AD:DP:RGQ    ./.:78:78:0
Supercontig_1.1 613361  .   G   .   .   PASS    DP=77;VariantType=NO_VARIATION  GT:AD:DP:RGQ    ./.:77:77:0
Supercontig_1.1 613362  .   C   CT  3242.73 PASS    AC=2;AF=1;AN=2;DP=82;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=30.85;SOR=1.134;VariantType=INSERTION.NumRepetitions_3.EventLength_1.RepeatExpansion_T  GT:AD:DP:GQ:PGT:PID:PL  1/1:0,74:74:99:1|1:613362_C_CT:3280,223,0
Supercontig_1.1 613363  .   T   .   .   PASS    DP=76;VariantType=NO_VARIATION  GT:AD:DP:RGQ    ./.:4:76:0
Supercontig_1.1 613364  .   T   .   .   PASS    AN=2;DP=76;VariantType=NO_VARIATION GT:AD:DP:RGQ    0/0:76:76:99
Supercontig_1.1 613365  .   T   .   .   PASS    AN=2;DP=76;VariantType=NO_VARIATION GT:AD:DP:RGQ    0/0:76:76:99
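
To put a number on how many sites end up as no-calls in a GenotypeGVCFs output like the one above, a minimal Python sketch could tally the `./.` genotypes. It assumes a single-sample VCF with whitespace-separated columns, and the function name `count_no_calls` is made up for illustration; it is not part of GATK.

```python
import re

def count_no_calls(vcf_lines):
    """Return (missing, total) genotype counts for a single-sample VCF."""
    missing = total = 0
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        keys = fields[8].split(":")      # FORMAT column
        values = fields[9].split(":")    # the single sample column (assumed)
        gt = values[keys.index("GT")]
        alleles = re.split(r"[/|]", gt)  # handle unphased and phased genotypes
        total += 1
        if all(a == "." for a in alleles):
            missing += 1
    return missing, total

# Three records copied from the final VCF above.
final_records = [
    "Supercontig_1.1 613356 . G . . PASS AN=2;DP=77;VariantType=NO_VARIATION GT:AD:DP:RGQ 0/0:77:77:9",
    "Supercontig_1.1 613357 . G . . PASS DP=78;VariantType=NO_VARIATION GT:AD:DP:RGQ ./.:78:78:0",
    "Supercontig_1.1 613363 . T . . PASS DP=76;VariantType=NO_VARIATION GT:AD:DP:RGQ ./.:4:76:0",
]
print(count_no_calls(final_records))
```

On a real file the same function can be fed the open file handle, since it iterates line by line.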

Positions 613357-613361 are all assigned a missing genotype, which I assume is because their PLs are 0,0,0, i.e. HC considers all three genotypes equally likely. However, when I examine the raw BAM, ALL the reads covering these positions are hom-ref, and the same is true of the BAMOUT from HC. Mapping qualities and base qualities are all very high. Could anyone explain why I get these no-calls, which appear to me to be erroneous?
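
To find every site with this pattern rather than spotting them by eye, a small Python sketch could scan the gVCF for records whose PL values are all identical (for example 0,0,0) and report their AD and GQ. It assumes a single-sample gVCF with whitespace-separated columns; `flat_pl_sites` is a hypothetical helper name, not a GATK tool.

```python
def flat_pl_sites(gvcf_lines):
    """Return (pos, AD, GQ) for records whose PL values are all identical."""
    hits = []
    for line in gvcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        pos = int(fields[1])
        # Pair FORMAT keys with the single sample's values (assumed layout).
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        pl = sample.get("PL")
        if pl is None or "." in pl.split(","):
            continue
        pls = [int(x) for x in pl.split(",")]
        if len(set(pls)) == 1:  # uninformative likelihoods, e.g. 0,0,0
            hits.append((pos, sample.get("AD"), int(sample["GQ"])))
    return hits

# Four records copied from the gVCF extract in this post.
gvcf_records = [
    "Supercontig_1.1 613356 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:77,0:77:9:0,9,135",
    "Supercontig_1.1 613357 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:78,0:78:0:0,0,0",
    "Supercontig_1.1 613363 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:4,72:76:0:0,0,0",
    "Supercontig_1.1 613364 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:76,0:76:99:0,120,1800",
]
for pos, ad, gq in flat_pl_sites(gvcf_records):
    print(pos, ad, gq)
```

Listing AD next to the flat PLs makes it easy to separate sites like 613357 (AD 78,0, all reads reference) from sites like 613363 (AD 4,72), where the reads themselves do not support a hom-ref call.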

