Hello, I ran PhaseByTransmission (GATK version 3.6) on trios to get a list of Mendelian violations. In the first run, we used BAMs with an average coverage of ~100x; in the second run, we used BAMs down-sampled with Picard to an average coverage of 20x.
When I compared the results, I noticed cases (example below, with POS changed) where the genotype call in one of the parents was converted from hom-ref to het in the PhaseByTransmission output VCF for the down-sampled run, so these sites were no longer identified as MIEs. The numbers of reads supporting the REF and ALT alleles are switched in the output AD tag. I see the same calls with DeNovoPrior at its default value or at 1.0E-6. I'm trying to understand what is causing this and would appreciate any feedback.
Original BAM
Input VCF:
1 100 . A T 962.13 . AC=1;AF=0.167;AN=6;BaseQRankSum=0.077;ClippingRankSum=0.000;DP=218;ExcessHet=3.0103;FS=2.261;MLEAC=1;MLEAF=0.167;MQ=60.00;MQRankSum=0.000;QD=11.45;ReadPosRankSum=2.468;SOR=0.905 GT:AD:DP:GQ:PL 0/0:75,0:75:99:0,225,2547 0/1:46,38:84:99:993,0,1300 0/0:58,0:58:99:0,174,1929
PhaseByTransmission output VCF:
1 100 . A T 962.13 . AC=1;AF=0.167;AN=6;BaseQRankSum=0.077;ClippingRankSum=0.000;DP=218;ExcessHet=3.0103;FS=2.261;MLEAC=1;MLEAF=0.167;MQ=60.00;MQRankSum=0.000;QD=11.45;ReadPosRankSum=2.468;SOR=0.905 GT:AD:DP:GQ:PL:TP 0/0:75,0:75:99:0,225,2547:93 0/1:46,38:84:99:993,0,1300:93 0/0:58,0:58:99:0,174,1929:93
Down-sampled BAM
Input VCF:
1 100 . A T 137.13 . AC=1;AF=0.167;AN=6;BaseQRankSum=-1.111;ClippingRankSum=0.000;DP=55;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.167;MQ=60.00;MQRankSum=0.000;QD=7.62;ReadPosRankSum=1.634;SOR=0.675 GT:AD:DP:GQ:PL 0/0:17,0:17:51:0,51,583 0/1:11,7:18:99:168,0,309 0/0:20,0:20:60:0,60,680
PhaseByTransmission output VCF:
1 100 . A T 137.13 . AC=1;AF=0.167;AN=6;BaseQRankSum=-1.111;ClippingRankSum=0.000;DP=55;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.167;MQ=60.00;MQRankSum=0.000;QD=7.62;ReadPosRankSum=1.634;SOR=0.675 GT:AD:DP:GQ:PL:TP 1|0:17,0:17:0:0,51,583:9 1|0:11,7:18:99:168,0,309:9 0|0:20,0:20:60:0,60,680:9
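To list such genotype changes across many sites, the per-sample columns of the input and output records can be compared directly. The following is only a minimal sketch: the helper names (`parse_samples`, `unphased_alleles`, `changed_genotypes`) are hypothetical and not part of GATK, the sample strings are copied from the down-sampled records above, and samples are identified only by column index because the post does not state which column is which family member.

```python
def parse_samples(format_field, sample_fields):
    """Map each sample column to a dict keyed by the FORMAT tags."""
    keys = format_field.split(":")
    return [dict(zip(keys, sample.split(":"))) for sample in sample_fields]


def unphased_alleles(gt):
    """Return the sorted alleles of a GT string, ignoring phasing ('|' vs '/')."""
    return sorted(gt.replace("|", "/").split("/"))


def changed_genotypes(before, after):
    """List (column index, input GT, output GT, output GQ) for samples
    whose allele content differs between the two records."""
    changes = []
    for idx, (b, a) in enumerate(zip(before, after)):
        if unphased_alleles(b["GT"]) != unphased_alleles(a["GT"]):
            changes.append((idx, b["GT"], a["GT"], a["GQ"]))
    return changes


# Sample columns from the down-sampled input VCF record above.
input_samples = parse_samples(
    "GT:AD:DP:GQ:PL",
    ["0/0:17,0:17:51:0,51,583", "0/1:11,7:18:99:168,0,309", "0/0:20,0:20:60:0,60,680"],
)

# Sample columns from the down-sampled PhaseByTransmission output record above.
output_samples = parse_samples(
    "GT:AD:DP:GQ:PL:TP",
    ["1|0:17,0:17:0:0,51,583:9", "1|0:11,7:18:99:168,0,309:9", "0|0:20,0:20:60:0,60,680:9"],
)

changes = changed_genotypes(input_samples, output_samples)
for idx, gt_in, gt_out, gq_out in changes:
    print(f"sample column {idx}: {gt_in} -> {gt_out} (output GQ={gq_out})")
```

For this record, only the first sample column is reported: its genotype goes from 0/0 to 1|0 and its GQ drops from 51 to 0, while AD and PL stay the same as in the input.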
Thanks,
Prachi