Hi, first of all, thanks for GATK!
I am using MuTect2 and it works well overall. In one of our samples, however, we know that a deletion exists at 1:23559238. It is also clearly visible in IGV for the preprocessed input BAM file (296/296 reads have C and 70 DEL). Unfortunately, it is not detected by MuTect2.
In the --m2debug log, the best haplotype does contain this deletion, reported as [GC*, G] at 1:23559237:
INFO 14:06:48,326 EventMap - === Best Haplotypes ===
INFO 14:06:48,326 EventMap - CTGCAGGTCGAGAATGTAGTCGATGACGCGCTGTAGGATTTCCACCTGGCAAAGTGAGTGCCTCTCGGGACTCCGGGTACCAGTTCCCGCAGGCGGGAGGAGCAGTGGTTCATGTCGTCCAGCAAGCTCAGCGGCTCCTCAGCTGCCGGGCCCTTCCCT
INFO 14:06:48,326 EventMap - > Cigar = 54M1D105M
INFO 14:06:48,327 EventMap - >> Events = EventMap{1:23559234-23559234 [T*, A],1:23559237-23559238 [GC*, G],1:23559284-23559284 [T*, G],}
One base before the deletion, at 1:23559237, there is another variant (IGV: 262 G + 93 A = 355 reads). The fourth-best haplotype contains this event:
INFO 14:06:48,328 EventMap - CTGCAGGTCGAGAATGTAGTCGATGACGCGCTGTAGGATTTCCACCTGGCTAAACTGAGTGCCTCTCGGGACTCCGGGTACCAGTTCCCGCAGGCGGGAGGAGCAGTGGTTCATGTCGTCCAGCAAGCTCAGCGGCTCCTCAGCTGCCGGGCCCTTCCCT
INFO 14:06:48,329 EventMap - > Cigar = 160M
INFO 14:06:48,329 EventMap - >> Events = EventMap{1:23559237-23559237 [G*, A],1:23559284-23559284 [T*, G],}
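To compare the events of the two haplotypes side by side, a small Python sketch like the following could parse the `Events = EventMap{...}` lines. The regular expression is inferred only from the two log lines above, not from any documented format, and the function name `parse_events` is made up for this illustration.

```python
import re

# Pattern inferred from the log lines in this post (an assumption, not a documented format):
#   contig:start-end [REF*, ALT]
EVENT_RE = re.compile(r"(\w+):(\d+)-(\d+) \[([ACGTN]+)\*, ([ACGTN]+)\]")


def parse_events(line):
    """Return (contig, start, end, ref, alt) tuples found in an EventMap log line."""
    return [
        (contig, int(start), int(end), ref, alt)
        for contig, start, end, ref, alt in EVENT_RE.findall(line)
    ]


# Event strings copied from the log output above.
best = ("EventMap{1:23559234-23559234 [T*, A],"
        "1:23559237-23559238 [GC*, G],1:23559284-23559284 [T*, G],}")
fourth = "EventMap{1:23559237-23559237 [G*, A],1:23559284-23559284 [T*, G],}"

best_events = set(parse_events(best))
fourth_events = set(parse_events(fourth))

# Events that distinguish the two haplotypes.
only_in_best = sorted(best_events - fourth_events)
only_in_fourth = sorted(fourth_events - best_events)

print("only in best haplotype:  ", only_in_best)
print("only in fourth haplotype:", only_in_fourth)
```

This makes it easy to see that both the deletion and the SNV start at 23559237, each on a different haplotype.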
Genotyping at this position results in two allelic fractions and two TLODs that are string-concatenated in the log (it looks a bit odd, as there is no space or other separator between the numbers):
INFO 14:06:48,413 SomaticGenotypingEngine - Genotyping event at 23559237 with alleles = [GC*, G, AC]
(...)
INFO 14:06:48,493 SomaticGenotypingEngine - Calculated allelic fraction at 23559237 = 0.131147540983606560.15873015873015872
INFO 14:06:48,494 SomaticGenotypingEngine - Tumor LOD at 23559237 = 161.8743235881842157.0755211370182
In the final VCF file, only the second TLOD=157 is output at 23559237; the TLOD=161 (presumably the deletion) is lost:
1 23559007 rs11574 T C . PASS DB;ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=0.00;TLOD=1161.75 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:11,497:0.975:253:244:0.509:340,16766:1:10
1 23559234 . T A . PASS ECNT=2;HCNT=3;MAX_ED=3;MIN_ED=3;NLOD=0.00;TLOD=167.66 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:270,72:0.211:40:32:0.556:8577,2288:133:137
1 23559237 . G A . clustered_events;triallelic_site ECNT=2;HCNT=3;MAX_ED=3;MIN_ED=3;NLOD=0.00;TLOD=157.08 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:180,96:0.159:43:53:.:6450,3008:93:87
I already tried --artifact_detection_mode to rule out filtering as the cause. Could the problem be that the deletion at 23559238 is anchored at 23559237, where another variant is already output? (Is there a parameter that allows more than one VCF entry per start position?)
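To illustrate the anchoring issue: in VCF, a deletion is written with the preceding reference base as an anchor, so the deletion of C at 23559238 becomes GC>G at 23559237 and shares its start position with the SNV G>A. The Python sketch below shows how two such variants combine into one multi-allelic allele list. The helper `merge_alleles` is hypothetical and only demonstrates the representation; it is not MuTect2's actual code.

```python
def merge_alleles(variants):
    """Combine (ref, alt) pairs that share one start position.

    Shorter alleles are padded with the trailing bases of the longest REF,
    which is how same-position variants end up in a single multi-allelic record.
    """
    longest_ref = max((ref for ref, _ in variants), key=len)
    alts = []
    for ref, alt in variants:
        if not longest_ref.startswith(ref):
            raise ValueError(f"REF {ref!r} is not a prefix of {longest_ref!r}")
        suffix = longest_ref[len(ref):]  # bases this variant does not cover
        alts.append(alt + suffix)
    return longest_ref, alts


# Both variants start at 1:23559237 (values taken from the log above).
deletion = ("GC", "G")  # deletion of C at 23559238, anchored on G at 23559237
snv = ("G", "A")        # SNV at 23559237

ref, alts = merge_alleles([deletion, snv])
print(ref, alts)
```

The result, REF `GC` with ALTs `G` and `AC`, agrees with the allele list `[GC*, G, AC]` in the genotyping log line, which suggests that both events were genotyped together at 23559237 even though only one ended up in the VCF.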