I have experienced a variant detection issue with confusion. The png file attached is the result of exact same NextSeq experiment but the read extraction range is different.
NextSeq2_point.bam: bam is composed of the reads which cover chr16: 89100686 position only.
NextSeq2_region.bam: bam is composed of the reads which cover the region of chr16: 89100686 +-100bp.
On position chr16:89100686, I presume T>C should be detected, but HaplotypeCaller failed to detect the variant with NextSeq2_region.bam.
NextSeq2_point.vcf:
chr16 89100686 . T C,<NON_REF> 7397.77 . DP=199;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQ=716400.00 GT:AD:DP:GQ:PL:SB 1/1:0,199,0:199:99:7426,599,0,7426,599,7426:0,0,155,44
NextSeq2_region.vcf:
chr16 89100686 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:0,199:199:0:0,0,0
What causes the difference and why?
--- GATK Version (Docker latest)
Using GATK wrapper script /gatk/build/install/gatk/bin/gatk
Running:
/gatk/build/install/gatk/bin/gatk HaplotypeCaller --version
Version:4.0.1.2
---
--- Command used
gatk HaplotypeCaller -I /temp/NextSeq2_region.bam -O /temp/NextSeq2_region.vcf -R /temp/genome.fa -L /temp/only16.bed --debug true --output-mode EMIT_ALL_SITES --all-site-pls true --dont-trim-active-regions true --emit-ref-confidence BP_RESOLUTION
---
--- Genome Version: hg38
--- bed
chr16 89100681 89101347 NM_174917.4_cds_2_0_chr16_89100682_f 0 +
If you need the bams and vcfs, I can post them here.