Hi GATK team!
I'm having trouble understanding the hets output of ModelSegments when it is run on a tumor sample and both the tumor and the matched-normal AllelicCounts files are provided.
The documentation reads:
If the matched normal is available, its allelic counts will be used to genotype the sites, and we will simply assume these genotypes are the same in the case sample. (This can be critical, for example, for determining sites with loss of heterozygosity in high purity case samples; such sites will be genotyped as homozygous if the matched-normal sample is not available.)
If this is truly the case, then why:
1. Is a different number of variants output to hets.tsv and hets.normal.tsv, and why do the sites in the two files not overlap exactly 1:1?
2. Are a large portion of the variant allele fractions in hets.normal.tsv far from 0.5 when I roughly quantify them?
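The comparison behind the two points above can be sketched roughly as follows. This is a minimal sketch, not a GATK utility: it assumes both hets files use the AllelicCounts layout (SAM-style header lines starting with `@`, then a tab-separated table with CONTIG, POSITION, REF_COUNT and ALT_COUNT columns), and the function names and the 0.1 tolerance around 0.5 are hypothetical choices made for illustration.

```python
import csv


def read_allelic_counts(handle):
    """Map (contig, position) -> alt-allele fraction for an AllelicCounts-style TSV.

    Assumes '@'-prefixed header lines followed by a column-header row
    containing CONTIG, POSITION, REF_COUNT and ALT_COUNT.
    """
    rows = (line for line in handle if not line.startswith("@"))
    reader = csv.DictReader(rows, delimiter="\t")
    sites = {}
    for rec in reader:
        ref = int(rec["REF_COUNT"])
        alt = int(rec["ALT_COUNT"])
        total = ref + alt
        # Sites with zero depth get NaN so they never count as "far from 0.5".
        fraction = alt / total if total else float("nan")
        sites[(rec["CONTIG"], int(rec["POSITION"]))] = fraction
    return sites


def compare_hets(case_sites, normal_sites, tolerance=0.1):
    """Summarize site overlap and how many normal sites deviate from 0.5.

    `tolerance` is an arbitrary illustrative cutoff, not a GATK parameter.
    """
    case_keys = set(case_sites)
    normal_keys = set(normal_sites)
    far = [k for k, f in normal_sites.items() if abs(f - 0.5) > tolerance]
    return {
        "case_only": len(case_keys - normal_keys),
        "normal_only": len(normal_keys - case_keys),
        "shared": len(case_keys & normal_keys),
        "normal_far_from_half": len(far),
    }
```

To run it on real output, open the two files (the paths are whatever your ModelSegments output prefix produced) and pass the handles to `read_allelic_counts`, then pass both results to `compare_hets`. Nonzero `case_only` or `normal_only` counts correspond to point 1, and a large `normal_far_from_half` count corresponds to point 2.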
Both observations seem to contradict what the documentation states. Could someone explain the differences and similarities between the hets.tsv and hets.normal.tsv output files in a different way than the documentation does? I'm not following its explanation.