Channel: Recent Discussions — GATK-Forum

ReadBackedPhasing (HaplotypeCaller) outputs far more 0|1 than 1|0. Why?


Hi!

With the aim of phasing haplotypes from the SNPs of a single individual, I have used HaplotypeCaller, which performs ReadBackedPhasing automatically (the accuracy of SNP calling is outside the scope of this question). However, I observed far more 0|1 genotypes (98% of all phased heterozygous SNPs) than 1|0 genotypes (2%).

What I don't understand is this: the reference is built from a mixture of the two haplotypes of a diploid genome, so when an output haplotype in the .vcf starts with 0|1, the next SNP should by chance have a 50% probability of being 0|1 and a 50% probability of being 1|0. In other words, because the reference is an unphased haplotype, phasing SNPs against it should give similar numbers of 0|1s and 1|0s.

For example, in any phased haplotype containing 2 SNPs, the 1st SNP always starts with 0|1. For the 2nd SNP, I expect similar numbers of 0|1 and 1|0, but I see far more 0|1 than 1|0.
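For anyone who wants to reproduce the tally, here is a minimal sketch of how the 0|1 and 1|0 counts could be computed from VCF text. It is not the exact script used for the numbers above. The function name `count_phased_hets`, its parameters, and the toy records are made up for illustration. It assumes a single-sample, uncompressed VCF whose phased genotype is stored in the GT field; if the phasing is written to a different FORMAT key (for example PGT), pass that key as `field`.

```python
from collections import Counter


def count_phased_hets(vcf_lines, sample_index=0, field="GT"):
    """Tally phased heterozygous genotypes (0|1 vs 1|0) for one sample.

    vcf_lines: iterable of VCF text lines (header lines are skipped).
    sample_index: 0-based index of the sample column (hypothetical parameter).
    field: FORMAT key holding the phased genotype (assumed to be "GT").
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Columns 0-7 are fixed, column 8 is FORMAT, samples start at column 9.
        if len(cols) < 10 + sample_index:
            continue
        keys = cols[8].split(":")
        if field not in keys:
            continue
        values = cols[9 + sample_index].split(":")
        idx = keys.index(field)
        if idx >= len(values):
            continue  # trailing fields may be dropped in a sample column
        genotype = values[idx]
        # Only phased biallelic hets are counted; 0/1, 1|1, etc. are ignored.
        if genotype in ("0|1", "1|0"):
            counts[genotype] += 1
    return counts


# Toy records, invented for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "\t".join(["chr1", "100", ".", "A", "G", "50", "PASS", ".", "GT:DP", "0|1:30"]),
    "\t".join(["chr1", "150", ".", "C", "T", "50", "PASS", ".", "GT:DP", "1|0:28"]),
    "\t".join(["chr1", "200", ".", "G", "A", "50", "PASS", ".", "GT:DP", "0|1:31"]),
    "\t".join(["chr1", "900", ".", "T", "C", "50", "PASS", ".", "GT:DP", "0/1:25"]),
    "\t".join(["chr1", "950", ".", "T", "G", "50", "PASS", ".", "GT:DP", "1|1:27"]),
]

toy_counts = count_phased_hets(toy_vcf)
total = sum(toy_counts.values())
for gt in ("0|1", "1|0"):
    print(gt, toy_counts[gt], f"{toy_counts[gt] / total:.0%}")
```

On a real file, the same function can be fed an open file handle, e.g. `with open("sample.vcf") as fh: count_phased_hets(fh)`, where `sample.vcf` is a placeholder path.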

I have tried datasets from 5 different species and multiple individuals, including human, birds, and fish. The results are very similar.

I think I may be misunderstanding something about ReadBackedPhasing. Can anyone help me with this?

Thanks.

