Channel: Recent Discussions — GATK-Forum

Rationale behind MuTect2


Hi!

I am trying to use MuTect2 on RNA-seq data to detect somatic mutations.

I have a few questions about the formulas used by the program. I went through the original MuTect publication (Cibulskis et al. 2013) and found this passage about dbSNP:

"There are ~30×10^6 sites known to be variant in the human population according to dbSNP release 134, which is ~1000 variants/megabase. A given individual typically has ~3×10^6 variants in their genome, 95% of which fall on dbSNP sites. Therefore we expect ~50 variants/mb not at dbSNP sites, i.e. P(germline| non-dbSNP site) = 5×10^−5 and therefore we use θ_N|non-dbSNP site = 2.2. At dbSNP sites, however, we expect 95% of the ~3×10^6 variants to occur in the 30×10^6 sites in the dbSNP database, yielding P(germline| dbSNP site) = 0.095 hence θ_N|dbSNP site = 5.5."

However, the latest dbSNP release (147) appears to contain about 150 million variants, five times as many. Wouldn't that change these probabilities considerably? Were the values above updated for MuTect2 in newer versions? Or does the program adapt to the supplied dbSNP file, for example by counting the variants it contains to estimate these probabilities?
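To see how large the change would be, the arithmetic from the quoted passage can be redone with the larger dbSNP size. This is a rough sketch, not MuTect2's code. It assumes that the per-individual figures from the paper (~3×10^6 variants, 95% of them on dbSNP sites) stay fixed and that the genome is ~3×10^9 bp, which is implied by "~50 variants/mb". The function names are hypothetical.

```python
import math

# Figures quoted from Cibulskis et al. 2013 (dbSNP release 134).
VARIANTS_PER_INDIVIDUAL = 3e6   # germline variants in one genome
FRACTION_AT_DBSNP = 0.95        # share of those variants that fall on dbSNP sites
# Assumption: ~3x10^9 bp genome, implied by "~50 variants/mb not at dbSNP sites".
GENOME_SIZE_BP = 3e9


def p_germline_dbsnp(dbsnp_sites: float,
                     fraction_at_dbsnp: float = FRACTION_AT_DBSNP) -> float:
    """P(germline | dbSNP site): expected germline hits spread over all dbSNP sites."""
    return fraction_at_dbsnp * VARIANTS_PER_INDIVIDUAL / dbsnp_sites


def p_germline_non_dbsnp(fraction_at_dbsnp: float = FRACTION_AT_DBSNP) -> float:
    """P(germline | non-dbSNP site): remaining variants spread over the genome."""
    return (1 - fraction_at_dbsnp) * VARIANTS_PER_INDIVIDUAL / GENOME_SIZE_BP


def log10_prior_odds(p: float) -> float:
    """log10 of the prior odds p / (1 - p)."""
    return math.log10(p / (1 - p))


if __name__ == "__main__":
    for label, sites in [("dbSNP 134", 30e6), ("dbSNP 147", 150e6)]:
        p = p_germline_dbsnp(sites)
        print(f"{label}: P(germline | dbSNP site) = {p:.3f}, "
              f"log10 odds = {log10_prior_odds(p):.2f}")
    p = p_germline_non_dbsnp()
    print(f"non-dbSNP: P(germline) = {p:.1e}, log10 odds = {log10_prior_odds(p):.2f}")
```

Under these assumptions, going from 30 million to 150 million dbSNP sites lowers P(germline | dbSNP site) from 0.095 to about 0.019. As a sanity check, the log10 prior odds for dbSNP and non-dbSNP sites in release 134 differ by about 3.32, close to the 3.3 gap between the quoted thresholds (5.5 and 2.2). The 95% figure would probably not stay fixed with a larger dbSNP, so the non-dbSNP probability would likely move as well.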

I also have another question: is there a publication for MuTect2 that explains the rationale behind its indel detection?

Thanks a lot!

Alexandre Coudray

