How MuTect identifies candidate somatic mutations

Please note that this article refers to the original standalone version of MuTect. A new version is now available within GATK (starting at GATK 3.5) under the name MuTect2. This new version is able to call both SNPs and indels. See the GATK version 3.5 release notes and the MuTect2 tool documentation for further details.

Overview

In a nutshell, the MuTect analysis consists of three steps:

1. Pre-processing the aligned reads in the tumor and normal sequencing data
2. Statistical analysis to identify sites that are likely to carry somatic mutations with high confidence
3. Post-processing of candidate somatic mutations

This document summarizes the key points of these three steps. For complete details, please see the 2013 publication in Nature Biotechnology:

Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnology (2013). doi:10.1038/nbt.2514

1. Pre-processing the aligned reads in the tumor and normal sequencing data

In this step we discard reads with too many mismatches or very low quality scores, since such reads contribute more noise than signal.
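As a rough illustration of this kind of read filter, here is a minimal Python sketch. The `Read` record, the field names and both cutoffs are hypothetical and chosen only for the example; this article does not state the values MuTect actually uses.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical, simplified view of an aligned read."""
    name: str
    mismatches: int            # mismatches against the reference
    mean_base_quality: float   # mean Phred-scaled base quality


# Illustrative cutoffs only (assumptions, not MuTect's real settings).
MAX_MISMATCHES = 3
MIN_MEAN_BASE_QUALITY = 20.0


def keep_read(read, max_mismatches=MAX_MISMATCHES,
              min_quality=MIN_MEAN_BASE_QUALITY):
    """Return True if the read is clean enough to use as evidence."""
    return (read.mismatches <= max_mismatches
            and read.mean_base_quality >= min_quality)


def filter_reads(reads):
    """Drop reads with too many mismatches or very low quality."""
    return [r for r in reads if keep_read(r)]
```

The same filter would be applied to both the tumor and the normal reads, so that the later statistics are computed on comparable evidence.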

2. Statistical analysis to identify sites that are likely to carry somatic mutations with high confidence

The statistical analysis predicts a somatic mutation by using two Bayesian classifiers. The first aims to detect whether the tumor is non-reference at a given site. For sites found to be non-reference, the second classifier makes sure the normal does not carry the variant allele. In practice, each classification is performed by calculating a LOD score (log odds) and comparing it to a cutoff determined by the log ratio of prior probabilities of the considered events.

For the tumor we calculate:

$$ LOD_T = log_{10} \left ( \frac{ P( \text{observed data in tumor | site is mutated} ) } { P( \text{observed data in tumor | site is reference} ) } \right ) $$

And for the normal:

$$ LOD_N = log_{10} \left ( \frac{ P( \text{observed data in normal | site is reference} ) } { P( \text{observed data in normal | site is mutated} ) } \right ) $$
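To make the two scores concrete, the sketch below computes them for a pileup of bases at one site. It assumes a simple per-base error model in the spirit of the one described in the publication: each base has an error probability `e`, an error is equally likely to yield any of the three other bases, and the variant allele is present at fraction `f`. The function names, the use of the observed alternate fraction for the tumor, and `f = 0.5` (heterozygous) for the normal are assumptions of this example, not a transcription of MuTect's code.

```python
import math


def log10_likelihood(bases, error_probs, ref, alt, f):
    """log10 P(observed bases | alt allele present at allele fraction f).

    Assumes every error probability lies strictly between 0 and 1.
    """
    total = 0.0
    for base, e in zip(bases, error_probs):
        if base == ref:
            p = f * e / 3 + (1 - f) * (1 - e)
        elif base == alt:
            p = f * (1 - e) + (1 - f) * e / 3
        else:
            p = e / 3  # neither ref nor alt: must be a sequencing error
        total += math.log10(p)
    return total


def lod_tumor(bases, error_probs, ref, alt):
    """log10 of P(data | mutated) over P(data | reference) in the tumor."""
    if not bases:
        raise ValueError("need at least one observed base")
    # Assumption: use the observed alt fraction as the allele fraction.
    f_hat = sum(b == alt for b in bases) / len(bases)
    mutated = log10_likelihood(bases, error_probs, ref, alt, f_hat)
    reference = log10_likelihood(bases, error_probs, ref, alt, 0.0)
    return mutated - reference


def lod_normal(bases, error_probs, ref, alt):
    """log10 of P(data | reference) over P(data | mutated) in the normal."""
    reference = log10_likelihood(bases, error_probs, ref, alt, 0.0)
    # Assumption: a germline variant would be heterozygous (f = 0.5).
    mutated = log10_likelihood(bases, error_probs, ref, alt, 0.5)
    return reference - mutated
```

With 30 tumor reads of which 10 carry the alternate allele, and 30 normal reads that all match the reference, both scores come out well above zero, which is the pattern expected at a somatic site.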

Since we expect somatic mutations to occur at a rate of ~1 per Mb, we require

$$ LOD_T > -log_{10} (0.5 \times 10^{-6} ) \approx 6.3 $$

which guarantees that our false positive rate, due to noise in the tumor, is less than half of the somatic mutation rate.

In the normal, for sites that are not in dbSNP, we require

$$ LOD_N > -log_{10} (0.5 \times 10^{-2} ) \approx 2.3 $$

since non-dbSNP germline variants occur at a rate of roughly 100 per Mb, about 100 times the assumed somatic mutation rate (hence the factor of 10^{-2}). This cutoff guarantees that the false positive somatic call rate, due to missing the variant in the normal, is also less than half the somatic mutation rate.
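The arithmetic behind the two cutoffs can be checked numerically. The short Python sketch below takes the rates quoted in the text (about 1 somatic mutation per Mb and about 100 non-dbSNP germline variants per Mb) and the "less than half" target; the variable and function names are made up for the example. Taking the negative base-10 logarithm gives the positive cutoffs of about 6.3 and 2.3.

```python
import math

SOMATIC_RATE = 1e-6             # ~1 per Mb, expressed per base
NON_DBSNP_GERMLINE_RATE = 1e-4  # ~100 per Mb, expressed per base
TARGET_FRACTION = 0.5           # false positives below half the somatic rate

# Tumor: noise must produce false calls at less than 0.5e-6 per site.
tumor_cutoff = -math.log10(TARGET_FRACTION * SOMATIC_RATE)

# Normal: a germline variant may be missed in at most
# 0.5 * (somatic rate / germline rate) = 0.5e-2 of cases.
normal_cutoff = -math.log10(
    TARGET_FRACTION * SOMATIC_RATE / NON_DBSNP_GERMLINE_RATE
)


def is_candidate(lod_t, lod_n):
    """A site is kept only if both LOD scores clear their cutoffs."""
    return lod_t > tumor_cutoff and lod_n > normal_cutoff


print(f"tumor cutoff  = {tumor_cutoff:.2f}")
print(f"normal cutoff = {normal_cutoff:.2f}")
```

Reading the normal cutoff as a ratio of the two rates is an interpretation of the text above; the publication gives the full derivation.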

3. Post-processing of candidate somatic mutations

This step aims to eliminate artifacts of next-generation sequencing, short read alignment and hybrid capture. For example, sequence context can produce spurious alternate alleles, but often only in reads from one direction. We therefore test that the alternate alleles supporting a mutation are observed in both read directions.
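A very simple version of the "both directions" test can be sketched as follows. It assumes each observation is a `(base, strand)` pair with strand `"+"` or `"-"`, and it only requires a minimum number of alternate-allele reads per strand. The data layout, function names and `min_per_strand` parameter are hypothetical, and MuTect's actual filter may be more elaborate than a plain count.

```python
from collections import Counter


def alt_strand_counts(observations, alt):
    """Count alt-supporting reads on the forward and reverse strands."""
    counts = Counter(strand for base, strand in observations if base == alt)
    return counts["+"], counts["-"]


def passes_strand_check(observations, alt, min_per_strand=1):
    """True if the alt allele is seen on both strands often enough."""
    forward, reverse = alt_strand_counts(observations, alt)
    return forward >= min_per_strand and reverse >= min_per_strand
```

A candidate whose alternate allele appears only on forward reads would fail this check, which matches the single-direction artifact described above.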

Note on method validation

Most cancer genome studies at the Broad Institute have made use of MuTect and have validated the mutation calls as a part of their cancer biology papers, showing that MuTect has a very low false positive rate. A summary of validation rates from these papers is shown below:
