What is the output of MuTect and how should I interpret it?

Please note that this article refers to the original standalone version of MuTect. A new version is now available within GATK (starting at GATK 3.5) under the name MuTect2. This new version is able to call both SNPs and indels. See the GATK version 3.5 release notes and the MuTect2 tool documentation for further details.

Overview

MuTect produces a lot of information that is spread across several different files. This document describes the most important outputs and how to interpret them. For a complete list of outputs and their description, please use the -help flag at the command line.

* Call-stats file

The main output that most people work with is the "call-stats" file. It is an exhaustive report of all the metrics and statistics that MuTect computes for its calls, along with the filters it applies internally by default. A more complete description of the call-stats output is given further below.

* VCF file of candidate mutations

Upon request, MuTect can output a summary VCF file containing the mutation candidates annotated with KEEP or REJECT in the FILTER field.
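A minimal Python sketch of selecting the KEEP candidates from that VCF follows. It relies only on the VCF convention that FILTER is the seventh tab-separated column. The function name and the example records are hypothetical. The label to keep is a parameter in case your MuTect version writes something other than KEEP in that column.

```python
from typing import Iterable, Iterator


def kept_vcf_records(lines: Iterable[str], keep_label: str = "KEEP") -> Iterator[str]:
    """Yield VCF data lines whose FILTER column equals keep_label.

    Header lines (starting with '#') are skipped. In the VCF format,
    FILTER is the seventh tab-separated column (index 6).
    """
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 7 and fields[6] == keep_label:
            yield line


# Hypothetical example records (not real MuTect output).
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tT\t.\tKEEP\t.\n",
    "1\t200\t.\tC\tG\t.\tREJECT\t.\n",
]
for record in kept_vcf_records(example):
    print(record, end="")
```

With a real file, pass an open file handle instead of the example list; the function reads it line by line.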

* Coverage / WIGGLE files

Also upon request, MuTect can output so-called "wiggle" files (in WIGGLE format) that contain useful information about the read coverage observed in the data. For every base, this format indicates whether the tumor and normal are both covered deeply enough to give adequate sensitivity for calling mutations. We currently use cutoffs of at least 14 reads in the tumor and at least 8 in the normal (these cutoffs are applied after noisy reads have been removed in the preprocessing step). Several different files can be generated, containing e.g. overall coverage, tumor-only coverage, normal-only coverage, and so on.
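The coverage rule above can be sketched as follows, assuming the per-base tumor and normal depths have already been computed after read filtering. The helper names are hypothetical, and the output uses the generic UCSC `fixedStep` wiggle layout, which may differ in detail from the files MuTect itself writes.

```python
# Cutoffs quoted in the text: at least 14 tumor reads and 8 normal reads,
# counted after noisy reads have been removed.
MIN_TUMOR_READS = 14
MIN_NORMAL_READS = 8


def is_callable(tumor_depth: int, normal_depth: int,
                min_tumor: int = MIN_TUMOR_READS,
                min_normal: int = MIN_NORMAL_READS) -> bool:
    """Return True if a base meets both the tumor and the normal cutoff."""
    return tumor_depth >= min_tumor and normal_depth >= min_normal


def callable_track(tumor_depths, normal_depths):
    """Turn per-base depths into 1 (sufficiently covered) or 0 values."""
    return [int(is_callable(t, n)) for t, n in zip(tumor_depths, normal_depths)]


def to_fixed_step_wig(chrom: str, start: int, values) -> str:
    """Render values as a fixedStep wiggle track (start is 1-based)."""
    lines = [f"fixedStep chrom={chrom} start={start} step=1"]
    lines.extend(str(v) for v in values)
    return "\n".join(lines) + "\n"


track = callable_track([20, 14, 13], [8, 7, 10])
print(to_fixed_step_wig("chr1", 100, track), end="")
```

Keeping the cutoffs as parameters makes it easy to see how the covered territory changes if you choose different thresholds for your own analysis.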

More details about the call-stats file and how to use it

The call-stats output contains a lot of information that is intended to help with development, but that most users don't need to take into account in their analysis. Since this can be rather confusing, we recommend that you extract subsets of information from the call-stats file according to your needs, rather than trying to work with the whole thing.
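One way to keep only the columns you need is sketched below in Python. It assumes the call-stats file is tab-separated with a header row of column names, possibly preceded by comment lines starting with `##`; check this against your own file. The column names come from the field list later in this article, and the function names are hypothetical.

```python
import csv

# Columns named in the field list of this article (a possible default subset).
DEFAULT_COLUMNS = ("contig", "position", "ref_allele", "alt_allele",
                   "t_lod_fstar", "tumor_f", "judgement")


def read_call_stats(lines):
    """Parse tab-separated call-stats lines into a list of dicts.

    Assumption: leading lines starting with '##' are comments, and the
    first remaining line is the column header.
    """
    data_lines = (line for line in lines if not line.startswith("##"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


def select_columns(rows, columns=DEFAULT_COLUMNS):
    """Keep only the requested columns from each parsed row."""
    return [{name: row[name] for name in columns} for row in rows]
```

For a real file, `read_call_stats` can be given an open file handle, since it only needs an iterable of lines.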

Extracting subsets of data using `grep`

The most common subset you'll want to work with is the set of confident calls that were not rejected by MuTect's internal filters. An easy way to do this using basic Unix tools is to search for lines that don't contain the string REJECT:

grep -v REJECT my.call_stats.txt
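`grep -v REJECT` discards any line that contains the string anywhere. If you prefer to test only the `judgement` column, a minimal Python sketch could look like the one below. It assumes the same tab-separated layout (a header row, optionally preceded by `##` comment lines); the function name is hypothetical.

```python
import csv


def kept_calls(lines, judgement_column="judgement"):
    """Return the call-stats rows whose judgement column is exactly KEEP.

    Lines starting with '##' are treated as comments (an assumption about
    the file layout); the next line is read as the column header.
    """
    reader = csv.DictReader(
        (line for line in lines if not line.startswith("##")),
        delimiter="\t",
    )
    return [row for row in reader if row[judgement_column] == "KEEP"]
```

For example, passing an open handle on `my.call_stats.txt` returns the KEEP rows as dictionaries keyed by column name, which you can then filter further on any other field.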

You can also select subsets of sites that were filtered for specific reasons, in case you want to "rescue" those sites. This is equivalent to selectively disabling MuTect's internal filters, which is currently hard to do from the command line.
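A sketch of such a rescue follows. It assumes that the call-stats file has a column listing the reasons a site was rejected, stored as a comma-separated list. The default column name `failure_reasons` is an assumption, as are the placeholder reason strings in the test, so check the header and values of your own file before relying on it. Only REJECT rows whose reasons all fall in the set you choose to ignore are returned.

```python
import csv


def rescue_candidates(lines, reasons_to_ignore, reasons_column="failure_reasons"):
    """Return REJECT rows whose every failure reason is in reasons_to_ignore.

    Assumptions (verify against your file): reasons are stored in
    `reasons_column` as a comma-separated list, and '##' lines are comments.
    """
    ignore = set(reasons_to_ignore)
    reader = csv.DictReader(
        (line for line in lines if not line.startswith("##")),
        delimiter="\t",
    )
    rescued = []
    for row in reader:
        if row["judgement"] != "REJECT":
            continue
        reasons = {r.strip() for r in row[reasons_column].split(",") if r.strip()}
        # A site is rescued only if it failed for ignored reasons alone.
        if reasons and reasons <= ignore:
            rescued.append(row)
    return rescued
```

Requiring every reason to be ignored, rather than any one of them, keeps sites that also failed some other filter out of the rescued set.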

Understanding the main statistics / fields

Here are the definitions of some of the most prominent outputs in the call-stats file:

contig: the contig location of this candidate
position: the 1-based position of this candidate on the given contig
ref_allele: the reference allele for this candidate
alt_allele: the mutant (alternate) allele for this candidate
tumor_name: name of the tumor as given on the command line, or extracted from the BAM
normal_name: name of the normal as given on the command line, or extracted from the BAM
score: for future development
dbsnp_site: is this a dbSNP site, as defined by the dbSNP bitmask supplied to the caller
covered: was the site powered to detect a mutation (80% power for a 0.3 allelic fraction mutation)
power: tumor_power * normal_power
tumor_power: given the tumor sequencing depth, what is the power to detect a mutation at 0.3 allelic fraction
normal_power: given the normal sequencing depth, what power did we have to detect (and reject) this as a germline variant
total_pairs: total tumor and normal read depth that comes from paired reads
improper_pairs: number of reads that have abnormal pairing (orientation and distance)
map_Q0_reads: total number of mapping quality zero reads in the tumor and normal at this locus
init_t_lod: deprecated
t_lod_fstar: CORE STATISTIC: log of (likelihood that the tumor event is real / likelihood that the event is a sequencing error)
tumor_f: allelic fraction of this candidate based on read counts
contaminant_fraction: estimate of the contamination fraction used (supplied or default)
contaminant_lod: log likelihood of (event is contamination / event is sequencing error)
t_ref_count: count of reference alleles in tumor
t_alt_count: count of alternate alleles in tumor
t_ref_sum: sum of quality scores of reference alleles in tumor
t_alt_sum: sum of quality scores of alternate alleles in tumor
t_ins_count: count of insertion events at this locus in tumor
t_del_count: count of deletion events at this locus in tumor
normal_best_gt: most likely genotype in the normal
init_n_lod: log likelihood of (normal being reference / normal being altered)
n_ref_count: count of reference alleles in normal
n_alt_count: count of alternate alleles in normal
n_ref_sum: sum of quality scores of reference alleles in normal
n_alt_sum: sum of quality scores of alternate alleles in normal
judgement: final judgement of the site, KEEP or REJECT (REJECT means not enough evidence, or an artifact)
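Several of these fields are simple functions of others. The sketch below recomputes `tumor_f`, `power` and a `covered`-style flag, and gives a simplified version of the likelihood ratio behind `t_lod_fstar`. Assumptions: `tumor_f` is taken as alt reads over ref plus alt reads, `covered` is read as combined power of at least 0.8, and the LOD uses base-10 logs with a plain per-base error model (a base matches the true allele with probability 1 - e and each mismatch has probability e/3, where e comes from the Phred quality). This illustrates the idea only; it is not MuTect's implementation, which applies further read filtering and refinements.

```python
import math


def tumor_allelic_fraction(t_alt_count: int, t_ref_count: int) -> float:
    """tumor_f sketch: alt reads over ref + alt reads (assumed denominator)."""
    depth = t_alt_count + t_ref_count
    return t_alt_count / depth if depth else 0.0


def combined_power(tumor_power: float, normal_power: float) -> float:
    """power = tumor_power * normal_power, as defined in the field list."""
    return tumor_power * normal_power


def is_powered(power: float, threshold: float = 0.8) -> bool:
    """Assumed reading of `covered`: combined power of at least 80%."""
    return power >= threshold


def _base_likelihood(base: str, error: float, allele: str) -> float:
    # P(observed base | true allele): 1 - e on a match, e / 3 per mismatch.
    return 1.0 - error if base == allele else error / 3.0


def tumor_lod(bases, quals, ref: str, alt: str) -> float:
    """Simplified log10 ratio: 'alt present at fraction f*' vs 'f = 0'.

    f* is the observed fraction of alt bases; quals are positive Phred scores.
    """
    if not bases or len(bases) != len(quals):
        raise ValueError("bases and quals must be non-empty and equal in length")
    if any(q <= 0 for q in quals):
        raise ValueError("Phred qualities must be positive")
    f = sum(b == alt for b in bases) / len(bases)
    log_l_mut = 0.0  # log10 likelihood with the mutation present at f*
    log_l_ref = 0.0  # log10 likelihood with no mutation (sequencing error only)
    for base, q in zip(bases, quals):
        e = 10 ** (-q / 10)
        p_ref = _base_likelihood(base, e, ref)
        p_alt = _base_likelihood(base, e, alt)
        log_l_mut += math.log10(f * p_alt + (1 - f) * p_ref)
        log_l_ref += math.log10(p_ref)
    return log_l_mut - log_l_ref
```

With ten Q30 bases of which three carry the alternate allele, `tumor_lod` returns a clearly positive value, while a pileup with no alternate bases gives exactly zero, because the two models then coincide.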
