
How MuTect filters candidate mutations

Please note that this article refers to the original standalone version of MuTect. A new version is now available within GATK (starting at GATK 3.5) under the name MuTect2. This new version is able to call both SNPs and indels. See the GATK version 3.5 release notes and the MuTect2 tool documentation for further details.

Overview

This document describes the methodological underpinnings of the filters that MuTect applies by default to distinguish real mutations from sequencing artifacts and errors. Some of these filters are applied in all detection modes, while others are only applied in "High Confidence" detection mode.

Note that at the moment, there is no straightforward way to disable these filters. It is possible to disable each by passing parameter values that render the filters ineffective (e.g. set a value of zero for a filter that requires a minimum value of some quantity) but this has to be examined on a case-by-case basis. A more practical solution is to leave the filter parameters untouched, but instead perform some filtering on the CALLSTATS file using text processing functions (e.g. test for lines that have REJECT in only one of several columns).
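
The text-processing approach can be sketched in Python. This is a minimal sketch under stated assumptions: the CALLSTATS file is taken to be tab-delimited, possibly preceded by comment lines starting with "#", with a header row that includes a judgement column (KEEP or REJECT) and a failure_reasons column holding a comma-separated list. Check these assumptions against your own file. The helper name rescue_calls and the reason label other_reason are hypothetical.

```python
import csv
import io

def rescue_calls(callstats_text, ignorable_reasons):
    """Return rows that are KEEP, or REJECT only for reasons we choose to ignore."""
    # Assumption: comment lines start with '#' and precede the header row.
    lines = [ln for ln in callstats_text.splitlines()
             if ln and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    kept = []
    for row in reader:
        if row["judgement"] == "KEEP":
            kept.append(row)
            continue
        reasons = {r.strip() for r in row["failure_reasons"].split(",") if r.strip()}
        # Rescue a rejected site only if every listed reason is ignorable.
        if reasons and reasons <= set(ignorable_reasons):
            kept.append(row)
    return kept

# Tiny synthetic input (not real MuTect output).
example = (
    "## hypothetical comment line\n"
    "contig\tposition\tfailure_reasons\tjudgement\n"
    "1\t100\t\tKEEP\n"
    "1\t200\ttriallelic_site\tREJECT\n"
    "1\t300\ttriallelic_site,other_reason\tREJECT\n"
)
rescued = rescue_calls(example, {"triallelic_site"})
print([row["position"] for row in rescued])  # ['100', '200']
```

Working on the output this way leaves the MuTect parameters untouched, so the original judgement for every site remains available for comparison.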


Filters used in high-confidence mode

1. Proximal Gap

This filter removes false positives (FPs) caused by nearby misaligned small indel events. MuTect will reject a candidate site if there are more than a given number of reads with insertions/deletions in an 11 base pair window centered on the candidate. The threshold value is controlled by --gap_events_threshold.

In the CALLSTATS output file, the relevant columns are labeled t_ins_count and t_del_count.

2. Poor Mapping

This filter removes FPs caused by reads that are poorly mapped (typically due to sequence similarities between different portions of the genome). The filter uses two tests:

  • Reject the candidate if the fraction of reads with a mapping quality of 0 in the tumor and normal samples is too high relative to a given threshold. The threshold value is controlled by --fraction_mapq_threshold.

  • Reject the candidate if it does not have at least one observation of the mutant allele with a mapping quality that satisfies a given threshold. The threshold value is controlled by --required_maximum_alt_allele_mapping_quality_score.

In the CALLSTATS output file, the relevant columns are labeled total_reads and map_Q0_reads for the first test, and t_alt_max_mapq for the second test.
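
The two tests can be re-applied to CALLSTATS values with a few lines of Python. This is a sketch, not MuTect's own code: the function name, the reason labels, and the choice of inclusive comparisons are assumptions, and the threshold values in the usage lines are purely illustrative.

```python
def poor_mapping_reasons(total_reads, map_q0_reads, t_alt_max_mapq,
                         fraction_threshold, min_alt_max_mapq):
    """Return the (hypothetical) labels of the poor-mapping tests that fail."""
    reasons = []
    # Test 1: too large a share of the reads at the site have MAPQ 0.
    # Assumption: the comparison is inclusive (>=).
    if total_reads > 0 and map_q0_reads / total_reads >= fraction_threshold:
        reasons.append("mapq0_fraction")
    # Test 2: no read carrying the mutant allele reaches the required MAPQ.
    if t_alt_max_mapq < min_alt_max_mapq:
        reasons.append("alt_max_mapq")
    return reasons

# Illustrative threshold values, not necessarily MuTect's defaults.
print(poor_mapping_reasons(40, 25, 60, fraction_threshold=0.5, min_alt_max_mapq=20))
print(poor_mapping_reasons(40, 2, 10, fraction_threshold=0.5, min_alt_max_mapq=20))
```

The first call fails only the MAPQ 0 fraction test (25 of 40 reads), and the second fails only the mutant-allele mapping quality test.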

3. Strand Bias

This filter rejects FPs caused by context-specific sequencing errors, where the vast majority of alternate alleles are seen in reads of a single direction. Candidates are rejected if the strand-specific LOD is below a given threshold in a direction where the sensitivity to have passed that threshold is above a certain percentage. The LOD threshold value is controlled by --strand_artifact_lod and the percentage is controlled by --strand_artifact_power_threshold.

In the CALLSTATS output file, the relevant columns are labeled power_to_detect_negative_strand_artifact and t_lod_fstar_forward. There are also complementary columns labeled power_to_detect_positive_strand_artifact and t_lod_fstar_reverse.
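
The rule combines two conditions per strand, which is easy to misread, so here is a small Python sketch of the logic. It follows the column pairing given above (the forward LOD is tested together with power_to_detect_negative_strand_artifact, the reverse LOD with power_to_detect_positive_strand_artifact). The function and argument names are hypothetical, and whether each comparison is strict or inclusive is an assumption.

```python
def fails_strand_bias(lod_forward, power_for_forward,
                      lod_reverse, power_for_reverse,
                      lod_threshold, power_threshold):
    """True if either strand had enough power to show the mutation but did not.

    power_for_forward is the power value paired with t_lod_fstar_forward,
    power_for_reverse the one paired with t_lod_fstar_reverse.
    """
    # A low LOD only counts against the candidate when the power on that
    # strand was high enough that a real mutation should have been visible.
    forward_fail = power_for_forward >= power_threshold and lod_forward < lod_threshold
    reverse_fail = power_for_reverse >= power_threshold and lod_reverse < lod_threshold
    return forward_fail or reverse_fail

# Illustrative thresholds, not necessarily MuTect's defaults.
print(fails_strand_bias(0.5, 0.95, 8.0, 0.95, lod_threshold=2.0, power_threshold=0.9))
print(fails_strand_bias(0.5, 0.30, 8.0, 0.95, lod_threshold=2.0, power_threshold=0.9))
```

In the second call the forward LOD is just as low, but the power on that strand is below the threshold, so the missing evidence is not held against the candidate.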

4. Clustered Position

This filter rejects FPs caused by misalignments, evidenced by the alternate alleles being clustered at a consistent distance from the start or end of the read alignment. Candidates are rejected if the median distance of the alternate alleles from the start/end of the read and the median absolute deviation of that distance are both lower than or equal to given thresholds. The threshold for the median position is controlled by --pir_median_threshold and the threshold for the deviation is controlled by --pir_mad_threshold.

In the CALLSTATS output file, the relevant columns are labeled tumor_alt_fpir_median and tumor_alt_fpir_mad for the forward strand, and complementary columns are labeled tumor_alt_rpir_median and tumor_alt_rpir_mad for the reverse (note the name difference is fpir vs. rpir, for forward vs. reverse position in read).
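
To make the two statistics concrete, the following Python sketch computes the median and the median absolute deviation (MAD) for a list of distances and applies both thresholds. The function names and the example distances are made up for illustration, and the thresholds are illustrative values. The sketch handles a single list of distances; how MuTect combines the forward and reverse measurements is not covered here.

```python
from statistics import median

def median_and_mad(values):
    """Median and median absolute deviation of a list of numbers."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med, mad

def is_clustered(distances, median_threshold, mad_threshold):
    """True if the alternate alleles sit close to the read end AND vary little."""
    med, mad = median_and_mad(distances)
    return med <= median_threshold and mad <= mad_threshold

# Hypothetical distances (in bases) of the alternate allele from the read end.
near_end = [2, 3, 3, 4, 2]
spread_out = [5, 30, 55, 12, 70]

print(median_and_mad(near_end))          # (3, 1)
print(is_clustered(near_end, 10, 3))     # True
print(is_clustered(spread_out, 10, 3))   # False
```

Requiring both conditions means a candidate is only rejected when the supporting bases are both close to the read end and tightly grouped there.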

5. Observed in Control

This filter rejects FPs in tumor data by looking at control data (typically from a matched normal) for evidence of the alternate allele that is above random sequencing error. Candidates are rejected if both of the following conditions are met:

  • The number of observations of the alternate allele or the proportion of reads carrying the alternate allele is above a given threshold, controlled by --max_alt_alleles_in_normal_count and --max_alt_allele_in_normal_fraction, respectively.

  • The sum of quality scores is above a given threshold value, controlled by --max_alt_alleles_in_normal_qscore_sum.

In the CALLSTATS output file, the relevant columns are labeled n_alt_count, normal_f, and n_alt_sum.
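
The way the two conditions combine (an OR inside the first, then an AND between the two) can be sketched in Python as follows. The function name is hypothetical, the comparisons are written as strict "above" to match the wording used here (the exact boundary behavior is an assumption), and the threshold values in the usage lines are illustrative.

```python
def observed_in_control(n_alt_count, normal_f, n_alt_sum,
                        max_count, max_fraction, max_qscore_sum):
    """True if the control sample shows credible evidence of the alternate allele."""
    # Condition 1: enough alternate observations, by count OR by fraction.
    enough_alt_evidence = n_alt_count > max_count or normal_f > max_fraction
    # Condition 2: those observations carry enough summed base quality.
    enough_quality = n_alt_sum > max_qscore_sum
    return enough_alt_evidence and enough_quality

# Illustrative thresholds, not necessarily MuTect's defaults.
limits = dict(max_count=2, max_fraction=0.03, max_qscore_sum=20)
print(observed_in_control(3, 0.01, 90, **limits))  # True
print(observed_in_control(1, 0.01, 90, **limits))  # False
print(observed_in_control(3, 0.05, 10, **limits))  # False
```

The third call shows why the quality condition matters: several alternate observations in the normal are not enough to reject if their summed quality is low.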


Filters applied in all MuTect modes

1. Tumor and normal LOD scores

This filter rejects candidates whose tumor LOD score is below a threshold value controlled by --tumor_lod. A similar test is applied to the normal LOD score, with a threshold controlled by --normal_lod_threshold.

In the CALLSTATS output file, the relevant columns are labeled t_lod_fstar and init_n_lod, respectively.

2. Possible contamination

This filter rejects candidates with potential cross-patient contamination, controlled by --fraction_contamination.

In the CALLSTATS output file, the relevant columns are labeled t_lod_fstar and contaminant_lod.

3. Normal LOD score and dbSNP status

If a candidate mutation is in dbSNP but is not in COSMIC, it may be a germline variant. In that case, the normal LOD threshold that the candidate must clear is raised to a value controlled by --dbsnp_normal_lod.

In the CALLSTATS output file, the relevant column is labeled init_n_lod.
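
The threshold selection can be sketched in Python. This is an illustration only: the function names are hypothetical, the comparison against the threshold is assumed to be inclusive, and the two threshold values in the usage lines are illustrative rather than confirmed defaults.

```python
def select_normal_lod_threshold(in_dbsnp, in_cosmic, normal_lod, dbsnp_normal_lod):
    """Pick the normal LOD threshold a candidate has to clear."""
    # A known polymorphic site that is not a known cancer mutation is more
    # likely to be germline, so it must clear the stricter threshold.
    if in_dbsnp and not in_cosmic:
        return dbsnp_normal_lod
    return normal_lod

def passes_normal_lod(init_n_lod, in_dbsnp, in_cosmic, normal_lod, dbsnp_normal_lod):
    threshold = select_normal_lod_threshold(in_dbsnp, in_cosmic,
                                            normal_lod, dbsnp_normal_lod)
    return init_n_lod >= threshold

# Same normal LOD (4.0), different database status; illustrative thresholds.
for in_dbsnp, in_cosmic in [(False, False), (True, False), (True, True)]:
    print(in_dbsnp, in_cosmic,
          passes_normal_lod(4.0, in_dbsnp, in_cosmic,
                            normal_lod=2.2, dbsnp_normal_lod=5.5))
```

With these example values, only the site that is in dbSNP but not in COSMIC fails, because it is the only one held to the raised threshold.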

4. Triallelic Site Filter

When the program is evaluating a site, it considers all possible alternate alleles as mutation candidates, and puts them through all the filters detailed above. If more than one candidate allele passes all filters, resulting in a proposed triallelic site, the site is rejected with the reason triallelic_site because it is extremely unlikely that this would really happen in a tumor sample.

