VariantEval Evaluation Modules Glossary

Default modules:

CompOverlap: gives concordance metrics based on the overlap between the evaluation and comparison files
CountVariants: counts different types (SNP, insertion, complex, etc.) of variants present within your evaluation file and gives related metrics
IndelLengthHistogram: gives a table of values for plotting a histogram of indel lengths found in your evaluated variants.
IndelSummary: gives metrics related to insertions and deletions (count, multiallelic sites, het-hom ratios, etc.)
MultiallelicSummary: gives metrics relevant to multiallelic variant sites, including counts, ratios, and Ti/Tv
TiTvVariantEvaluator: gives the number and ratio of transition and transversion variants for your evaluation file, comparison file, and ancestral alleles
ValidationReport: details the sensitivity and specificity of your callset, given follow-up validation assay data
VariantSummary: gives a summary of metrics related to SNPs and indels

Other available modules:

MendelianViolationEvaluator: detects and counts Mendelian violations, given data from parent samples.
PrintMissingComp: returns the number of variant sites present in your callset that were not found in the truth set.
ThetaVariantEvaluator: computes different estimates of theta based on variant sites and genotypes
MetricsCollection: includes all of the minimum metrics discussed in this article (link to follow; document in progress). It runs by default when CompOverlap, IndelSummary, TiTvVariantEvaluator, CountVariants, and MultiallelicSummary are also run. (Included in the nightly build for immediate use, or in the next release of GATK.)
* At the time of writing, the modules listed above were present. To check which modules are present in your specific GATK version, use the -list command.

General

Each table has a few columns of data that are the same across multiple evaluation modules. To avoid listing them repeatedly, they are described here.

Example Output *

CompOverlap- In the example above, the first column is labeled CompOverlap. The first column always gives the name of the evaluation module you are viewing: IndelSummary tables say "IndelSummary", CountVariants tables say "CountVariants", and so on.
CompRod- shows which file is being compared to the eval for that row.
By default, this is dbsnp, but you can specify additional comparison files with -comp and name them by appending a colon and a name, e.g. -comp:name /path/to/file.vcf, where name is the label you want in the CompRod column and /path/to/file.vcf is your comparison file. Unnamed comparison files default to "comp" in the CompRod column.
EvalRod- shows which file is being evaluated.
This is useful when specifying multiple eval files, which can be named with the same colon notation. Unnamed eval files default to "eval" in the EvalRod column.
JexlExpression- the Jexl query that was applied to the file. For details on Jexl expressions, see the GATK documentation on them.
Novelty- has three possible values: all, known, and novel. "Novel" includes anything seen in the eval but not in the comp. "Known" includes anything seen in both the eval and the comp. "All" is the sum of "Novel" and "Known".
By default, the comp used to determine novelty is dbsnp. To change this, pass the name of your comparison file (as given with -comp:name) to -knownName.

*Output from a rare variant association study with >1500 whole genome sequenced samples
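To make the Novelty stratification concrete, here is a minimal Python sketch that splits eval sites into the known, novel, and all rows. It assumes sites are matched only on a hypothetical (chromosome, position) key and that the known set has already been loaded from dbsnp or whichever file -knownName names; VariantEval's own matching logic may be stricter.

```python
from typing import Dict, Hashable, List, Sequence, Set


def split_by_novelty(eval_sites: Sequence[Hashable],
                     known_sites: Set[Hashable]) -> Dict[str, List[Hashable]]:
    """Assign each eval site to the "known" or "novel" stratum.

    Assumption: a site is known if its key (here a hypothetical
    (chromosome, position) tuple) appears in known_sites.
    """
    strata: Dict[str, List[Hashable]] = {"known": [], "novel": []}
    for site in eval_sites:
        row = "known" if site in known_sites else "novel"
        strata[row].append(site)
    # "all" is simply known plus novel.
    strata["all"] = strata["known"] + strata["novel"]
    return strata


eval_sites = [("chr1", 100), ("chr1", 250), ("chr2", 42)]
dbsnp_sites = {("chr1", 100)}  # toy stand-in for dbsnp
counts = {row: len(sites) for row, sites in split_by_novelty(eval_sites, dbsnp_sites).items()}
print(counts)  # {'known': 1, 'novel': 2, 'all': 3}
```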

CompOverlap

Example Output *

nEvalVariants- the number of variants in the eval file
novelSites- the number of variants in the eval considered novel in comparison to dbsnp (the same as the novel row of the nEvalVariants column)
nVariantsAtComp- the number of variants present in eval that match the location of a variant in the comparison file (the same as the known row of nEvalVariants)
compRate- nVariantsAtComp divided by nEvalVariants
nConcordant- the number of variants present in eval that exactly match the genotype present in the comparison file
concordantRate- nConcordant divided by nVariantsAtComp

*Output from a rare variant association study with >1500 whole genome sequenced samples
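The CompOverlap columns reduce to a few counts and two ratios. The sketch below computes them from two hypothetical site-to-genotype maps, treating the comparison file as the known set so that novelSites is nEvalVariants minus nVariantsAtComp. Genotypes are compared as plain strings, and returning 0.0 for an empty denominator is an assumption of this sketch, not documented GATK behavior.

```python
from dataclasses import dataclass
from typing import Dict, Hashable


@dataclass
class CompOverlapMetrics:
    nEvalVariants: int
    novelSites: int
    nVariantsAtComp: int
    compRate: float
    nConcordant: int
    concordantRate: float


def comp_overlap(eval_calls: Dict[Hashable, str],
                 comp_calls: Dict[Hashable, str]) -> CompOverlapMetrics:
    """Compute CompOverlap-style columns from two site -> genotype maps."""
    n_eval = len(eval_calls)
    # Eval variants whose location also carries a comp variant.
    at_comp = [site for site in eval_calls if site in comp_calls]
    n_at_comp = len(at_comp)
    # Of those, the ones whose genotype matches the comp genotype exactly.
    n_concordant = sum(1 for site in at_comp if eval_calls[site] == comp_calls[site])
    return CompOverlapMetrics(
        nEvalVariants=n_eval,
        novelSites=n_eval - n_at_comp,
        nVariantsAtComp=n_at_comp,
        compRate=n_at_comp / n_eval if n_eval else 0.0,
        nConcordant=n_concordant,
        concordantRate=n_concordant / n_at_comp if n_at_comp else 0.0,
    )


eval_calls = {("chr1", 100): "0/1", ("chr1", 200): "1/1",
              ("chr1", 300): "0/1", ("chr1", 400): "0/1"}
comp_calls = {("chr1", 100): "0/1", ("chr1", 200): "0/1"}
print(comp_overlap(eval_calls, comp_calls))
```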

CountVariants

Example Output *

nProcessedLoci- the number of loci iterated over in the reference file (also found in MultiallelicSummary)
nCalledLoci- the number of loci called in the eval file
nRefLoci- the number of loci in eval that matched the reference file
nVariantLoci- the number of loci in eval that did not match the reference file
variantRate- nVariantLoci divided by nProcessedLoci
variantRatePerBp- nProcessedLoci divided by nVariantLoci (a truncated integer)
nSNPs- the number of variants determined to be single-nucleotide polymorphisms
nMNPs- the number of variants determined to be multi-nucleotide polymorphisms
nInsertions- the number of variants determined to be insertions
nDeletions- the number of variants determined to be deletions
nComplex- the number of variants determined to be complex (both insertions and deletions)
nSymbolic- the number of variants determined to be symbolic
nMixed- the number of variants determined to be mixed (cannot be determined to be SNPs, MNPs, or indels)
nNoCalls- the number of sites at which there was no variant call made
nHets- the number of heterozygous loci
nHomRef- the number of homozygous reference loci
nHomVar- the number of homozygous variant loci
nSingletons- the number of variants determined to be singletons (occur only once)
nHomDerived- the number of homozygous derived variants; an ancestor had a variant at that site, but the descendant in question no longer has a variant at that site and is now homozygous reference.
heterozygosity- nHets divided by nProcessedLoci
heterozygosityPerBp- nProcessedLoci divided by nHets (a truncated integer)
hetHomRatio- nHets divided by nHomVar
indelRate- the sum of nInsertions, nDeletions, and nComplex, divided by nProcessedLoci
indelRatePerBp- nProcessedLoci divided by the sum of nInsertions, nDeletions, and nComplex (a truncated integer)
insertionDeletionRatio- nInsertions divided by nDeletions

*Output from a rare variant association study with >1500 whole genome sequenced samples
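Most CountVariants rate columns are simple arithmetic on the raw counts. This Python sketch follows the definitions above, including the truncated-integer "PerBp" columns. The function name and keyword arguments are hypothetical, and reporting 0 when a denominator is zero is an assumption of the sketch.

```python
def _ratio(num: int, den: int) -> float:
    # Assumption: report 0.0 rather than raising when a denominator is zero.
    return num / den if den else 0.0


def _per_bp(num: int, den: int) -> int:
    # The "PerBp" columns are documented as truncated integers.
    return num // den if den else 0


def count_variants_rates(n_processed_loci: int, n_variant_loci: int,
                         n_hets: int, n_hom_var: int, n_insertions: int,
                         n_deletions: int, n_complex: int) -> dict:
    """Derive CountVariants-style rate columns from raw counts."""
    n_indel_like = n_insertions + n_deletions + n_complex
    return {
        "variantRate": _ratio(n_variant_loci, n_processed_loci),
        "variantRatePerBp": _per_bp(n_processed_loci, n_variant_loci),
        "heterozygosity": _ratio(n_hets, n_processed_loci),
        "heterozygosityPerBp": _per_bp(n_processed_loci, n_hets),
        "hetHomRatio": _ratio(n_hets, n_hom_var),
        "indelRate": _ratio(n_indel_like, n_processed_loci),
        "indelRatePerBp": _per_bp(n_processed_loci, n_indel_like),
        "insertionDeletionRatio": _ratio(n_insertions, n_deletions),
    }


rates = count_variants_rates(n_processed_loci=1_000_000, n_variant_loci=1_000,
                             n_hets=600, n_hom_var=300, n_insertions=40,
                             n_deletions=50, n_complex=10)
print(rates)
```

With these toy counts, heterozygosityPerBp is 1,000,000 / 600 = 1666.67, truncated to 1666.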

IndelSummary

Example Output *

n_SNPs- the number of SNPs (multiallelic SNPs are counted once for each allele)
n_singleton_SNPs- the number of SNP singleton loci (SNPs seen only once)
n_indels- the number of indels (multiallelic indels are counted once for each allele)
n_singleton_indels- the number of indel singleton loci (indels seen only once)
n_indels_matching_gold_standard- the number of indel loci that match indels in the gold standard (requires a gold standard file passed with the -gold parameter)
gold_standard_matching_rate- n_indels_matching_gold_standard divided by n_indels
n_multiallelic_indel_sites- the number of indel sites that are multiallelic
percent_of_sites_with_more_than_2_alleles- n_multiallelic_indel_sites divided by the total number of indel sites
SNP_to_indel_ratio- n_SNPs divided by n_indels
SNP_to_indel_ratio_for_singletons- n_singleton_SNPs divided by n_singleton_indels
n_novel_indels- number of indels considered to be novel in comparison to dbsnp (the novel row of the n_indels column gives the same information)
indel_novelty_rate- n_novel_indels divided by n_indels
n_insertions- the number of insertion variants
n_deletions- the number of deletion variants
insertion_to_deletion_ratio- n_insertions divided by n_deletions
n_large_deletions- number of deletions longer than 10 bp
n_large_insertions- number of insertions longer than 10 bp
insertion_to_deletion_ratio_for_large_indels- n_large_insertions divided by n_large_deletions
n_coding_indels_frameshifting- the number of indels within the coding regions of the genome which cause a frameshift
n_coding_indels_in_frame- the number of indels within the coding regions of the genome which do not cause a frameshift
frameshift_rate_for_coding_indels- n_coding_indels_frameshifting divided by the sum of n_coding_indels_frameshifting and n_coding_indels_in_frame
SNP_het_to_hom_ratio- the number of heterozygous SNPs divided by the number of homozygous variant SNPs
indel_het_to_hom_ratio- the number of heterozygous indels divided by the number of homozygous variant indels
ratio_of_1_and_2_to_3_bp_insertions- the sum of one and two base pair insertions divided by three base pair insertions
ratio_of_1_and_2_to_3_bp_deletions- the sum of one and two base pair deletions divided by three base pair deletions

*Output from a rare variant association study with >1500 whole genome sequenced samples
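Several IndelSummary columns depend only on indel lengths. The sketch below computes the large-indel, frameshift, and 1-and-2-to-3 bp columns from hypothetical lists of lengths in base pairs. It takes as given which indels fall in coding regions (determining that is outside its scope) and uses the standard rule that an indel whose length is not a multiple of 3 shifts the reading frame. Returning 0.0 for an empty denominator is an assumption.

```python
from typing import Dict, Sequence


def _ratio(num: int, den: int) -> float:
    # Assumption: report 0.0 rather than raising when a denominator is zero.
    return num / den if den else 0.0


def indel_length_metrics(insertion_lengths: Sequence[int],
                         deletion_lengths: Sequence[int],
                         coding_indel_lengths: Sequence[int],
                         large_threshold: int = 10) -> Dict[str, float]:
    """Length-based IndelSummary-style columns; all lengths are in bp."""

    def one_and_two_to_three(lengths: Sequence[int]) -> float:
        short = sum(1 for n in lengths if n in (1, 2))
        three = sum(1 for n in lengths if n == 3)
        return _ratio(short, three)

    n_large_ins = sum(1 for n in insertion_lengths if n > large_threshold)
    n_large_del = sum(1 for n in deletion_lengths if n > large_threshold)
    # A coding indel whose length is not a multiple of 3 shifts the reading frame.
    n_frameshift = sum(1 for n in coding_indel_lengths if n % 3 != 0)
    n_in_frame = len(coding_indel_lengths) - n_frameshift
    return {
        "n_large_insertions": n_large_ins,
        "n_large_deletions": n_large_del,
        "insertion_to_deletion_ratio_for_large_indels": _ratio(n_large_ins, n_large_del),
        "n_coding_indels_frameshifting": n_frameshift,
        "n_coding_indels_in_frame": n_in_frame,
        "frameshift_rate_for_coding_indels": _ratio(n_frameshift, n_frameshift + n_in_frame),
        "ratio_of_1_and_2_to_3_bp_insertions": one_and_two_to_three(insertion_lengths),
        "ratio_of_1_and_2_to_3_bp_deletions": one_and_two_to_three(deletion_lengths),
    }


summary = indel_length_metrics(insertion_lengths=[1, 1, 2, 3, 3, 12],
                               deletion_lengths=[1, 3, 15, 11, 10],
                               coding_indel_lengths=[1, 2, 3, 6, 4])
print(summary)
```

Note that a 10 bp deletion is not counted as large, because the threshold is strictly "longer than 10 bp".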

TiTvVariantEvaluator

Example Output *

nTi- number of transition variants in eval (A↔G or T↔C)
nTv- number of transversion variants in eval (A↔C, A↔T, G↔C, or G↔T)
tiTvRatio- nTi divided by nTv
nTiInComp- number of transition variants present in the comparison file
nTvInComp- number of transversion variants present in the comparison file
TiTvRatioStandard- nTiInComp divided by nTvInComp
nTiDerived- number of transition variants derived from ancestral alleles
nTvDerived- number of transversion variants derived from ancestral alleles
tiTvDerivedRatio- nTiDerived divided by nTvDerived

*Output from a rare variant association study with >1500 whole genome sequenced samples
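A transition swaps a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T); every other single-base substitution is a transversion. This Python sketch classifies biallelic (ref, alt) SNPs that way and computes nTi, nTv, and their ratio as defined above. The function names are hypothetical, and it only handles single-base A/C/G/T substitutions.

```python
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T; False for the four transversion classes."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT" or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv(snps: Iterable[Tuple[str, str]]) -> Tuple[int, int, float]:
    """Return (nTi, nTv, tiTvRatio) for biallelic (ref, alt) SNPs."""
    snps = list(snps)
    n_ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    n_tv = len(snps) - n_ti
    # Assumption: report 0.0 when there are no transversions.
    return n_ti, n_tv, (n_ti / n_tv if n_tv else 0.0)


print(ti_tv([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")]))  # (3, 2, 1.5)
```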

MultiallelicSummary

Example Output *

nProcessedLoci- number of loci iterated over in the reference file (also found in CountVariants)
nSNPs- number of SNPs (multiallelic SNPs are only counted once overall)
nMultiSNPs- number of multiallelic SNPs (again, only counted once per locus)
processedMultiSnpRatio- nMultiSNPs divided by nProcessedLoci
variantMultiSnpRatio- nMultiSNPs divided by nSNPs
nIndels- number of indels (multiallelic indels are only counted once overall)
nMultiIndels- number of multiallelic indels (again, only counted once per locus)
processedMultiIndelRatio- nMultiIndels divided by nProcessedLoci
variantMultiIndelRatio- nMultiIndels divided by nIndels
nTi- number of transition variants at multiallelic sites
nTv- number of transversion variants at multiallelic sites
TiTvRatio- nTi divided by nTv
knownSNPsPartial- the number of loci at which at least one allele in eval was found in the known comparison file (applies only to multiallelic sites)
knownSNPsComplete- the number of loci at which all alleles in eval were also found in the known comparison file (applies only to multiallelic sites)
SNPNoveltyRate- the sum of knownSNPsPartial and knownSNPsComplete divided by nMultiSNPs

*Output from a rare variant association study with >1500 whole genome sequenced samples
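The key point of the MultiallelicSummary counts is that each locus is counted once, however many alternate alleles it carries. This Python sketch assumes a hypothetical input of one (ref, [alt, ...]) entry per SNP locus and computes nSNPs, nMultiSNPs, and the two ratios defined above; reporting 0.0 for an empty denominator is an assumption.

```python
from typing import Dict, List, Sequence, Tuple


def multiallelic_snp_ratios(snp_sites: Sequence[Tuple[str, List[str]]],
                            n_processed_loci: int) -> Dict[str, float]:
    """MultiallelicSummary-style SNP counts, one entry per locus."""
    n_snps = len(snp_sites)  # a multiallelic locus still counts once
    n_multi = sum(1 for _ref, alts in snp_sites if len(alts) > 1)
    return {
        "nSNPs": n_snps,
        "nMultiSNPs": n_multi,
        "processedMultiSnpRatio": n_multi / n_processed_loci if n_processed_loci else 0.0,
        "variantMultiSnpRatio": n_multi / n_snps if n_snps else 0.0,
    }


sites = [("A", ["G"]), ("C", ["T", "G"]), ("G", ["A"]), ("T", ["C", "A", "G"])]
multi = multiallelic_snp_ratios(sites, n_processed_loci=100)
print(multi)
```

The indel columns (nIndels, nMultiIndels, processedMultiIndelRatio, variantMultiIndelRatio) follow the same pattern with indel loci in place of SNP loci.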
