VariantRecalibrator tranche plots have a lot of false positives

Hello!

I am working with data from 122 human whole exomes, captured using SeqCap EZ Prime Exome. My software versions are GATK 3.8.0 and java 1.8.0_131.

After following the Best Practices guidelines, I get tranche plots from VariantRecalibrator that show a high estimated proportion of 'false positives' among my novel variants (the estimate is driven by their low Ti/Tv ratio). I can't find anything this extreme on the forum, and I'm wondering whether I'm doing something wrong in my variant calling.

The command that produced the tranche plots is:

```
java -Xmx16000m -jar GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R hg38.fa \
-input SNP.vcf \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg38.vcf.gz \
-resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.hg38.vcf.gz \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.hg38.vcf.gz \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.hg38.vcf.gz \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff \
-mode SNP \
-recalFile SNP.recal \
-tranchesFile SNP.tranches \
-rscriptFile SNP.plots.R
```

As you can see in 'all_SNPs.pdf', something like 40% of the novel SNPs are estimated to be false positives. 'more_tranches.pdf' shows that lowering the truth threshold does not resolve this (though it does discard a ton of SNPs).
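
For the 'more_tranches.pdf' run I just asked VariantRecalibrator for extra -tranche cut points and then applied the recalibration at a lower truth sensitivity, roughly like this (the exact cut points here are illustrative):

```
# extra sensitivity cut points, appended to the VariantRecalibrator command above:
# -tranche 100.0 -tranche 99.9 -tranche 99.5 -tranche 99.0 -tranche 97.0 -tranche 95.0 -tranche 90.0

# apply the recalibration, filtering at a lower truth sensitivity
java -Xmx16000m -jar GenomeAnalysisTK.jar \
-T ApplyRecalibration \
-R hg38.fa \
-input SNP.vcf \
-recalFile SNP.recal \
-tranchesFile SNP.tranches \
--ts_filter_level 99.0 \
-mode SNP \
-o SNP.recalibrated.vcf
```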

As an alternative, I did hard filtering based on the distributions of all my annotations, which I examined in R. (They looked fairly normal except for QD, I think because of high depths; see 'QD.png' attached here, and the QUAL-by-DP plots in the thread for Discussion 23514 [sorry, can't post links].)
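
For reference, the annotation values I plotted were pulled out with VariantsToTable, roughly like this (the field list is just the annotations I looked at):

```
# dump per-site annotation values to a tab-delimited table for plotting in R
java -Xmx16000m -jar GenomeAnalysisTK.jar \
-T VariantsToTable \
-R hg38.fa \
-V SNP.vcf \
-F CHROM -F POS -F QUAL -F QD -F DP -F MQ -F MQRankSum -F ReadPosRankSum -F FS -F SOR -F InbreedingCoeff \
--allowMissingData \
-o SNP.annotations.table
```

The filtering expression I settled on was: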

```
java -Xmx16000m -jar GenomeAnalysisTK.jar \
-T VariantFiltration \
-R hg38.fa \
--variant SNP.vcf \
-o SNP.FILT.vcf \
--filterExpression "QD < 2.0 || FS > 60.0 || MQ < 55.0 || MQRankSum < -1.0 || ReadPosRankSum < -2.5 || SOR > 2.5 || DP < 500 || InbreedingCoeff < -0.1" \
--filterName "HARDFILTER"
```

I then ran VariantRecalibrator on the hard-filtered variants to see what would happen. Hard filtering reduces the false-positive estimates a little (see 'hard_filtered_SNPs.pdf'), but it does not really solve the problem.
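
Note that VariantFiltration only flags sites in the FILTER column; to make sure only passing sites go into the rerun, the flagged records can be dropped first, along these lines:

```
# keep only records that passed the hard filter (drop HARDFILTER-flagged sites)
java -Xmx16000m -jar GenomeAnalysisTK.jar \
-T SelectVariants \
-R hg38.fa \
-V SNP.FILT.vcf \
--excludeFiltered \
-o SNP.PASS.vcf
```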

I ran VariantEval on the filtered variants to get a better idea of what was going on, and found the following:

My data:

Subset                          Ti/Tv
All SNPs                        2.23
SNPs in dbSNP (68% of total)    2.65
Novel SNPs (32% of total)       1.52
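
For completeness, the VariantEval call was roughly the following (dbSNP is what defines the known/novel split):

```
# Ti/Tv and known/novel breakdown for the filtered call set
java -Xmx16000m -jar GenomeAnalysisTK.jar \
-T VariantEval \
-R hg38.fa \
--eval SNP.FILT.vcf \
-D dbsnp_138.hg38.vcf.gz \
-L PrimeExome.intervals -ip 100 \
-o SNP.eval.grp
```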

So, it seems like my SNPs that also appear in dbSNP are alright, but the novel ones are not trustworthy.

One obvious option is to just filter out any variant not found in an existing database. This is OK for my purposes, since I'm looking for effects of common variants, but it still gives me pause that my novel variants can't be trusted. Any ideas about what would lead to such a low Ti/Tv in an exome dataset? (Note: I used '-L PrimeExome.intervals -ip 100' at the relevant steps.)
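
If I do go the 'known sites only' route, I'm thinking of something like this (an untested sketch; --concordance keeps only sites also present in the comparison track):

```
# keep only sites that are also present in dbSNP
java -Xmx16000m -jar GenomeAnalysisTK.jar \
-T SelectVariants \
-R hg38.fa \
-V SNP.FILT.vcf \
--concordance dbsnp_138.hg38.vcf.gz \
-o SNP.known.vcf
```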

Thanks a lot!
