Hi,
I have used VQSR on a set of 308 individuals from a targeted sequencing array (21 Mb), and while looking at the plots I was not sure whether VQSR has worked properly. I am running VariantEval right now, but I am afraid that I am not filtering out much. After VQSR and removal of monomorphic positions and positions failing the AB and HWE (10^-6) filters, I am left with 2.9 million variants. Do you think VQSR has worked?
[code : -resource:hapmap,known=false,training=true,truth=true,prior=15.0 $hapmap \
-resource:omni,known=false,training=true,truth=true,prior=12.0 $omni \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 $snphc \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 $dbsnp \
-an QD \
-an FS \
-an MQRankSum \
-an ReadPosRankSum \
-mode SNP \
-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
-recalFile recalibrate_SNP.recal \
-tranchesFile recalibrate_SNP.tranches \
-rscriptFile recalibrate_SNP_plots.R \
--maxGaussians 4
# Apply the recalibration model to the SNP call set
java -Xmx7g -jar /sw/apps/bioinfo/GATK/3.5.0/GenomeAnalysisTK.jar -T ApplyRecalibration -R $ref \
--input preVQSR_180317.vcf -mode SNP \
--ts_filter_level 99.5 \
-recalFile recalibrate_SNP.recal \
-tranchesFile recalibrate_SNP.tranches \
-o VQSR_SNP_raw_indels.vcf ]
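
In case it helps to judge how much is actually being filtered, here is a minimal sketch of how the FILTER column of the output could be tallied. It assumes a standard tab-separated, uncompressed VCF in which column 7 is FILTER; the function name `count_filters` is just a hypothetical helper, not part of GATK.

```shell
# Hypothetical helper (not a GATK tool): tally the values in the FILTER
# column (column 7) of an uncompressed VCF, skipping header lines.
# Prints one "count<TAB>filter" line per distinct FILTER value,
# in no particular order.
count_filters() {
  awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print n[f] "\t" f }' "$1"
}

# Usage (file name taken from the ApplyRecalibration command above):
#   count_filters VQSR_SNP_raw_indels.vcf
```

If almost every record comes back as PASS and very few carry a tranche label, that would suggest the recalibration is removing little at the chosen `--ts_filter_level`.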