Poor VQSR filtering

I ran VQSR on my VCF file from joint genotyping, using dbSNP as the training resource. The plots generated by VQSR don't seem to separate the positive and negative variants very well. Below are the plots for one sample.

I used --ts_filter_level 99.0 when applying the recalibration. Here is an example of the resulting tranche filters:

##FILTER=<ID=VQSRTrancheSNP99.90to100.00+,Description="Truth sensitivity tranche level for SNP model at VQS Lod < -39616.7976">
##FILTER=<ID=VQSRTrancheSNP99.90to100.00,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -39616.7976 <= x < -6.9367">
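
The tranche FILTER descriptions above encode the VQSLOD interval that each filter label covers. As a small aid for reading them, here is a Python sketch that extracts those intervals from the header lines. It assumes the Description wording follows the two forms shown above and that the numbers are plain decimals (no scientific notation); `parse_tranche_headers` is a hypothetical helper, not part of GATK.

```python
import re

# Matches the two Description forms used by VQSR tranche FILTER headers:
#   "... at VQS Lod < HIGH"            (lowest, open-ended tranche)
#   "... at VQS Lod: LOW <= x < HIGH"  (half-open interval)
# Assumes plain decimal numbers, no scientific notation.
TRANCHE_RE = re.compile(
    r'##FILTER=<ID=(?P<id>[^,]+),Description="[^"]*VQS Lod'
    r'(?::\s*(?P<low>-?[\d.]+)\s*<=\s*x\s*<\s*(?P<high>-?[\d.]+)'
    r'|\s*<\s*(?P<upper>-?[\d.]+))"'
)


def parse_tranche_headers(lines):
    """Hypothetical helper: map each tranche FILTER ID to its (low, high) VQSLOD bounds.

    The lower bound is -inf for the open-ended tranche; non-tranche lines are ignored.
    """
    tranches = {}
    for line in lines:
        m = TRANCHE_RE.match(line.strip())
        if m is None:
            continue
        if m.group("upper") is not None:
            tranches[m.group("id")] = (float("-inf"), float(m.group("upper")))
        else:
            tranches[m.group("id")] = (float(m.group("low")), float(m.group("high")))
    return tranches
```

Applied to the two header lines above, it maps VQSRTrancheSNP99.90to100.00+ to (-inf, -39616.7976) and VQSRTrancheSNP99.90to100.00 to [-39616.7976, -6.9367), which makes it easier to see which VQSLOD range each filter label corresponds to.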

Of the 26 million SNPs, only 32,000 are filtered out by VQSR, so I am not sure if this is working.
I would appreciate an expert opinion on these plots. Are the VQSLOD scores usable?
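
To double-check a count like the 32,000 filtered SNPs, one option is to tally the FILTER column directly. The sketch below is a minimal version under several assumptions: it reads uncompressed VCF lines, treats a record as a SNP only when REF and every ALT allele are single bases, and counts both PASS and "." as unfiltered. `count_filters` and `filtered_fraction` are hypothetical names.

```python
from collections import Counter

# Single-base alleles; "*" (spanning deletion) is deliberately excluded.
BASES = set("ACGTN")


def is_snp(ref, alts):
    """Treat a record as a SNP only when REF and every ALT allele are single bases."""
    return ref.upper() in BASES and all(a.upper() in BASES for a in alts)


def count_filters(vcf_lines, snps_only=True):
    """Hypothetical helper: tally FILTER values over the data lines of an uncompressed VCF."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts, filt = fields[3], fields[4].split(","), fields[6]
        if snps_only and not is_snp(ref, alts):
            continue
        counts[filt] += 1
    return counts


def filtered_fraction(counts):
    """Fraction of records whose FILTER is neither PASS nor '.' (unfiltered)."""
    total = sum(counts.values())
    unfiltered = counts.get("PASS", 0) + counts.get(".", 0)
    return (total - unfiltered) / total if total else 0.0
```

For a bgzipped file you could pass a `gzip.open(path, "rt")` handle instead of plain lines. With the numbers in the post, 32,000 out of 26,000,000 is about 0.12% of SNPs carrying a non-PASS filter.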

To get an idea of the distribution of VQSLOD values, I plotted histograms of around 10,000 scores sampled from the first 1 million variants in the VCF file, shown separately for SNPs and INDELs.

[Figure: VQSLOD histogram, SNPs]

[Figure: VQSLOD histogram, INDELs]
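
The post does not say how the roughly 10,000 scores were drawn. One reasonable approach is reservoir sampling (Algorithm R), which yields a uniform sample from the first N records in a single pass without holding them all in memory. The Python sketch below assumes VQSLOD is present in the INFO column, lumps every non-SNP record into the INDEL bucket for simplicity, and bins values into an equal-width histogram with optional `lo`/`hi` clipping, since extreme values such as -39616 would otherwise squash the interesting range. The function names are hypothetical.

```python
import random
import re

VQSLOD_RE = re.compile(r"(?:^|;)VQSLOD=([^;]+)")
BASES = set("ACGTN")


def sample_vqslod(vcf_lines, max_records=1_000_000, k=10_000, seed=0):
    """Hypothetical helper: reservoir-sample up to k VQSLOD values per variant type
    from the first max_records data lines (Algorithm R, single pass)."""
    rng = random.Random(seed)
    reservoirs = {"SNP": [], "INDEL": []}
    seen = {"SNP": 0, "INDEL": 0}
    n_records = 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        n_records += 1
        if n_records > max_records:
            break
        fields = line.rstrip("\n").split("\t")
        ref, alts, info = fields[3].upper(), fields[4].upper().split(","), fields[7]
        m = VQSLOD_RE.search(info)
        if m is None:
            continue  # record was not scored by VQSR
        # Simplification: anything that is not a single-base substitution counts as INDEL.
        kind = "SNP" if ref in BASES and all(a in BASES for a in alts) else "INDEL"
        value = float(m.group(1))
        seen[kind] += 1
        res = reservoirs[kind]
        if len(res) < k:
            res.append(value)
        else:
            # Keep the new value with probability k / seen[kind].
            j = rng.randrange(seen[kind])
            if j < k:
                res[j] = value
    return reservoirs


def histogram(values, n_bins=50, lo=None, hi=None):
    """Equal-width histogram of a non-empty list; values outside [lo, hi] go to the end bins."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range
    counts = [0] * n_bins
    for v in values:
        i = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts
```

The fixed `seed` makes the sample reproducible; change it (or pass `None`) to check that the peaks do not depend on a particular draw.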

There appear to be three peaks. Does anyone have an idea what causes them? Could they be used for filtering?
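
To check the three-peak impression numerically, one could lightly smooth the histogram counts, find local maxima, and locate the lowest bin between neighboring peaks. The sketch below does this on a plain list of counts (for example, the counts from any binning step). It is a generic peak finder, not a GATK feature, and whether a valley between peaks makes a sensible VQSLOD cutoff for zebrafish data is a judgment call it cannot settle. All names are hypothetical.

```python
def smooth(counts, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(counts)):
        seg = counts[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def find_peaks(counts, min_height=1):
    """Indices of local maxima: strictly above the left neighbor and at least the right one.

    On a flat plateau only the first bin is reported.
    """
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else float("-inf")
        right = counts[i + 1] if i < len(counts) - 1 else float("-inf")
        if c > left and c >= right and c >= min_height:
            peaks.append(i)
    return peaks


def valleys_between(counts, peaks):
    """For each pair of consecutive peaks, the index of the lowest bin between them."""
    return [min(range(a, b + 1), key=counts.__getitem__) for a, b in zip(peaks, peaks[1:])]
```

In practice you would smooth first and raise `min_height` so that small bumps from sampling noise are not reported as peaks.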

Also, note that I am working on zebrafish, not human.
Thanks.

