Hi there,
I am working with sequence capture data from a non-model organism (mapped to a de novo genome). Our goal is to get the site frequency spectrum (SFS) for use in demographic inference, so the number of singleton mutations is important to us.
I am using the recommended hard filters, and am losing 50,000 variants due to the QD < 2.0 filter. I wanted to get your advice to see if that filter is appropriate for sequence capture data, as I know the normal DP filters are not appropriate with capture data. My QD distribution looks very different from the QD distribution shown here.
When I use a straight QUAL < 30 filter instead, I get many more singletons in my SFS, some of which are probably false positives, but I am not sure what proportion. (The figure shows the SFS with the QUAL filter in pink and with the QD filter outlined in blue.)
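To make the comparison concrete, here is a minimal sketch of how singletons passing each filter could be tallied. It assumes a biallelic, uncompressed VCF with `AC` and `QD` in the INFO column, and it counts only sites with `AC=1` (so it ignores `AC = AN - 1` sites that would also be singletons in a folded SFS). The function name `count_singletons` and the toy records are hypothetical, made up for illustration, and are not my real data.

```python
def count_singletons(vcf_lines, qual_min=30.0, qd_min=2.0):
    """Count AC=1 sites overall and those surviving each hard filter.

    A site survives the QUAL filter if QUAL >= qual_min, and survives
    the QD filter if INFO/QD >= qd_min (mirroring QUAL < 30 and QD < 2.0
    as the failing conditions).
    """
    counts = {"total": 0, "pass_qual": 0, "pass_qd": 0}
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]
        # Parse key=value INFO entries; flag-only entries are ignored.
        info = dict(
            item.split("=", 1) for item in fields[7].split(";") if "=" in item
        )
        # Keep only biallelic sites with exactly one alternate allele copy.
        if info.get("AC") != "1":
            continue
        counts["total"] += 1
        if qual != "." and float(qual) >= qual_min:
            counts["pass_qual"] += 1
        if "QD" in info and float(info["QD"]) >= qd_min:
            counts["pass_qd"] += 1
    return counts


# Hypothetical toy records, not real calls.
toy = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "scaf1\t100\t.\tA\tG\t45.2\t.\tAC=1;AN=20;QD=1.5",
    "scaf1\t250\t.\tC\tT\t310.0\t.\tAC=1;AN=20;QD=12.4",
    "scaf1\t400\t.\tG\tA\t25.0\t.\tAC=1;AN=20;QD=0.9",
    "scaf2\t75\t.\tT\tC\t980.5\t.\tAC=7;AN=20;QD=21.0",
]

print(count_singletons(toy))
```

On a real call set, the gap between `pass_qual` and `pass_qd` would be the number of singletons whose fate depends on which filter is used, which is the set I am unsure about.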
Do you have any recommendations for adjusting the QD filter for use with sequence capture data, or for QD distributions that look like mine?
Thanks so much for your help!
~ Annabel
Other info:
GATK version 3.7
Best Practices (though without VQSR, since I am working with a de novo genome from a non-model organism and don't have a good set of trusted SNPs)
Mean coverage: 25-35x