Channel: Recent Discussions — GATK-Forum

--snpmask VCF format and nocalls

Hi everybody,

I have been using FastaAlternateReferenceMaker to place variants back into a reference. To mitigate reference bias and restrict results to confidently called sites, I ran GATK a second time with EMIT_ALL_SITES, filtered the VCF down to the no-call sites only, and passed that VCF to FastaAlternateReferenceMaker via --snpmask. Since any variant in the mask file is replaced with an "N", I assumed this would work for no-calls too. Am I incorrect? Are no-call positions skipped entirely even though they are present in the VCF? The before-and-after genomes are identical, which suggests something I didn't expect is happening behind the scenes.

Is there a way to make this work? Perhaps I could modify the VCF somehow, even if it is a bit of a shoehorn? The BED files for these positions can be quite large (I am dealing with exome data, so everything else will be masked), and I run into memory problems when I try to give other applications the whole file. I could subset, if need be.
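For reference, this is roughly how the BED route could be kept small. It is only a sketch: it assumes the no-call VCF is coordinate-sorted and tab-delimited, and the function name nocall_sites_to_bed is made up for illustration. It streams the records one at a time and merges adjacent or overlapping positions into single intervals, so the whole file never has to sit in memory. It does not touch GATK at all and says nothing about how --snpmask treats no-calls.

```python
def nocall_sites_to_bed(vcf_lines):
    """Yield merged (chrom, start, end) BED intervals from VCF record lines.

    Assumes the input is coordinate-sorted. BED is 0-based and half-open,
    while VCF POS is 1-based, so a record at POS p with a reference allele
    of length n covers the BED interval [p - 1, p - 1 + n).
    """
    current = None  # interval being built: [chrom, start, end]
    for line in vcf_lines:
        # Skip blank lines and header lines (## meta lines and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        start = pos - 1
        end = start + len(ref)
        if current and current[0] == chrom and start <= current[2]:
            # Adjacent to or overlapping the open interval: extend it.
            current[2] = max(current[2], end)
        else:
            # Gap or new contig: emit the finished interval, start a new one.
            if current:
                yield tuple(current)
            current = [chrom, start, end]
    if current:
        yield tuple(current)


if __name__ == "__main__":
    # Tiny in-memory example; real use would iterate over an open file handle.
    sample = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t100\t.\tA\t.\t.\t.\t.\n",
        "chr1\t101\t.\tC\t.\t.\t.\t.\n",
        "chr1\t105\t.\tG\t.\t.\t.\t.\n",
    ]
    for chrom, start, end in nocall_sites_to_bed(sample):
        print(f"{chrom}\t{start}\t{end}")
```

Because the function is a generator, passing it an open file handle and writing each interval as it is produced keeps memory use flat regardless of how many sites the VCF contains.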

I'd appreciate any clarification!

