Hi everybody,
I have been using FastaAlternateReferenceMaker to place variants back into a reference. To mitigate reference bias and restrict results to confidently called sites, I performed an additional GATK run with EMIT_ALL_SITES, thinned the VCF down to only the no-call sites, and passed this VCF to FastaAlternateReferenceMaker via --snpmask. Since a variant (any variant) in the mask would be replaced with an "N", I assumed this would work for no-calls too. Am I incorrect? Are no-call positions skipped over completely even though they are present in the VCF? The before-and-after genomes are identical, suggesting something I didn't expect is happening behind the scenes.
Is there a way to make this work? Perhaps I could modify the VCF somehow, even if it is a bit of a shoehorn? The BED files for these positions can be quite large (I am dealing with exome data, so everything else in the genome ends up masked), and I run into memory problems when I try to give other applications the whole file. I could subset the file if need be.
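To make the shoehorn idea concrete, here is a rough Python sketch of the kind of VCF modification I have in mind. It rests on an assumption I have not confirmed: that --snpmask only masks records that look like SNPs, so a no-call record with "." in the ALT column is ignored. The function names (add_dummy_alt, rewrite_file) and the choice of placeholder base are my own inventions, not anything from GATK. It streams the file line by line, so memory should not be an issue even for very large inputs.

```python
def add_dummy_alt(vcf_lines):
    """Yield VCF lines, giving single-base records with ALT '.' a placeholder ALT.

    ASSUMPTION (unverified): the masking tool skips records without an ALT
    allele, so a fake ALT makes a no-call site look like a SNP.
    """
    for line in vcf_lines:
        # Header and meta lines pass through unchanged.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Leave malformed or short lines untouched.
        if len(fields) < 5:
            yield line
            continue
        ref, alt = fields[3], fields[4]
        if alt == "." and len(ref) == 1:
            # Any base that differs from REF will do; it is only a placeholder.
            fields[4] = "C" if ref.upper() == "A" else "A"
        yield "\t".join(fields) + "\n"


def rewrite_file(in_path, out_path):
    """Stream an uncompressed VCF from in_path to out_path, one line at a time."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in add_dummy_alt(src):
            dst.write(line)
```

I would then call something like rewrite_file("nocalls.vcf", "nocalls.masked.vcf") (both file names are placeholders) and pass the output to --snpmask. The placeholder ALT alleles are not real variants, so the rewritten file would only be suitable as a mask.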
I'd appreciate any clarification!