Channel: Recent Discussions — GATK-Forum

Subset CombinedGVCF files?


Hello,

We're in the middle of a project where we are iteratively generating variant data on a large and growing cohort. We're adding data to the existing cohort in batches of ~96 samples. For each batch, we run GATK to make gVCFs, and then do a single CombineGVCFs on those 96 (or however many have passed QC so far). We then take all of the CombinedGVCF files to date and run GenotypeGVCFs. While we're currently keeping the CombinedGVCF files at about 96 samples each, I expect we're going to start wanting to combine more.

Our problem is that over time we sometimes find specific samples that we want to drop. When this happens, we end up discarding that batch's CombinedGVCF file and remaking it without that specific sample. It would be a whole lot nicer if we could take a SelectVariants-style approach and subset with either a whitelist or a blacklist of animals. Can I safely do subset operations like this on a CombinedGVCF to drop specific samples?
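To make the current workaround concrete, here is a minimal sketch of the bookkeeping involved: given the batch membership and a blacklist, it works out which batches would have to be recombined and with which samples. It does not call GATK at all, and the function name, batch IDs, and sample names are all made up for illustration.

```python
from typing import Dict, List, Set


def batches_to_rebuild(
    batches: Dict[str, List[str]], drop: Set[str]
) -> Dict[str, List[str]]:
    """Map each batch that contains a dropped sample to the samples it should keep.

    Batches with no dropped samples are left out, since their existing
    CombinedGVCF file can be reused as-is.
    """
    rebuilt: Dict[str, List[str]] = {}
    for batch_id, samples in batches.items():
        if drop.intersection(samples):
            # Preserve the original sample order, minus the blacklisted ones.
            rebuilt[batch_id] = [s for s in samples if s not in drop]
    return rebuilt


# Hypothetical batch layout and blacklist.
batches = {
    "batch01": ["animal001", "animal002", "animal003"],
    "batch02": ["animal004", "animal005"],
}
to_drop = {"animal002"}

plan = batches_to_rebuild(batches, to_drop)
for batch_id, keep in plan.items():
    print(f"{batch_id}: recombine with {len(keep)} samples: {', '.join(keep)}")
```

Every batch that shows up in this plan currently costs us a full CombineGVCFs rerun, which is what a direct subset operation would avoid.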

Thanks.

