Channel: Recent Discussions — GATK-Forum

Is it safe to call variants per chromosome when using HaplotypeCaller?

Hi!

I'm calling variants in a cohort of samples following the Best Practices recommendations (https://gatkforums.broadinstitute.org/gatk/discussion/3893/calling-variants-on-cohorts-of-samples-using-the-haplotypecaller-in-gvcf-mode). When I run HaplotypeCaller, the estimated running time climbs to more than a month per sample, which is excessive. A way around this is to call variants one chromosome/scaffold at a time using the -L flag:

# $chr is passed to the script by a for loop that iterates over all the chromosomes in my file.
# (The comment sits above the command because a trailing comment would swallow the line continuation.)
java -jar GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R reference.fasta \
-I sample_realigned.bam \
-L $chr:1+ \
--emitRefConfidence GVCF \
--variant_index_type LINEAR \
--variant_index_parameter 128000 \
-o sample$1.g.vcf

...and then concatenate the files per sample using CatVariants. This way I can process each sample in less than a day. Later, I run GenotypeGVCFs on all samples together and get my VCF ready for filtering. My question is: is it safe to do this? Am I reducing HaplotypeCaller's ability to call variants by splitting my dataset into many small subsets and then combining them again?
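
To make the workflow concrete, here is a minimal dry-run sketch of the per-chromosome loop followed by the CatVariants step. It only prints the commands; it does not run GATK. Several things here are assumptions rather than part of my actual script: the helper names `list_contigs` and `print_commands` are hypothetical, contig names are read from a `reference.fasta.fai` index (as produced by `samtools faidx`), the per-contig output names are made up, and the CatVariants options (`-V`, `-out`, `-assumeSorted`) are written from memory of the GATK 3 documentation and should be checked against your version.

```shell
# Dry-run sketch (assumptions: GATK 3.x, hypothetical file names, .fai index present).

# List contig names from a FASTA index; the name is the first tab-separated column.
list_contigs() {
    cut -f1 "$1"
}

# Print one HaplotypeCaller command per contig, then a single CatVariants command
# that concatenates the per-contig GVCFs in the same (reference) order.
# Arguments: sample name, path to the .fai index.
print_commands() {
    sample="$1"
    fai="$2"
    cat_cmd="java -cp GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.CatVariants -R reference.fasta"
    for chr in $(list_contigs "$fai"); do
        echo "java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I ${sample}_realigned.bam -L ${chr} --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o ${sample}.${chr}.g.vcf"
        cat_cmd="$cat_cmd -V ${sample}.${chr}.g.vcf"
    done
    echo "$cat_cmd -out ${sample}.g.vcf -assumeSorted"
}
```

Calling `print_commands sample reference.fasta.fai` prints the commands for inspection; piping that output to `sh` would execute them one after another. Each contig gets its own output file so that later runs do not overwrite earlier ones.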

Thanks!

