I have been trying to speed up HaplotypeCaller by running multiple instances on the same BAM file, giving each instance a different interval through the -L option. Our pipeline is wrapped in a Snakemake workflow. The DAG produced by Snakemake indicates that these instances should be able to run in parallel. However, the logs show that they are run one after another rather than at the same time.
I'm not sure this is a Snakemake problem, since other steps do run in parallel correctly. One theory we have is that the BAM and reference files are blocked from use until HaplotypeCaller finishes its interval, and the next instance is only started once the files are unblocked.
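One way to test the blocking theory outside Snakemake is to launch the per-interval runs by hand at the same moment and check whether their run times overlap. The following is a minimal sketch, assuming `gatk` is on the PATH; the file names `ref.fasta` and `sample.bam` and the interval names are hypothetical placeholders. Only the standard `-R`, `-I`, `-L` and `-O` arguments are used.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def build_hc_command(ref, bam, interval, out):
    """Build one HaplotypeCaller invocation restricted to a single interval via -L."""
    return ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-L", interval, "-O", out]


def timed_run(cmd):
    """Run a command to completion in its own process; return (start, end) times."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return start, time.monotonic()


def run_concurrently(cmds, max_workers=None):
    """Start all commands at once (one thread per child process); return their spans."""
    with ThreadPoolExecutor(max_workers=max_workers or len(cmds)) as pool:
        return list(pool.map(timed_run, cmds))


def any_overlap(spans):
    """True if at least two (start, end) spans overlap, i.e. the runs truly ran in parallel."""
    spans = sorted(spans)
    # After sorting by start time, checking neighbouring pairs is sufficient.
    return any(nxt[0] < prev[1] for prev, nxt in zip(spans, spans[1:]))
```

For example, building commands for two hypothetical intervals such as `chr1` and `chr2`, passing them to `run_concurrently`, and calling `any_overlap` on the result would report `True` if the two HaplotypeCaller processes really ran side by side on the same BAM and reference. If they overlap here but not under Snakemake, that points to scheduling (for example core or resource limits) rather than file blocking.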
In short, does GATK HaplotypeCaller block input files from being used while it is running?
Thanks.