Channel: Recent Discussions — GATK-Forum

Running GenotypeGVCFs on ~4000 human exomes: stuck on "ProgressMeter - Starting"

Hello,

I am running GenotypeGVCFs on ~4000 human exomes. To speed up the process, I split exome.interval_list into sub-interval lists, each of which covers ~100 kb of regions. Then I submitted the GenotypeGVCFs jobs in parallel, one per sub-interval list, e.g.:

java -Xmx32g -jar /GATK/3.6/jar-bin/GenomeAnalysisTK.jar -T GenotypeGVCFs -nt 2 -L /home/jjduan/scatter_interval_list/interval_list.sub000000.interval_list -D /home/jjduan/ref_b37/dbsnp_138.b37.vcf -R /home/jjduan/ref_b37/human_g1k_v37.fasta --variant /home/jjduan/mergedGVCF/chr_19_mergedGVCF.list -o /home/jjduan/genotypedVCF/chr_19_sub000000.vcf

java -Xmx32g -jar /GATK/3.6/jar-bin/GenomeAnalysisTK.jar -T GenotypeGVCFs -nt 2 -L /home/jjduan/scatter_interval_list/interval_list.sub000001.interval_list -D /home/jjduan/ref_b37/dbsnp_138.b37.vcf -R /home/jjduan/ref_b37/human_g1k_v37.fasta --variant /home/jjduan/mergedGVCF/chr_19_mergedGVCF.list -o /home/jjduan/genotypedVCF/chr_19_sub000001.vcf

...
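The split-and-submit steps described above can be sketched in shell roughly as follows. This is a minimal, hypothetical sketch, not the scripts actually used: the function names `split_interval_list` and `emit_genotype_cmds` and the greedy "close a chunk once it reaches ~100 kb" rule are assumptions, as is the Picard-style interval_list layout (`@` header lines, then tab-separated contig, start, end, strand, name, 1-based inclusive). The GATK paths and flags are copied from the commands above; the commands are only printed, since the job scheduler in use is not stated.

```shell
#!/usr/bin/env bash
# Scatter sketch (hypothetical helpers):
#   1) split an interval_list into chunks of roughly CHUNK_BP bases,
#   2) print one GenotypeGVCFs command per chunk for a scheduler to submit.
set -euo pipefail

# Assumes a Picard-style interval_list: '@' header lines, then tab-separated
# contig, start, end, strand, name with 1-based inclusive coordinates.
split_interval_list() {
  local src=$1 outdir=$2 chunk_bp=${3:-100000}
  local body
  mkdir -p "$outdir"
  # Copy the header into every chunk so each file is a valid interval_list.
  grep '^@' "$src" > "$outdir/.header" || true
  awk -v outdir="$outdir" -v chunk="$chunk_bp" '
    BEGIN { FS = "\t"; n = 0; acc = 0 }
    /^@/ { next }
    {
      file = sprintf("%s/interval_list.sub%06d.body", outdir, n)
      print > file
      acc += $3 - $2 + 1
      # Start a new chunk once this one holds at least chunk_bp bases.
      if (acc >= chunk) { close(file); n++; acc = 0 }
    }' "$src"
  for body in "$outdir"/*.body; do
    [ -e "$body" ] || continue
    cat "$outdir/.header" "$body" > "${body%.body}.interval_list"
    rm "$body"
  done
  rm -f "$outdir/.header"
}

# Paths and flags copied from the commands in the post.
GATK_JAR=/GATK/3.6/jar-bin/GenomeAnalysisTK.jar
REF=/home/jjduan/ref_b37/human_g1k_v37.fasta
DBSNP=/home/jjduan/ref_b37/dbsnp_138.b37.vcf
GVCF_LIST=/home/jjduan/mergedGVCF/chr_19_mergedGVCF.list
OUT_DIR=/home/jjduan/genotypedVCF

# Print one command per interval_list.subNNNNNN.interval_list in a directory.
emit_genotype_cmds() {
  local interval_dir=$1 prefix=$2
  local il tag
  for il in "$interval_dir"/interval_list.sub*.interval_list; do
    [ -e "$il" ] || continue
    # interval_list.sub000001.interval_list -> sub000001
    tag=$(basename "$il" .interval_list)
    tag=${tag#interval_list.}
    printf 'java -Xmx32g -jar %s -T GenotypeGVCFs -nt 2 -L %s -D %s -R %s --variant %s -o %s/%s_%s.vcf\n' \
      "$GATK_JAR" "$il" "$DBSNP" "$REF" "$GVCF_LIST" "$OUT_DIR" "$prefix" "$tag"
  done
}
```

For example, `split_interval_list exome.interval_list scatter_interval_list 100000` followed by `emit_genotype_cmds scatter_interval_list chr_19` would print commands shaped like the two above, which can then be piped to whatever submission tool the cluster uses.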

However, for hours the log shows nothing but "ProgressMeter - Starting" lines, and no variants are written.

INFO  00:09:31,580 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
INFO  00:09:31,581 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
WARN  00:09:32,292 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bi
WARN  00:09:32,295 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bi
INFO  00:09:32,295 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant f
INFO  00:10:01,605 ProgressMeter -        Starting         0.0    30.0 s      49.6 w      100.0%    30.0 s       0.0 s
INFO  00:10:31,606 ProgressMeter -        Starting         0.0    60.0 s      99.2 w      100.0%    60.0 s       0.0 s
INFO  00:11:01,608 ProgressMeter -        Starting         0.0    90.0 s     148.9 w      100.0%    90.0 s       0.0 s
INFO  00:11:31,611 ProgressMeter -        Starting         0.0   120.0 s     198.5 w      100.0%   120.0 s       0.0 s
INFO  00:12:01,613 ProgressMeter -        Starting         0.0     2.5 m     248.1 w      100.0%     2.5 m       0.0 s
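With many jobs running in parallel, it can help to list which logs are still at this stage. The helper below is a hypothetical sketch (`still_starting` is an illustrative name), and it assumes the GATK 3 ProgressMeter layout shown above: the two table-header lines contain `|` while status lines do not, and the fifth whitespace-separated field of a status line is the Location column (`Starting` until the first site is processed).

```shell
#!/usr/bin/env bash
# Sketch: print each log whose most recent ProgressMeter status line
# still reports Location "Starting". Assumes the layout shown in the post.
set -euo pipefail

still_starting() {
  local log loc
  for log in "$@"; do
    # Skip header lines (they contain '|'); field 5 is the Location column.
    loc=$(awk '$3 == "ProgressMeter" && index($0, "|") == 0 { loc = $5 }
               END { print loc }' "$log")
    if [ "$loc" = "Starting" ]; then
      printf '%s\n' "$log"
    fi
  done
}
```

For example, `still_starting logs/*.log` would list only the jobs that have not yet reported a genomic position.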

I have read this thread and noticed that this happens with reference genomes that have millions of contigs. But my data are human, with far fewer contigs, so I don't think the two cases are the same.

I know WDL/Cromwell supports the scatter/gather approach to speed things up. However, as I understand it, scatter/gather works on the same principle as what I did here: each shard runs the same per-interval job. So even with WDL, the parallel jobs would still get stuck in the same way. Is that right?
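The scatter/gather principle referred to above can be illustrated in plain shell: start one worker per chunk in the background (scatter), wait for all of them, then concatenate their outputs in chunk order (gather). This is a generic sketch with hypothetical names (`scatter_gather`, the worker command), not how Cromwell actually schedules shards; it only shows that every shard runs the same per-chunk job, which is consistent with the reasoning in the question.

```shell
#!/usr/bin/env bash
# Generic scatter/gather sketch (hypothetical names).
set -euo pipefail

# Usage: scatter_gather WORKER OUTPUT CHUNK...
# WORKER is any command or function that takes one chunk and writes to stdout.
scatter_gather() {
  local worker=$1 out=$2
  shift 2
  local workdir chunk pid i=0 status=0
  local -a pids=()
  if [ "$#" -eq 0 ]; then
    : > "$out"
    return 0
  fi
  workdir=$(mktemp -d)
  # Scatter: one background job per chunk.
  for chunk in "$@"; do
    # Zero-padded part names keep the glob below in chunk order.
    "$worker" "$chunk" > "$workdir/part.$(printf '%06d' "$i")" &
    pids+=("$!")
    i=$((i + 1))
  done
  # Gather only after every shard has finished; any failed shard fails the run.
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  if [ "$status" -eq 0 ]; then
    cat "$workdir"/part.* > "$out"
  fi
  rm -r "$workdir"
  return "$status"
}
```

Because the gather step waits for every shard, the overall run can finish no sooner than its slowest shard.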

Is there anything else I can do to get this to run at all, or to run faster, or should I just wait?

Thanks a lot for any input!

Best,
Jinjie
