Hi,

A colleague was experiencing a very long run time with GATK HaplotypeCaller and asked me to look into it. Although the job had been running for about 5 days, it hadn't created a VCF file and showed no progress on stderr. After 5 days of runtime, the last line was:
INFO 13:00:27,433 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
I extracted one of the scaffolds from the assembly (FASTA and BAM), created indexes for them, and tested GATK on this minimal dataset. I could replicate the problem: no progress and no output. I've attached the BAM, FASTA, indexes and command line. Could you identify why GATK seems to stall before analysing the BAM file? The BAM file is only a few MB, so I'd expect GATK to produce output within a few minutes, but this is clearly not the case.
Many thanks,
Graham