I have been following the "HAPLOTYPECALLER GVCF - Exome/Panel + Whole Genome" best practices on a set of 13 whole-genome sequences from a non-model organism (~10x coverage). I have successfully created my per-sample g.vcf files, but when I have attempted to perform the Joint Genotyping step (without merging files) it has failed to complete within the maximum time that I can allot to a job on the computing cluster I am using (10 days). This seems unusual as the step is noted as being fast! I have searched the forum and tried a number of different approaches based on questions which seem to address similar problems, but have so far not had any success. There are no program errors in the output file, it just slows down and then times out.
Notably, when I extracted just the mitochondrial data from my reads and ran the best practices on that, I was able to complete this step and get my list of variants. The file sizes of my g.vcfs range from 23G to 37G for the whole-genome data, though were only between 431K and 731K for the mitochondrial data.
I have been using GATK version 3.6 with java version 1.8. The command that I have been using is as follows:
java -jar ~/Programmes/GenomeAnalysisTK-3.8/GenomeAnalysisTK.jar -T GenotypeGVCFs -R /data/scratch/btw822/KwRp101_Broad_guided_min1000.fa -nt 8 --variant KwRp034.raw.snps.indels.g.vcf --variant KwRp037.raw.snps.indels.g.vcf --variant KwRp040.raw.snps.indels.g.vcf --variant KwRp044.raw.snps.indels.g.vcf --variant KwRp044b.raw.snps.indels.g.vcf --variant KwRp045.raw.snps.indels.g.vcf --variant KwRp050.raw.snps.indels.g.vcf --variant KwRp051.raw.snps.indels.g.vcf --variant KwRp053.raw.snps.indels.g.vcf --variant KwRp055.raw.snps.indels.g.vcf --variant KwRp097.raw.snps.indels.g.vcf --variant KwRp101.raw.snps.indels.g.vcf -o /data/omicsScratch/btw822/Rphil_Trimmed/Rphil_Pilot.vcf