Hello,
I am using GATK 4.0 to run GenotypeGVCFs on a cohort of 50 samples. After a number of runs that alternately exceeded their assigned CPUs or memory (up to ncpus=20, --java-options "-Xmx10g"), a system administrator of our computing grid suggested that I include
-XX:+UseSerialGC -XX:-BackgroundCompilation
in the java options.
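(For reference, a sketch of how these flags can be passed through GATK 4's `gatk` wrapper script, which forwards JVM options via --java-options; the wrapper being on PATH and the output name are assumptions for illustration, not my exact job script:)

```shell
# JVM flags suggested by the grid administrator: fixed heap, serial GC,
# no background JIT compilation (keeps the job on a single CPU).
JAVA_OPTS="-Xmx20g -XX:+UseSerialGC -XX:-BackgroundCompilation"

# Hypothetical wrapper invocation; equivalent to calling java -jar on the
# local GATK jar directly, as in the full command line quoted below.
CMD="gatk --java-options \"$JAVA_OPTS\" GenotypeGVCFs -R genome.fna -V cohort.g.vcf.gz -O out.vcf.gz"
echo "$CMD"
```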
Now GenotypeGVCFs runs happily on the single CPU that I assign to the job, but it still terminates with a memory overrun.
The last job had the following full command line:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xmx20g -XX:+UseSerialGC -XX:-BackgroundCompilation -jar /.../gatk/4.0.1.0/gatk-package-4.0.1.0-local.jar GenotypeGVCFs -R genome.fna -V cohort.g.vcf.gz --heterozygosity 0.00144 --heterozygosity-stdev 0.0273 --indel-heterozygosity 2.1E-4 -O all_samples_gatk4_test.vcf.gz
The process ran at the memory limit for most of its runtime and terminated after 19:23 hrs with this message:
Runtime.totalMemory()=20759052288
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
The output file contained only the header lines, no variant records.
I could, of course, increase the memory allocation even further. I am puzzled, though, because I previously analysed the same dataset with GATK 3.7, and GenotypeGVCFs completed successfully with 8 GB of memory.
Is there any way I can force GenotypeGVCFs to complete the job with a 'reasonable' amount of memory?
Kind regards,
Beate