Dear GATK_team,

I'd like to run Spark-enabled GATK tools on a Spark cluster. Specifically, I am launching a Spark cluster in standalone mode and submitting the BaseRecalibratorSpark application via Slurm. Before the official release, I was running version gatk-4.beta.6-17 with the following allocated resources and the following command line for the Spark arguments:

./gatk-launch BaseRecalibratorSpark \
    --sparkRunner SPARK --sparkMaster spark://${MASTER} --driver-memory 80g --num-executors 16 --executor-memory 8g

With that setup the run completed in 3.79 min. However, with the official release, GATK-4.0.0.0, using the same data files and the same Spark arguments, I no longer see that performance (the run now takes ~40 min). Am I missing something with the new version, or with the command line I use to invoke it?

Thanks in advance for your time and kind answer.

Best,
Giuseppe
GATK - 4.0.0.0 [BaseRecalibratorSpark low performance]