Hi, GATK team,
I am testing BaseRecalibrator in GATK 4.5 beta. When I run it in LOCAL mode, it finishes pretty fast. However, when I run BaseRecalibratorSpark in SPARK mode, it runs for a long time and eventually fails with memory errors like:
'java.lang.OutOfMemoryError: GC overhead limit exceeded'
When I look at the stdout of the executors, it contains many messages like this:
14:17:19.753 INFO KnownSitesCache - Number of variants read: 37000001
I also tested HaplotypeCallerSpark on the same SPARK cluster, and it finishes pretty quickly too.