
What causes BaseRecalibratorSpark to run for a long time and end up failing with memory errors?

Hi, GATK team,

I am testing BaseRecalibrator in GATK 4.5 beta. When I run it in LOCAL mode, it finishes pretty fast. However, when I run BaseRecalibratorSpark in SPARK mode, it runs for a long time and eventually fails with memory errors like:

'java.lang.OutOfMemoryError: GC overhead limit exceeded'

When I look at the stdout of the executors, it contains many messages like this:

14:17:19.753 INFO KnownSitesCache - Number of variants read: 37000001
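
In case it helps with diagnosis, here is a minimal Python sketch of how the progress of the known-sites loading could be pulled out of an executor's stdout. It is only an illustration: the function name `last_known_sites_progress` is hypothetical, and the regular expression assumes the log lines look exactly like the one quoted above.

```python
import re

# Matches lines such as:
# 14:17:19.753 INFO KnownSitesCache - Number of variants read: 37000001
# (assumed format, taken from the single line quoted in this post)
PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3}) INFO\s+KnownSitesCache - "
    r"Number of variants read: (\d+)"
)

def last_known_sites_progress(lines):
    """Return (timestamp, variant_count) from the last matching line, or None."""
    last = None
    for line in lines:
        match = PROGRESS.match(line.strip())
        if match:
            last = (match.group(1), int(match.group(2)))
    return last

sample = [
    "14:17:19.753 INFO KnownSitesCache - Number of variants read: 37000001",
]
print(last_known_sites_progress(sample))
```

Running this over each executor's stdout would show how many known-site variants that executor had read before it failed.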

I also tested HaplotypeCallerSpark on the same SPARK cluster, and it finishes pretty quickly too.

