On a Linux cluster, I ran this command on a node (no job scheduler):
./gatk SplitNCigarReads -R /bigdisk/databases/genomes/human/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa -I 28_tumor.dedupped.bam -O 28_tumor.split.bam
I get this error during SplitN's second pass:
13:27:28.945 INFO ProgressMeter - 4:74283282 163.1 288256000 1767147.3
13:27:38.955 INFO ProgressMeter - 4:74283830 163.3 288523000 1766976.9
13:27:46.176 INFO SplitNCigarReads - Shutting down engine
[February 6, 2018 1:27:46 PM CET] org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReads done. Elapsed time: 163.42 minutes.
Runtime.totalMemory()=12006719488
htsjdk.samtools.util.RuntimeIOException: Attempt to add record to closed writer.
at htsjdk.samtools.util.AbstractAsyncWriter.write(AbstractAsyncWriter.java:57)
at htsjdk.samtools.AsyncSAMFileWriter.addAlignment(AsyncSAMFileWriter.java:53)
at org.broadinstitute.hellbender.utils.read.SAMFileGATKReadWriter.addRead(SAMFileGATKReadWriter.java:21)
at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager.writeReads(OverhangFixingManager.java:349)
at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager.flush(OverhangFixingManager.java:329)
at org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReads.closeTool(SplitNCigarReads.java:195)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:897)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:152)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
at org.broadinstitute.hellbender.Main.main(Main.java:275)
The output file 28_tumor.split.bam is 0 bytes, and there is an index file that is also 0 bytes.
Java version: 1.8.0_162
GATK version: 4.0.0.0
OS: CentOS release 6.8
I ran this command on a different computer with Ubuntu 16.04 and had no problems. I get the same error with different BAM files. Any ideas? It's frustrating that I can't get GATK to run efficiently on the cluster, only on slow computers or on computers with limited disk space. Processing about 45 pairs of RNA-Seq samples took a month (of course, I made some errors along the way), so I really need it to run on the cluster.
Thanks,
Zsuzsa