Hello, I'm trying to run MarkDuplicates in Picard Tools v2.10.9 with Java 1.8.0_144. The input is a .bam file (exome capture sequences from a HiSeq 4000) that has been sorted and indexed with samtools.
I ran it with the following command:
java -Xmx5g -jar /Users/paa9/Desktop/picard.jar MarkDuplicates I=CAALB_20170711_K00134_IL100090633_EC-12_L004_R1.fastq.gz.srt.bam O=CAALB_20170711_K00134_IL100090633_EC-12_L004_R1.fastq.gz.srt.rmDP.bam METRICS_FILE=CAALB_20170711_K00134_IL100090633_EC-12_L004_R1.fastq.gz.srt.rmDP.mtrc REMOVE_DUPLICATES=true
At first it looks like it is running successfully, but then I get a series of error messages:
INFO 2017-08-14 16:03:59 MarkDuplicates Start of doWork freeMemory: 1015051672; totalMemory: 1029177344; maxMemory: 4772593664
INFO 2017-08-14 16:03:59 MarkDuplicates Reading input file and constructing read end information.
INFO 2017-08-14 16:03:59 MarkDuplicates Will retain up to 17292006 data points before spilling to disk.
INFO 2017-08-14 16:04:08 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:07s. Time for last 1,000,000: 7s. Last read position: JXUM01S000636:264,164
INFO 2017-08-14 16:04:08 MarkDuplicates Tracking 30491 as yet unmatched pairs. 4430 records in RAM.
INFO 2017-08-14 16:04:21 MarkDuplicates Read 2,000,000 records. Elapsed time: 00:00:20s. Time for last 1,000,000: 12s. Last read position: JXUM01S001819:138,692
INFO 2017-08-14 16:04:21 MarkDuplicates Tracking 41344 as yet unmatched pairs. 5 records in RAM.
INFO 2017-08-14 16:04:36 MarkDuplicates Read 3,000,000 records. Elapsed time: 00:00:36s. Time for last 1,000,000: 15s. Last read position: JXUM01S004018:43,724
INFO 2017-08-14 16:04:36 MarkDuplicates Tracking 48378 as yet unmatched pairs. 9 records in RAM.
Then here is the error message I get:
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: /var/folders/t4/1tm2l_xd5r9c9lwhrjsfzwb80000gn/T/CSPI.4085908425341095212.tmp/4388.tmpnot found
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:64)
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:49)
at htsjdk.samtools.util.ResourceLimitedMap.get(ResourceLimitedMap.java:76)
at htsjdk.samtools.CoordinateSortedPairInfoMap.getOutputStreamForSequence(CoordinateSortedPairInfoMap.java:180)
at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:102)
at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.remove(DiskBasedReadEndsForMarkDuplicatesMap.java:61)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:518)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:228)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:96)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:106)
Caused by: java.io.FileNotFoundException: /var/folders/t4/1tm2l_xd5r9c9lwhrjsfzwb80000gn/T/CSPI.4085908425341095212.tmp/4388.tmp (Too many open files in system)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:61)
... 11 more
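The "Too many open files" text in the trace points at a file-handle limit. As a minimal sketch, assuming a POSIX shell, the per-process limit can be inspected before rerunning. Picard's MarkDuplicates documents a MAX_FILE_HANDLES_FOR_READ_ENDS_MAP option that is meant to sit a little below that limit. The margin of 50 and the variable names here are illustrative assumptions, not values taken from the run above, and this does not inspect any system-wide limit:

```shell
#!/bin/sh
# Print the per-process open-file limit of the current shell.
limit=$(ulimit -n)
echo "per-process open-file limit: $limit"

# Derive a candidate value a little below the limit.
# The margin of 50 is an arbitrary assumption for illustration.
case "$limit" in
  unlimited)
    echo "limit is unlimited; no cap derived"
    ;;
  *)
    handles=$((limit - 50))
    if [ "$handles" -lt 1 ]; then
      handles=1
    fi
    echo "candidate MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=$handles"
    ;;
esac
```

The printed candidate could then be appended to the MarkDuplicates command as MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=<value>. Whether that resolves this particular failure is not confirmed here.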
To check whether something is wrong with the input file, I ran ValidateSamFile in Picard Tools with the following command:
java -jar /Users/paa9/Desktop/picard.jar ValidateSamFile I=CAALB_20170711_K00134_IL100090633_EC-12_L004_R1.fastq.gz.srt.bam MODE=SUMMARY
And here is the terminal output I get:
HISTOGRAM java.lang.String
Error Type Count
ERROR:MATE_NOT_FOUND 56371
ERROR:MISSING_READ_GROUP 1
WARNING:RECORD_MISSING_READ_GROUP 4475203
Thank you for any suggestions!