I'm using GATK 3.7 and Picard v2.9.2. When I pass multiple input BAMs to MarkDuplicates (my data is multiplexed), the resulting BAM fails validation with ValidateSamFile. I've included both the MarkDuplicates and ValidateSamFile commands and their output below.
Note that at the moment I am temporarily using OpenJDK 1.8. If there's a chance this is causing the error, I'll just have to wait until I can try it with Oracle's JDK.
I used the methods described in Tutorial#6483 to map and clean up the reads.
The MarkDuplicates command:
java -jar $PICARD MarkDuplicates \
INPUT=318616_S1_L001_sorted.bam \
INPUT=318616_S1_L002_sorted.bam \
OUTPUT=318616_S1_dedup.bam \
METRICS_FILE=318616_S1_dedup_metrics.txt
Gives the output:
[Tue May 16 12:46:32 WEST 2017] picard.sam.markduplicates.MarkDuplicates INPUT=[318616_S1_L001_sorted.bam, 318616_S1_L002_sorted.bam] OUTPUT=318616_S1_dedup.bam METRICS_FILE=318616_S1_dedup_metrics.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue May 16 12:46:32 WEST 2017] Executing as olavur@hnpv-fargenCompute01 on Linux 4.4.0-72-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13; Picard version: 2.9.2-SNAPSHOT
INFO 2017-05-16 12:46:32 MarkDuplicates Start of doWork freeMemory: 247002616; totalMemory: 253231104; maxMemory: 3736076288
INFO 2017-05-16 12:46:32 MarkDuplicates Reading input file and constructing read end information.
INFO 2017-05-16 12:46:32 MarkDuplicates Will retain up to 13536508 data points before spilling to disk.
INFO 2017-05-16 12:46:40 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:07s. Time for last 1,000,000: 7s. Last read position: 3:46,939,289
INFO 2017-05-16 12:46:40 MarkDuplicates Tracking 34300 as yet unmatched pairs. 1970 records in RAM.
INFO 2017-05-16 12:46:46 MarkDuplicates Read 2,000,000 records. Elapsed time: 00:00:13s. Time for last 1,000,000: 6s. Last read position: 6:167,786,684
INFO 2017-05-16 12:46:46 MarkDuplicates Tracking 52092 as yet unmatched pairs. 130 records in RAM.
INFO 2017-05-16 12:46:52 MarkDuplicates Read 3,000,000 records. Elapsed time: 00:00:19s. Time for last 1,000,000: 6s. Last read position: 11:55,321,871
INFO 2017-05-16 12:46:52 MarkDuplicates Tracking 53094 as yet unmatched pairs. 3924 records in RAM.
INFO 2017-05-16 12:46:57 MarkDuplicates Read 4,000,000 records. Elapsed time: 00:00:25s. Time for last 1,000,000: 5s. Last read position: 16:22,358,872
INFO 2017-05-16 12:46:57 MarkDuplicates Tracking 39568 as yet unmatched pairs. 4046 records in RAM.
INFO 2017-05-16 12:47:04 MarkDuplicates Read 5,000,000 records. Elapsed time: 00:00:31s. Time for last 1,000,000: 6s. Last read position: 22:50,518,158
INFO 2017-05-16 12:47:04 MarkDuplicates Tracking 14634 as yet unmatched pairs. 142 records in RAM.
INFO 2017-05-16 12:47:05 MarkDuplicates Read 5205808 records. 0 pairs never matched.
INFO 2017-05-16 12:47:06 MarkDuplicates After buildSortedReadEndLists freeMemory: 1438835464; totalMemory: 2132279296; maxMemory: 3736076288
INFO 2017-05-16 12:47:06 MarkDuplicates Will retain up to 116752384 duplicate indices before spilling to disk.
INFO 2017-05-16 12:47:06 MarkDuplicates Traversing read pair information and detecting duplicates.
INFO 2017-05-16 12:47:07 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2017-05-16 12:47:07 MarkDuplicates Sorting list of duplicate records.
INFO 2017-05-16 12:47:08 MarkDuplicates After generateDuplicateIndexes freeMemory: 2103791192; totalMemory: 3064463360; maxMemory: 3736076288
INFO 2017-05-16 12:47:08 MarkDuplicates Marking 2637489 records as duplicates.
INFO 2017-05-16 12:47:08 MarkDuplicates Found 13624 optical duplicate clusters.
INFO 2017-05-16 12:47:08 MarkDuplicates Reads are assumed to be ordered by: coordinate
INFO 2017-05-16 12:48:24 MarkDuplicates Before output close freeMemory: 3037617104; totalMemory: 3065511936; maxMemory: 3736076288
INFO 2017-05-16 12:48:24 MarkDuplicates After output close freeMemory: 2980877992; totalMemory: 3008364544; maxMemory: 3736076288
[Tue May 16 12:48:24 WEST 2017] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 1.87 minutes.
Runtime.totalMemory()=3008364544
And the ValidateSamFile command:
$ java -jar $PICARD ValidateSamFile I=318616_S1_dedup.bam MODE=SUMMARY
[Tue May 16 13:17:11 WEST 2017] picard.sam.ValidateSamFile INPUT=318616_S1_dedup.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue May 16 13:17:11 WEST 2017] Executing as olavur@hnpv-fargenCompute01 on Linux 4.4.0-72-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13; Picard version: 2.9.2-SNAPSHOT
[Tue May 16 13:17:16 WEST 2017] picard.sam.ValidateSamFile done. Elapsed time: 0.08 minutes.
Runtime.totalMemory()=1243611136
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once. 1: NS500347:4:H2CKVAFXX:1:21304:16813:12821
at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:133)
at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
at htsjdk.samtools.SamFileValidator$CoordinateSortedPairEndInfoMap.remove(SamFileValidator.java:765)
at htsjdk.samtools.SamFileValidator.validateMateFields(SamFileValidator.java:499)
at htsjdk.samtools.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:297)
at htsjdk.samtools.SamFileValidator.validateSamFile(SamFileValidator.java:215)
at htsjdk.samtools.SamFileValidator.validateSamFileSummary(SamFileValidator.java:143)
at picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:196)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
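The exception names a specific read (NS500347:4:H2CKVAFXX:1:21304:16813:12821) that was seen as a pair more than once. One check I could run is whether any read names occur in more than one of the lane BAMs, i.e. whether the same records went into MarkDuplicates twice. This is just a sketch, assuming samtools is on the PATH, with the filenames from my command above:

```shell
# Sketch: look for read names shared between the two lane BAMs.
# Assumes samtools is installed; skips quietly if the inputs aren't present.
if command -v samtools >/dev/null 2>&1 \
   && [ -f 318616_S1_L001_sorted.bam ] && [ -f 318616_S1_L002_sorted.bam ]; then
  # cut -f1 extracts the QNAME (read name) column from the SAM records
  samtools view 318616_S1_L001_sorted.bam | cut -f1 | sort -u > L001_names.txt
  samtools view 318616_S1_L002_sorted.bam | cut -f1 | sort -u > L002_names.txt
  # comm -12 prints only the lines common to both sorted files
  comm -12 L001_names.txt L002_names.txt | head
fi
```

If this prints anything, the same read pairs are present in both inputs, which might explain the duplicate PairInfoMap entries.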