Error in ValidateSamFile after running MarkDuplicates on multiplexed input BAMs

I'm using GATK 3.7 and Picard 2.9.2. When I pass multiple input BAMs to MarkDuplicates (my data is multiplexed across two lanes), validating the resulting BAM with ValidateSamFile fails. I've included both commands and their output below.

Note that for the moment I am temporarily running OpenJDK 1.8. If that could be what is causing the error, I'll just have to wait until I can try it with Oracle Java.

I used the methods described in Tutorial#6483 to map and clean up the reads.
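
For reference, each lane was mapped and coordinate-sorted separately before MarkDuplicates, roughly along the lines below (the reference path, FASTQ names, and read-group values here are illustrative placeholders, not my exact settings):

# Map one lane with its own read group, then coordinate-sort the output.
bwa mem -M \
    -R '@RG\tID:H2CKVAFXX.1\tSM:318616_S1\tLB:lib1\tPL:ILLUMINA\tPU:H2CKVAFXX.1' \
    ref.fasta 318616_S1_L001_R1.fastq.gz 318616_S1_L001_R2.fastq.gz \
    | samtools sort -o 318616_S1_L001_sorted.bam -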

The MarkDuplicates command:

java -jar $PICARD MarkDuplicates \
    INPUT=318616_S1_L001_sorted.bam \
    INPUT=318616_S1_L002_sorted.bam \
    OUTPUT=318616_S1_dedup.bam \
    METRICS_FILE=318616_S1_dedup_metrics.txt

Gives the output:

[Tue May 16 12:46:32 WEST 2017] picard.sam.markduplicates.MarkDuplicates INPUT=[318616_S1_L001_sorted.bam, 318616_S1_L002_sorted.bam] OUTPUT=318616_S1_dedup.bam METRICS_FILE=318616_S1_dedup_metrics.txt    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue May 16 12:46:32 WEST 2017] Executing as olavur@hnpv-fargenCompute01 on Linux 4.4.0-72-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13; Picard version: 2.9.2-SNAPSHOT
INFO    2017-05-16 12:46:32     MarkDuplicates  Start of doWork freeMemory: 247002616; totalMemory: 253231104; maxMemory: 3736076288
INFO    2017-05-16 12:46:32     MarkDuplicates  Reading input file and constructing read end information.
INFO    2017-05-16 12:46:32     MarkDuplicates  Will retain up to 13536508 data points before spilling to disk.
INFO    2017-05-16 12:46:40     MarkDuplicates  Read     1,000,000 records.  Elapsed time: 00:00:07s.  Time for last 1,000,000:    7s.  Last read position: 3:46,939,289
INFO    2017-05-16 12:46:40     MarkDuplicates  Tracking 34300 as yet unmatched pairs. 1970 records in RAM.
INFO    2017-05-16 12:46:46     MarkDuplicates  Read     2,000,000 records.  Elapsed time: 00:00:13s.  Time for last 1,000,000:    6s.  Last read position: 6:167,786,684
INFO    2017-05-16 12:46:46     MarkDuplicates  Tracking 52092 as yet unmatched pairs. 130 records in RAM.
INFO    2017-05-16 12:46:52     MarkDuplicates  Read     3,000,000 records.  Elapsed time: 00:00:19s.  Time for last 1,000,000:    6s.  Last read position: 11:55,321,871
INFO    2017-05-16 12:46:52     MarkDuplicates  Tracking 53094 as yet unmatched pairs. 3924 records in RAM.
INFO    2017-05-16 12:46:57     MarkDuplicates  Read     4,000,000 records.  Elapsed time: 00:00:25s.  Time for last 1,000,000:    5s.  Last read position: 16:22,358,872
INFO    2017-05-16 12:46:57     MarkDuplicates  Tracking 39568 as yet unmatched pairs. 4046 records in RAM.
INFO    2017-05-16 12:47:04     MarkDuplicates  Read     5,000,000 records.  Elapsed time: 00:00:31s.  Time for last 1,000,000:    6s.  Last read position: 22:50,518,158
INFO    2017-05-16 12:47:04     MarkDuplicates  Tracking 14634 as yet unmatched pairs. 142 records in RAM.
INFO    2017-05-16 12:47:05     MarkDuplicates  Read 5205808 records. 0 pairs never matched.
INFO    2017-05-16 12:47:06     MarkDuplicates  After buildSortedReadEndLists freeMemory: 1438835464; totalMemory: 2132279296; maxMemory: 3736076288
INFO    2017-05-16 12:47:06     MarkDuplicates  Will retain up to 116752384 duplicate indices before spilling to disk.
INFO    2017-05-16 12:47:06     MarkDuplicates  Traversing read pair information and detecting duplicates.
INFO    2017-05-16 12:47:07     MarkDuplicates  Traversing fragment information and detecting duplicates.
INFO    2017-05-16 12:47:07     MarkDuplicates  Sorting list of duplicate records.
INFO    2017-05-16 12:47:08     MarkDuplicates  After generateDuplicateIndexes freeMemory: 2103791192; totalMemory: 3064463360; maxMemory: 3736076288
INFO    2017-05-16 12:47:08     MarkDuplicates  Marking 2637489 records as duplicates.
INFO    2017-05-16 12:47:08     MarkDuplicates  Found 13624 optical duplicate clusters.
INFO    2017-05-16 12:47:08     MarkDuplicates  Reads are assumed to be ordered by: coordinate
INFO    2017-05-16 12:48:24     MarkDuplicates  Before output close freeMemory: 3037617104; totalMemory: 3065511936; maxMemory: 3736076288
INFO    2017-05-16 12:48:24     MarkDuplicates  After output close freeMemory: 2980877992; totalMemory: 3008364544; maxMemory: 3736076288
[Tue May 16 12:48:24 WEST 2017] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 1.87 minutes.
Runtime.totalMemory()=3008364544

And the ValidateSamFile command:

$ java -jar $PICARD ValidateSamFile I=318616_S1_dedup.bam MODE=SUMMARY
[Tue May 16 13:17:11 WEST 2017] picard.sam.ValidateSamFile INPUT=318616_S1_dedup.bam MODE=SUMMARY    MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue May 16 13:17:11 WEST 2017] Executing as olavur@hnpv-fargenCompute01 on Linux 4.4.0-72-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13; Picard version: 2.9.2-SNAPSHOT
[Tue May 16 13:17:16 WEST 2017] picard.sam.ValidateSamFile done. Elapsed time: 0.08 minutes.
Runtime.totalMemory()=1243611136
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once.  1: NS500347:4:H2CKVAFXX:1:21304:16813:12821
        at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:133)
        at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
        at htsjdk.samtools.SamFileValidator$CoordinateSortedPairEndInfoMap.remove(SamFileValidator.java:765)
        at htsjdk.samtools.SamFileValidator.validateMateFields(SamFileValidator.java:499)
        at htsjdk.samtools.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:297)
        at htsjdk.samtools.SamFileValidator.validateSamFile(SamFileValidator.java:215)
        at htsjdk.samtools.SamFileValidator.validateSamFileSummary(SamFileValidator.java:143)
        at picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:196)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
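
One quick check would be to count how often the read name from the exception occurs in the merged output and in each input lane BAM, for example (assuming samtools is on the PATH):

for BAM in 318616_S1_dedup.bam 318616_S1_L001_sorted.bam 318616_S1_L002_sorted.bam; do
    echo "$BAM:"
    # A paired read should normally appear twice (once per mate);
    # secondary/supplementary alignments can add a few extra lines.
    samtools view "$BAM" | grep -c 'NS500347:4:H2CKVAFXX:1:21304:16813:12821'
done

If the name turns up in both lane BAMs, the same reads would have been fed to MarkDuplicates twice, which could explain the PairInfoMap complaint.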
