Channel: Recent Discussions — GATK-Forum

Picard ValidateSamFile failing with INVALID_TAG_NM on hg38 HLA contigs

Picard ValidateSamFile fails with INVALID_TAG_NM on hg38 HLA contigs when MODE=VERBOSE. The first 100 HLA reads in my BAM file all failed, and because those reads span a number of different HLA contigs, I assume every HLA read would fail. When I validated the same BAM file with IGNORE=INVALID_TAG_NM, it passed.

Oddly, when MODE=SUMMARY, I got 'No errors found'.

I am running Picard 2.9.2 and using the GATK bundle Homo_sapiens_assembly38.fasta* as the reference. The BAM file was produced by Novoalign and processed by SortSam and SetNmMdAndUqTags. I also tried the deprecated SetNmAndUqTags. Moreover, I manually checked a few of the failing reads in the original BAM file produced by Novoalign, and the records, including the NM tags, were the same as after running SetNmMdAndUqTags. I also took the subset of failed reads where NM was 0 and compared them directly to the sequence in the fasta file; all were a perfect match at the expected position.

*This shouldn't matter, but I replaced the tab character with two spaces to separate the contig name field in the '>' lines of the HLA records of Homo_sapiens_assembly38.fasta. All the other records have two spaces, and the tab character was causing problems for Novoalign (to be fixed in the next release).
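
For anyone who wants to repeat the manual comparison described above, here is a minimal Python sketch. It is not what Picard does internally. It assumes the read's CIGAR is all matches (such as 101M), that POS is 1-based as in SAM, and that the contig name is the first whitespace-delimited token of the '>' line, so a tab and two spaces are treated the same. The names read_fasta and count_mismatches are hypothetical helpers written for this illustration.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict of contig name -> sequence.

    Assumption: the contig name is the first whitespace-delimited token
    after '>', so tab-separated and space-separated headers behave alike.
    """
    contigs = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                contigs[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        contigs[name] = "".join(chunks)
    return contigs


def count_mismatches(read_seq, contig_seq, pos):
    """Count mismatching bases for an ungapped alignment.

    pos is the 1-based leftmost reference position (SAM POS field).
    Only valid when the CIGAR is a single M operation.
    """
    start = pos - 1
    window = contig_seq[start:start + len(read_seq)]
    if len(window) < len(read_seq):
        raise ValueError("read extends past the end of the contig")
    return sum(1 for r, c in zip(read_seq.upper(), window.upper()) if r != c)
```

To use it on real data, open the reference with open(path) and pass the file object to read_fasta, then call count_mismatches with the SEQ and POS fields of a record. Note that this loads the whole reference into memory, which is acceptable for a spot check but not for routine use.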

Picard ValidateSamFile command/stderr:

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/run/media/yoursham/MY_6Tb_1/germline/cromwell-executions/PairedEndSingleSampleWorkflow/d626fa7a-c423-4f0c-98b2-aa0274b658e3/call-ValidateReadGroupSamFile/shard-0/execution/tmp.Et4z6V
[Tue Jun 06 00:40:48 UTC 2017] picard.sam.ValidateSamFile INPUT=/run/media/yoursham/MY_6Tb_1/germline/cromwell-executions/PairedEndSingleSampleWorkflow/d626fa7a-c423-4f0c-98b2-aa0274b658e3/call-ValidateReadGroupSamFile/shard-0/inputs/run/media/yoursham/MY_6Tb_1/germline/cromwell-executions/PairedEndSingleSampleWorkflow/d626fa7a-c423-4f0c-98b2-aa0274b658e3/call-SortAndFixReadGroupBam/shard-0/execution/NIST7035_TAAGGCGA_L001.aligned.sorted.bam OUTPUT=NIST7035_TAAGGCGA_L001.validation_report MODE=VERBOSE IGNORE=[] MAX_OUTPUT=1000000000 IS_BISULFITE_SEQUENCED=false REFERENCE_SEQUENCE=/mnt/hdd/germline/resources/gatk_bundle/Homo_sapiens_assembly38.fasta    IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue Jun 06 00:40:48 UTC 2017] Executing as yoursham@yoursham-linux on Linux 3.10.0-514.16.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_131-b12; Picard version: 2.9.2-SNAPSHOT
INFO    2017-06-06 00:41:45 SamFileValidator    Validated Read    10,000,000 records.  Elapsed time: 00:00:56s.  Time for last 10,000,000:   55s.  Last read position: chr5:75,080,505
INFO    2017-06-06 00:42:37 SamFileValidator    Validated Read    20,000,000 records.  Elapsed time: 00:01:48s.  Time for last 10,000,000:   52s.  Last read position: chr11:65,355,941
INFO    2017-06-06 00:43:31 SamFileValidator    Validated Read    30,000,000 records.  Elapsed time: 00:02:41s.  Time for last 10,000,000:   53s.  Last read position: chr19:1,918,167
INFO    2017-06-06 00:44:24 SamFileValidator    Validated Read    40,000,000 records.  Elapsed time: 00:03:34s.  Time for last 10,000,000:   52s.  Last read position: */*
[Tue Jun 06 00:45:14 UTC 2017] picard.sam.ValidateSamFile done. Elapsed time: 4.42 minutes.
Runtime.totalMemory()=1348468736
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp

Typical errors:

ERROR: Record 38230964, Read name HWI-D00119:50:H7AP8ADXX:1:2203:14796:98239, NM tag (nucleotide differences) in file [0] does not match reality [75]
ERROR: Record 38230965, Read name HWI-D00119:50:H7AP8ADXX:1:1209:5003:56756, NM tag (nucleotide differences) in file [1] does not match reality [76]
ERROR: Record 38230966, Read name HWI-D00119:50:H7AP8ADXX:1:1108:3556:87908, NM tag (nucleotide differences) in file [1] does not match reality [72]
ERROR: Record 38230967, Read name HWI-D00119:50:H7AP8ADXX:1:1208:5673:42488, NM tag (nucleotide differences) in file [0] does not match reality [72]
ERROR: Record 38230968, Read name HWI-D00119:50:H7AP8ADXX:1:1211:16359:18440, NM tag (nucleotide differences) in file [0] does not match reality [72]

BAM records for the above errors:

HWI-D00119:50:H7AP8ADXX:1:2203:14796:98239  99  HLA-A*11:50Q    1091    30  101M    =   1111    121 CGCCTACGACGGCAAGGATTACATCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCGGCGGACATGGCAGCTCAGATCACCAAGCGCAAGTGGGAGGCGG   @?>?=>?@>@@@@==@@>>>>@>>@@@@@>@??@@>@@>@@>@@@@>@@>@@>@@@@@?@@@=@>=AA@>@@>@>@>>?>@?>>?@@@>>@>@@@>@?>??   ZA:f:30 LB:Z:NIST7035_Nextera-Rapid-Capture-Exome-and-Expanded-Exome    MD:Z:101    PG:Z:novoalign  RG:Z:H7AP8ADXX_TAAGGCGA_1_NA12878   AM:i:2  NM:i:0  SM:i:2  PQ:i:1  UQ:i:0  AS:i:0  PU:Z:H7AP8ADXX_TAAGGCGA_1_NA12878
HWI-D00119:50:H7AP8ADXX:1:1209:5003:56756   163 HLA-A*11:50Q    1092    30  101M    =   1286    295 GCCTACGACGGCAAGGATTACATCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCGGCGGACATGGCAGCTCAGATCACCAAACGCAAGTGGGAGGCGGC   ?@@>=@@8@@@?==?@<==<@9<@5@@@>@>=@@9??>@@?@@@@>@>=@@<9?@<?><??<?<>>:>.??==:>>>@=9>=>.>>?==?=@??9>>?9=@   ZA:f:30 LB:Z:NIST7035_Nextera-Rapid-Capture-Exome-and-Expanded-Exome    MD:Z:83G17  PG:Z:novoalign  RG:Z:H7AP8ADXX_TAAGGCGA_1_NA12878   AM:i:2  NM:i:1  SM:i:2  PQ:i:26 UQ:i:13 AS:i:18 PU:Z:H7AP8ADXX_TAAGGCGA_1_NA12878
HWI-D00119:50:H7AP8ADXX:1:1108:3556:87908   163 HLA-A*11:50Q    1101    30  101M    =   1162    162 GGCAAGGATTACATCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCGGCGGACATGGCAGCCCAGATCACCAAGCGCAAGTGGGAGGCGGCCCGTCGGGC   ?>@;<@@>===?==@@@@@=@=>@?=@@>@@>@>@@=@@=@A=@@@@@?@@@=9==@>@=@?+<=@<:?=@?/2<>9:/;;;@@@>=:?@@@==>;>?@@>   ZA:f:27 LB:Z:NIST7035_Nextera-Rapid-Capture-Exome-and-Expanded-Exome    MD:Z:62T38  PG:Z:novoalign  RG:Z:H7AP8ADXX_TAAGGCGA_1_NA12878   AM:i:1  NM:i:1  SM:i:3  PQ:i:13 UQ:i:10 AS:i:13 PU:Z:H7AP8ADXX_TAAGGCGA_1_NA12878
HWI-D00119:50:H7AP8ADXX:1:1208:5673:42488   163 HLA-A*11:50Q    1111    30  101M    =   1143    133 ACATCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCGGCGGACATGGCAGCTCAGATCACCAAGCGCAAGTGGGAGGCGGCCCGTCGGGCGGAGCAGCGG   ;@;>@@?@@=@==@@>@@>@@>@@@@>@@?@@?@@@@@@@@@>@>>@@@>@@=@>@>>@=@@>=@@A@=>?>?@@>@@@@@@@@@>@?@@?@@>@?>@@=>   ZA:f:27 LB:Z:NIST7035_Nextera-Rapid-Capture-Exome-and-Expanded-Exome    MD:Z:101    PG:Z:novoalign  RG:Z:H7AP8ADXX_TAAGGCGA_1_NA12878   AM:i:1  NM:i:0  SM:i:25 PQ:i:0  UQ:i:0  AS:i:0  PU:Z:H7AP8ADXX_TAAGGCGA_1_NA12878
HWI-D00119:50:H7AP8ADXX:1:1211:16359:18440  163 HLA-A*11:50Q    1111    30  101M    =   1135    125 ACATCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCGGCGGACATGGCAGCTCAGATCACCAAGCGCAAGTGGGAGGCGGCCCGTCGGGCGGAGCAGCGG   ;@;>@@?@@=@==@@>@@>@@>@@@@>@@?@@?@@@@@@@@@>@>>??@>@@=@>@>>@>@@>>@?@@>>@>@@@>@@@@@@@@@>?>@@?>?=?@>@@@@   ZA:f:27 LB:Z:NIST7035_Nextera-Rapid-Capture-Exome-and-Expanded-Exome    MD:Z:101    PG:Z:novoalign  RG:Z:H7AP8ADXX_TAAGGCGA_1_NA12878   AM:i:1  NM:i:0  SM:i:25 PQ:i:1  UQ:i:0  AS:i:0  PU:Z:H7AP8ADXX_TAAGGCGA_1_NA12878
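
The records above can also be checked for internal consistency without the reference, since NM should equal the number of mismatched plus deleted bases recorded in MD, plus the inserted bases in the CIGAR. The sketch below is only an illustration of that relationship, and the function name nm_from_md_and_cigar is hypothetical. It says nothing about why ValidateSamFile computes a different value.

```python
import re


def nm_from_md_and_cigar(md, cigar):
    """Derive the expected NM value from an MD string and a CIGAR string.

    Every letter in MD is either a mismatched reference base or a deleted
    reference base (the letters following '^'), and each counts as one edit.
    Inserted bases do not appear in MD, so they are taken from the CIGAR.
    """
    md_edits = sum(1 for ch in md if ch.isalpha())
    inserted = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op == "I"
    )
    return md_edits + inserted
```

For the five records shown, the MD values (101, 83G17, 62T38, 101, 101) with CIGAR 101M give 0, 1, 1, 0 and 0, which agree with the NM tags in the file.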
