Channel: Recent Discussions — GATK-Forum

ReorderSam - inconsistencies between BAM and reference FASTA


Hi,

I have a BAM file aligned to genome assembly 38 (hg38), but I don't have access to the reference FASTA that was used for the alignment.

I was running HaplotypeCaller with a different reference FASTA and got errors indicating that some contigs in the BAM file don't exist in that reference.
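To see exactly which contigs are affected, the sequence names in the BAM header can be compared with those in the reference dictionary. The following is a minimal sketch, not part of Picard or GATK: `missing_contigs` is a hypothetical helper, and the file names in the usage comment are placeholders. It assumes the BAM header has already been dumped to a text file (for example with `samtools view -H`) and that the reference has a Picard-style `.dict` file, both of which carry `@SQ` lines with `SN:` fields.

```shell
# Usage: missing_contigs BAM_HEADER REF_DICT
# Prints every contig named in BAM_HEADER's @SQ lines that has no @SQ line in REF_DICT.
# Both inputs are plain-text SAM-style headers (hypothetical file names below).
#
#   samtools view -H sample.bam > bam_header.txt
#   missing_contigs bam_header.txt reference.dict
missing_contigs() {
    awk -F'\t' '
        $1 != "@SQ" { next }                   # only sequence dictionary lines matter
        {
            name = ""
            for (i = 2; i <= NF; i++)          # the SN: tag can be in any column
                if ($i ~ /^SN:/) name = substr($i, 4)
        }
        FILENAME == ARGV[1] { in_ref[name] = 1; next }   # first file read: reference dictionary
        !(name in in_ref)   { print name }               # second file read: BAM header
    ' "$2" "$1"
}
```

Swapping the two arguments lists the contigs that are in the reference dictionary but not in the BAM header.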

Following this post (http://gatkforums.broadinstitute.org/gatk/discussion/1328/script-for-sorting-an-input-file-based-on-a-reference-sortbyref-pl), I tried the ReorderSam on the bam file.

When I run ReorderSam, it throws the exception "New reference sequence does not contain a matching contig for chr20_GL383577v2_alt" (see below). The ReorderSam documentation says "Reads mapped to contigs absent in the new reference are dropped.", but judging from the exception, the missing contigs are not being dropped.

time java -jar $PICARD ReorderSam \
I=$BWA_DIR'/'${SAMPLE_FILECODE}'.hg38.alignment.bam' \
O=$BWA_DIR'/'${SAMPLE_FILECODE}'.hg38.alignment.reorder.bam' \

INFO    2016-09-21 11:23:57     ReorderSam        Reordering read contig chrUn_GL000218v1 [index=354] to => ref contig chrUn_GL000218v1 [index=193]
[Wed Sep 21 11:23:57 UTC 2016] picard.sam.ReorderSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=237502464
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: New reference sequence does not contain a matching contig for chr20_GL383577v2_alt
        at picard.sam.ReorderSam.buildSequenceDictionaryMap(ReorderSam.java:229)
        at picard.sam.ReorderSam.doWork(ReorderSam.java:112)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

QUESTION 1: What is the default behaviour of ReorderSam? Does it require the argument ALLOW_INCOMPLETE_DICT_CONCORDANCE=true in order to have "Reads mapped to contigs absent in the new reference" dropped? Are there potential downstream analysis issues I should be aware of after using ALLOW_INCOMPLETE_DICT_CONCORDANCE=true?

So I ran ReorderSam with ALLOW_INCOMPLETE_DICT_CONCORDANCE=true.
It did process something, because I have an output BAM file (which I suppose is incomplete), but I still got the error "Invalid reference index -1" (see below).
I checked the files that accompany the reference FASTA, and they contain a chrEBV contig (the last line of the dict and fai files) which does not exist in the BAM file.
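The check described above can be done for every contig at once rather than by inspecting the last line by hand. This is a rough sketch under stated assumptions: `reference_only_contigs` is a hypothetical helper, the BAM header is assumed to have been dumped to a text file beforehand, and the `.fai` index is assumed to have the contig name in its first tab-separated column. It only reports which names differ; it does not by itself establish the cause of the exception.

```shell
# Usage: reference_only_contigs REF_FAI BAM_HEADER
# Prints contigs listed in the .fai index (column 1) that have no @SQ line in BAM_HEADER.
reference_only_contigs() {
    awk -F'\t' '
        FILENAME == ARGV[1] {                  # first file read: the BAM header text
            if ($1 == "@SQ")
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) in_bam[substr($i, 4)] = 1
            next
        }
        NF && !($1 in in_bam) { print $1 }     # second file read: the .fai index
    ' "$2" "$1"
}
```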

INFO    2016-09-21 11:00:31     ReorderSam      Wrote 2402 reads
INFO    2016-09-21 11:00:31     ReorderSam        Processing chrUn_GL000218v1
INFO    2016-09-21 11:00:31     ReorderSam      Wrote 4922 reads
[Wed Sep 21 11:00:31 UTC 2016] picard.sam.ReorderSam done. Elapsed time: 28.23 minutes.
Runtime.totalMemory()=198705152
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Invalid reference index -1
        at htsjdk.samtools.QueryInterval.<init>(QueryInterval.java:24)
        at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:504)
        at picard.sam.ReorderSam.doWork(ReorderSam.java:124)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

QUESTION 2: Could the fact that chrEBV does not exist in the BAM file be the cause of the exception above? Is there a tool that can deal with this, or is the easiest solution to remove chrEBV from the reference FASTA?

Any help would be appreciated.

