Hi,
I have a BAM file aligned to the human genome assembly GRCh38 (hg38), but I don't have access to the reference FASTA that was used.
I was running HaplotypeCaller with a different reference FASTA and got errors saying that some contigs in the BAM file don't exist in that reference FASTA.
Following this post (http://gatkforums.broadinstitute.org/gatk/discussion/1328/script-for-sorting-an-input-file-based-on-a-reference-sortbyref-pl), I tried running ReorderSam on the BAM file.
When I run ReorderSam, it throws the exception "New reference sequence does not contain a matching contig for chr20_GL383577v2_alt" (see below). The ReorderSam documentation says "Reads mapped to contigs absent in the new reference are dropped.", but judging from this exception, reads on the missing contigs are not being dropped.
time java -jar $PICARD ReorderSam \
I=$BWA_DIR'/'${SAMPLE_FILECODE}'.hg38.alignment.bam' \
O=$BWA_DIR'/'${SAMPLE_FILECODE}'.hg38.alignment.reorder.bam' \
INFO 2016-09-21 11:23:57 ReorderSam Reordering read contig chrUn_GL000218v1 [index=354] to => ref contig chrUn_GL000218v1 [index=193]
[Wed Sep 21 11:23:57 UTC 2016] picard.sam.ReorderSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=237502464
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: New reference sequence does not contain a matching contig for chr20_GL383577v2_alt
	at picard.sam.ReorderSam.buildSequenceDictionaryMap(ReorderSam.java:229)
	at picard.sam.ReorderSam.doWork(ReorderSam.java:112)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
	at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
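To see exactly which contigs differ, the @SQ lines of the BAM header can be compared with those of the reference's sequence dictionary (.dict), which uses the same SAM header format. Below is a minimal sketch assuming bash and standard awk/coreutils; `sq_names` and `only_in_first` are hypothetical helper names, and the BAM header is assumed to have been dumped to text first, e.g. with `samtools view -H in.bam > bam_header.txt`.

```shell
#!/usr/bin/env bash
# Sketch: compare contig names between two SAM-format header files, e.g. a
# BAM header dumped with `samtools view -H` and a Picard .dict file.
# sq_names and only_in_first are hypothetical helpers, not part of any tool.

# Print the SN: (sequence name) value of every @SQ line, sorted bytewise.
sq_names() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SN:/) print substr($i, 4)
  }' "$1" | LC_ALL=C sort -u
}

# Print contigs declared in header file $1 but absent from header file $2.
only_in_first() {
  LC_ALL=C comm -23 <(sq_names "$1") <(sq_names "$2")
}
```

With these, `only_in_first bam_header.txt ref.dict` lists contigs that exist only in the BAM (in this case it should include chr20_GL383577v2_alt), and swapping the arguments lists contigs that exist only in the reference (such as chrEBV).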
QUESTION 1: What is the default behaviour of ReorderSam? Does it require the argument ALLOW_INCOMPLETE_DICT_CONCORDANCE=true for reads mapped to contigs absent from the new reference to be dropped? And are there any potential downstream analysis issues I should be aware of after using ALLOW_INCOMPLETE_DICT_CONCORDANCE=true?
So I ran ReorderSam with ALLOW_INCOMPLETE_DICT_CONCORDANCE=true.
It did process some reads, since I have an output BAM file (which I suppose is incomplete), but I still got the error "Invalid reference index -1" (see below).
I checked the reference FASTA-related files, and there is a chrEBV contig (the last line of the .dict and .fai files) that does not exist in the BAM file.
INFO 2016-09-21 11:00:31 ReorderSam Wrote 2402 reads
INFO 2016-09-21 11:00:31 ReorderSam Processing chrUn_GL000218v1
INFO 2016-09-21 11:00:31 ReorderSam Wrote 4922 reads
[Wed Sep 21 11:00:31 UTC 2016] picard.sam.ReorderSam done. Elapsed time: 28.23 minutes.
Runtime.totalMemory()=198705152
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Invalid reference index -1
	at htsjdk.samtools.QueryInterval.<init>(QueryInterval.java:24)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:504)
	at picard.sam.ReorderSam.doWork(ReorderSam.java:124)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
	at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
QUESTION 2: Could the fact that chrEBV does not exist in the BAM file be the cause of the exception above? Is there a tool that can deal with this, or is the easiest solution to remove chrEBV from the reference genome FASTA?
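If the workaround of removing chrEBV from the reference were tried, a single record can be dropped from a multi-FASTA with plain awk. Below is a minimal sketch assuming bash and awk, and assuming the record's header line starts with the word `>chrEBV` (optionally followed by a description). `drop_contig` is a hypothetical helper name, not part of Picard or GATK.

```shell
#!/usr/bin/env bash
# Sketch: copy a multi-FASTA to stdout, omitting one named record.
# drop_contig is a hypothetical helper; the name is compared with the first
# word of each ">" header line (e.g. ">chrEBV" or ">chrEBV some description").
# Usage: drop_contig ref.fa chrEBV > ref.noEBV.fa
drop_contig() {
  awk -v name="$2" '
    /^>/ { keep = (substr($1, 2) != name) }  # decide once per header line
    keep                                      # print lines of kept records
  ' "$1"
}
```

The filtered FASTA would then need its own index and dictionary so that they match the new file, e.g. `samtools faidx ref.noEBV.fa` and `java -jar $PICARD CreateSequenceDictionary R=ref.noEBV.fa O=ref.noEBV.dict`.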
Any help would be appreciated.