Hello.
I am using GATK version 3.6 and Picard 2.8.2 (picard-2.8.2.jar).
I downloaded hapmap_3.3.hg38.vcf from the GATK resource bundle, then used the command below to remove the chr prefix from the chromosome names.
awk '{gsub(/^chr/,""); print}' hapmap_3.3.hg38.vcf > no_chr_hapmap_3.3.hg38.vcf.vcf
Before (hapmap_3.3.hg38.vcf)
chr1 2242065 rs263526 T C . PASS AC=724;AF=0.259;AN=2792
chr1 2242417 rs16824926 C . . PASS AN=530
chr1 2242880 rs11581436 A . . PASS AN=540
After (no_chr_hapmap_3.3.hg38.vcf.vcf)
1 6421563 rs4908891 G A . PASS AC=1086;AF=0.389;AN=2792
1 6421782 rs4908892 A G . PASS AC=1692;AF=0.606;AN=2792
1 6421856 rs12078257 T C . PASS AC=368;AF=0.132;AN=2790
Then I used Picard SortVcf to sort no_chr_hapmap_3.3.hg38.vcf.vcf:
java -jar picard-2.8.2.jar SortVcf I=removedChr_HapMap.vcf O=sortedHapMap.vcf SEQUENCE_DICTIONARY=hg38.dict
Excerpt from hg38.dict:
@SQ SN:1 LN:248956422 UR:file:/media/ubuntu/Elements/TOOL/hg38.fa M5:2648ae1bacce4ec4b6cf337dcae37816
@SQ SN:10 LN:133797422 UR:file:/media/ubuntu/Elements/TOOL/hg38.fa M5:907112d17fcb73bcab1ed1c72b97ce68
@SQ SN:11 LN:135086622 UR:file:/media/ubuntu/Elements/TOOL/hg38.fa M5:1511375dc2dd1b633af8cf439ae90cec
@SQ SN:12 LN:133275309 UR:file:/media/ubuntu/Elements/TOOL/hg38.fa M5:e81e16d3f44337034695a29b97708fce
I then encountered this error:
Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=20) was found when SAMSequenceRecord(name=1,length=248956422,dict_index=0,assembly=null) was expected.
at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:126)
at picard.vcf.SortVcf.doWork(SortVcf.java:95)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=20) was found when SAMSequenceRecord(name=1,length=248956422,dict_index=0,assembly=null) was expected.
at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:170)
at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:124)
... 4 more
I have tried many times but still get the same error. Please advise how I can solve this problem.
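One thing I have not yet ruled out: the error reports name=chr1 with assembly=20, which does not appear in my hg38.dict, so it may come from the ##contig lines in the VCF header. My awk command only strips chr at the start of a line, so header lines beginning with ## would be left unchanged. Below is a minimal sketch of an awk variant that also rewrites those header IDs. It assumes the header lines have the form ##contig=<ID=chr1,...>, and sample_chr.vcf and sample_no_chr.vcf are made-up file names standing in for the real ones. I have not confirmed that this resolves the SortVcf error.

```shell
# Hypothetical tiny VCF standing in for hapmap_3.3.hg38.vcf (tab-separated columns).
printf '##fileformat=VCFv4.2\n##contig=<ID=chr1,length=248956422,assembly=20>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t2242065\trs263526\tT\tC\t.\tPASS\tAC=724;AF=0.259;AN=2792\n' > sample_chr.vcf

# Rule 1: in ##contig header lines, turn ID=chr1 into ID=1.
# Rule 2: in record lines (not starting with #), strip a leading chr from CHROM.
# Rule 3: print every line, modified or not.
awk '/^##contig=<ID=chr/ { sub(/ID=chr/, "ID=") }
     !/^#/               { sub(/^chr/, "") }
     { print }' sample_chr.vcf > sample_no_chr.vcf

cat sample_no_chr.vcf
```

The stripped names would still need to match the names in the dictionary exactly; any contig named differently there would have to be handled separately.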
I would then like to run SelectVariants to extract the variants that are present in my dataset but missing from HapMap.
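For that step, what I have in mind is roughly the following sketch, using the --discordance option of GATK 3 SelectVariants with HapMap as the comparison track. The file names my_dataset.vcf and not_in_hapmap.vcf and the jar location are placeholders I made up, and I am assuming the reference and both VCFs use the same contig names. The script prints the command and only runs it if the jar is found.

```shell
# Placeholder paths (assumptions, adjust to the real files).
REF=hg38.fa
MY_VCF=my_dataset.vcf
HAPMAP=sortedHapMap.vcf
OUT=not_in_hapmap.vcf

# --discordance keeps records from -V that are not called in the comparison track.
CMD="java -jar GenomeAnalysisTK.jar -T SelectVariants -R $REF -V $MY_VCF --discordance $HAPMAP -o $OUT"

# Show the command; execute it only when the GATK jar is present in this directory.
echo "$CMD"
if [ -f GenomeAnalysisTK.jar ]; then
    $CMD
fi
```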
Thank you so much in advance.
Cheers,
Moon