Channel: Recent Discussions — GATK-Forum

BaseRecalibrator: Lexicographically sorted human genome sequence detected in knownSites


Hello,

I've tried everything I can think of, but I still get an error when I run:

java -jar /data/GATK/GenomeAnalysisTK.jar -T BaseRecalibrator -R hg19.fasta -I reordered.bam -knownSites hg19.dbsnp.sorted.vcf -o recalibration_report.grp

ERROR MESSAGE: Lexicographically sorted human genome sequence detected in knownSites. Please see https://software.broadinstitute.org/gatk/documentation/article?id=1328for more information. Error details: knownSites contigs = [chr1, chr10, chr11, chr11_gl000202_random, chr12, chr13, chr14, chr15, chr16, chr17, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18, chr18_gl000207_random, chr19, chr19_gl000208_random, chr19_gl000209_random, chr1_gl000191_random, chr1_gl000192_random, chr2, chr20, chr21, chr21_gl000210_random, chr22, chr3, chr4, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr5, chr6, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr8, chr8_gl000196_random, chr8_gl000197_random, chr9, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chrM, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249, chrX, chrY]

ERROR ------------------------------------------------------------------------------------------
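For readers unfamiliar with the term: "lexicographically sorted" means the contig names are in plain string order, so chr10 comes before chr2, as in the list printed by the error above. A minimal sketch that tests a list of names for that property follows. The function name `is_lexicographic` is made up for illustration, and this only demonstrates the term; it is not GATK's own check.

```shell
# Hypothetical helper: exit 0 if the contig names read from stdin are
# already in byte-wise lexicographic (plain string) order, non-zero otherwise.
is_lexicographic() {
    # sort -c only checks the order; LC_ALL=C forces byte-wise comparison.
    LC_ALL=C sort -c 2>/dev/null
}
```

For example, `printf 'chr1\nchr10\nchr2\n' | is_lexicographic` succeeds, while the same names in numeric chromosome order (chr1, chr2, chr10) make it fail.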

I made the BAM from a FASTQ, using ucsc.hg19.fasta as the reference. I made the dictionary file, sorted and indexed the BAM, and ran MarkDuplicates and AddOrReplaceReadGroups. Next, I ran RealignerTargetCreator followed by IndelRealigner. This all worked without errors.

I downloaded the latest version of dbSNP150
ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/VCF/00-All.vcf.gz
and followed these steps to prepare the file:

  1. gunzip 00-All.vcf.gz

  2. awk '/^#/ {print $0}' 00-All.vcf > head.txt

  3. sed -i 's/chrMT/chrM/g' head.txt

  4. awk '/^#/ {next}{print $0}' 00-All.vcf | sed 's/^/chr/' > 1.vcf

  5. sed -i 's/chrMT/chrM/g' 1.vcf (completed step)

  6. cat head.txt 1.vcf > hg19.dbsnp.vcf

  7. IGVTools/igvtools index hg19.dbsnp.vcf

  8. awk '/^#/ {next}{print $1}' hg19.dbsnp.vcf | sort | uniq
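
One caveat about the last step above: piping the contig column through `sort | uniq` always prints the names in lexicographic order, whatever order the file itself is in, so it cannot show how the VCF is actually sorted. A minimal sketch that lists each contig once, in the order it first appears in the file, might look like this (the function name `contigs_in_file_order` is hypothetical):

```shell
# Hypothetical helper: print each contig name once, in the order it first
# appears in the VCF body. Header lines (starting with '#') are skipped.
contigs_in_file_order() {
    awk -F'\t' '!/^#/ && !seen[$1]++ { print $1 }' "$1"
}
```

Running `contigs_in_file_order hg19.dbsnp.vcf` and comparing the result with the first column of the reference's `.fai` index (assuming one exists next to hg19.fasta) shows whether the two orders agree.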

Next I ran BaseRecalibrator:

java -jar /data/GATK/GenomeAnalysisTK.jar -T BaseRecalibrator -R hg19.fasta -I initial.bam -knownSites hg19.dbsnp.vcf -o recalibration_report.grp

When I got an error message about the contigs not being ordered the same, I ran Picard ReorderSam on the initial.bam file and SortVcf on the hg19.dbsnp.vcf file.
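
To illustrate what "ordered the same" means here, the sketch below ranks each VCF record by where its contig sits in the reference index, then sorts by that rank and by position. It assumes a samtools-style `.fai` file whose first column lists the contigs in reference order. The function name `sort_vcf_like_reference` is hypothetical, and this is an illustration of the idea rather than a replacement for Picard SortVcf.

```shell
# Hypothetical helper: reorder VCF records to follow the contig order of a
# FASTA index (.fai). Usage: sort_vcf_like_reference ref.fasta.fai in.vcf
# Header lines are printed first, unchanged. Records on contigs that are not
# in the .fai are dropped, with one warning per contig on stderr.
sort_vcf_like_reference() {
    # Pass the header through as-is.
    grep '^#' "$2"
    # Prefix each record with its contig's rank, sort by rank and then by
    # position, and strip the rank column again.
    awk -F'\t' -v OFS='\t' '
        NR == FNR     { rank[$1] = FNR; next }   # first file: the .fai
        /^#/          { next }                   # skip VCF header lines
        !($1 in rank) { if (!warned[$1]++)
                            print "skipping unknown contig: " $1 > "/dev/stderr"
                        next }
                      { print rank[$1], $0 }
    ' "$1" "$2" | sort -t "$(printf '\t')" -k1,1n -k3,3n | cut -f2-
}
```

A call such as `sort_vcf_like_reference hg19.fasta.fai hg19.dbsnp.vcf > reordered.vcf` would write the reordered records (the output name is only an example). The sketch leaves any `##contig` header lines untouched, and an index built for the old file would have to be rebuilt for the new one.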

After that, I ran BaseRecalibrator again:

java -jar /data/GATK/GenomeAnalysisTK.jar -T BaseRecalibrator -R hg19.fasta -I reordered.bam -knownSites hg19.dbsnp.sorted.vcf -o recalibration_report.grp

**Lexicographically sorted human genome sequence detected in knownSites**.

I'm not sure what the problem is. Could someone please suggest a fix?

Thanks,

Lena

