Channel: Recent Discussions — GATK-Forum

GATK 3.8 Dictionary cannot have size zero


Hi,

Today I wanted to update my pipeline from GATK 3.6 to 3.8, but now it complains about the dictionary.

ERROR --
ERROR stack trace

java.lang.IllegalArgumentException: Dictionary cannot have size zero
at org.broadinstitute.gatk.utils.MRUCachingSAMSequenceDictionary.<init>(MRUCachingSAMSequenceDictionary.java:62)
at org.broadinstitute.gatk.utils.GenomeLocParser$1.initialValue(GenomeLocParser.java:78)
at org.broadinstitute.gatk.utils.GenomeLocParser$1.initialValue(GenomeLocParser.java:75)
at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180)
at java.lang.ThreadLocal.get(ThreadLocal.java:170)
at org.broadinstitute.gatk.utils.GenomeLocParser.getContigInfo(GenomeLocParser.java:91)
at org.broadinstitute.gatk.utils.GenomeLocParser.getContigs(GenomeLocParser.java:204)
at org.broadinstitute.gatk.utils.GenomeLocParser.<init>(GenomeLocParser.java:135)
at org.broadinstitute.gatk.utils.GenomeLocParser.<init>(GenomeLocParser.java:108)
at org.broadinstitute.gatk.utils.GenomeLocSortedSet.createSetFromSequenceDictionary(GenomeLocSortedSet.java:444)
at org.broadinstitute.gatk.engine.datasources.reads.BAMScheduler.createOverMappedReads(BAMScheduler.java:66)
at org.broadinstitute.gatk.engine.datasources.reads.IntervalSharder.shardOverMappedReads(IntervalSharder.java:55)
at org.broadinstitute.gatk.engine.datasources.reads.SAMDataSource.createShardIteratorOverMappedReads(SAMDataSource.java:1237)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getShardStrategy(GenomeAnalysisEngine.java:676)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Dictionary cannot have size zero
ERROR ------------------------------------------------------------------------------------------

The command I used is basically the same one I used with GATK 3.6, where it still worked yesterday.

/dsk/localall/lib/java/jre1.8.0_131_x64/bin/java \
-Xmx10g \
-jar /data/ngs/bin/GATK/3.8.0/GenomeAnalysisTK.jar \
-T RealignerTargetCreator \
-l INFO \
-nt 4 \
-R /data/ngs/resources/bundle/2.8/ucsc.hg19.fasta \
-known /data/ngs/resources/bundle/2.8/Mills_and_1000G_gold_standard.indels.hg19.vcf \
-known /data/ngs/resources/bundle/2.8/1000G_phase1.indels.hg19.vcf \
--filter_mismatching_base_and_quals \
--filter_bases_not_stored \
--filter_reads_with_N_cigar \
-I out/01-alignment/FG315.bam \
-o out/03-realignment/FG315/FG315.list \
-log out/03-realignment/FG315/FG315.creator.log

I've tried to Google this issue and found an old forum entry on seqanswers.com stating that the index should be redone:
http://seqanswers.com/forums/showthread.php?t=20599

I've checked my index and it's not empty.
11K -rwxrws---+ 1 hoppmann ngs 3.5K Aug 1 16:58 /data/ngs/resources/bundle/2.8/ucsc.hg19.fasta.fai
chrM 16571 6 50 51
chr1 249250621 16915 50 51
chr2 243199373 254252555 50 51
chr3 198022430 502315922 50 51
chr4 191154276 704298807 50 51
chr5 180915260 899276175 50 51
chr6 171115067 1083809747 50 51
chr7 159138663 1258347122 50 51
chr8 146364022 1420668565 50 51
chr9 141213431 1569959874 50 51
chr10 135534747 1713997581 50 51
chr11 135006516 1852243030 50 51
...

I redid the indexing with samtools anyway, but it didn't help.
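In case it helps to narrow this down: besides the .fai, GATK also reads a .dict sequence dictionary next to the FASTA (created with Picard CreateSequenceDictionary), and "Dictionary cannot have size zero" suggests a dictionary with no @SQ records rather than a problem with the .fai. A minimal sketch of that check, with illustrative header contents:

```python
def count_sq_records(dict_text: str) -> int:
    """Count @SQ records in a SAM-style sequence dictionary (.dict)."""
    return sum(1 for line in dict_text.splitlines() if line.startswith("@SQ"))

# Illustrative .dict contents; a healthy hg19 dictionary has one @SQ per contig.
example = (
    "@HD\tVN:1.5\tSO:unsorted\n"
    "@SQ\tSN:chrM\tLN:16571\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
)
print(count_sq_records(example))  # 2; a count of zero would explain the error
```

If the real ucsc.hg19.dict turns out to be empty, missing, or older than the FASTA, regenerating it with Picard CreateSequenceDictionary would be the first thing to try.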

Am I doing something wrong, or is this a bug?

Best,

Anselm Hoppmann


GATK 3.8 logger ERROR


Hi,

After updating to GATK 3.8 I found the following ERROR in my log file.

ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/dsk/data1/ngs/bin/GATK/3.8.0/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...

I guess this is a bug. Could it be that you forgot to add the library to the build?

Best,

Anselm Hoppmann

SNV gets dbSNP annotation in one sample, doesn't get annotated in another one


Hello everyone,

I recently ran HaplotypeCaller from GATK 3.7 on a series of samples (several GATK runs performed at the same time), using the latest release of dbSNP (150). This is the command line I used in both cases (I omitted the full paths for privacy reasons):

/usr/bin/java -Djava.io.tmpdir=/scratch/javatmp/ngs_pipe \
-Xmx4g -jar /data01/Softwares/GATK/3.7/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R /path/to/hg19 \
-I input_bam \
-o output.vcf \
--dbsnp dbSNP_150_NEW_hg19_chr.vcf

Here's the same variant reported in two different samples of the same data (exomes), prepared with the same kit and sequenced on the same NextSeq run:

sample 1:

chr11 125479363 rs2241502 G A 208.01 . AC=2;AF=1.00;AN=2;DB;DP=9;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=23.11;SOR=0.892 GT:AD:DP:GQ:PL 1/1:0,9:9:27:222,27,0

sample 2:

chr11 125479363 . G A 597.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.140;ClippingRankSum=0.000;DP=41;ExcessHet=3.0103;FS=2.820;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=14.58;ReadPosRankSum=-1.110;SOR=0.448 GT:AD:DP:GQ:PL 0/1:16,25:41:99:605,0,343

I've seen this happen systematically at several other positions in the same run, and I fear the problem may always have been there without my noticing. I'm wondering if you know why this occurs, whether it is a known bug, and whether I should reannotate every VCF I have with VariantAnnotator to fix the issue (assuming VariantAnnotator is immune to this bug).
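One way to sanity-check such a site is an exact lookup against the dbSNP VCF; the matching logic is simplified here to exact (CHROM, POS, REF, ALT), and the single record is taken from the sample 1 line above:

```python
def load_sites(vcf_lines):
    """Index VCF records by (CHROM, POS, REF, ALT) -> ID."""
    sites = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, rsid, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):
            sites[(chrom, int(pos), ref, alt)] = rsid
    return sites

# One illustrative dbSNP record, matching the position in question.
dbsnp = load_sites(["chr11\t125479363\trs2241502\tG\tA"])
print(dbsnp.get(("chr11", 125479363, "G", "A")))  # rs2241502
```

If a scan like this finds the site in the dbSNP file but one of the two output VCFs lacks the rsID, that points at the annotation step rather than at the resource file.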

Thanks a lot for your help and time; I'm happy to provide any clarification.

Calling copy number from WGS data using PON from different platform


Hi, I'm trying to call somatic copy number from WGS data using the workflow laid out in cnv_somatic_copy_ratio_bam_workflow.wdl.
However, I do not have a panel of normals for the sample. The data was sequenced on a HiSeq 4000 with 100 bp reads. Would it be reasonable to construct a panel of normals from publicly available data (1000 Genomes Project) that was sequenced on a HiSeq 2000 with 90 bp reads?

FastaAlternateReferenceMaker for polyploid


Are there any tools similar to FastaAlternateReferenceMaker that can create fasta sequences from a pooled-sequencing vcf file? The sample data came from 25 pooled diploid individuals, so I created the vcf for ploidy = 50. I know that a similar question was asked concerning multiple-sample vcfs (https://gatkforums.broadinstitute.org/gatk/discussion/1654/fastaalternatereferencemaker-for-several-individuals), but this case is different since the samples were all pooled without barcodes.
Any suggestions would be appreciated. Thanks!

Is GATK overestimating the heterozygous calls?


Hi,
I have 24 genotypes distributed in 4 different populations.

I used HaplotypeCaller with the -ERC GVCF option and obtained a GVCF file for each genotype, then combined all the genotypes into a single VCF file with GenotypeGVCFs.

Is there a way to tell GATK to label a variant site as "heterozygous" only if the alternate allele is present in >60% of the reads?

Example:
At position 82 (highlighted with a red box in the figure), the genotype field for this variant is 0/1, whereas, as seen in IGV, only 3 of the 10 reads contain the alternate allele "A". Which filter should I use in HaplotypeCaller, GenotypeGVCFs, or VariantFiltration to label a variant site as heterozygous only if it is present in, say, 6 out of 10 reads?
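As far as I know there is no HaplotypeCaller/GenotypeGVCFs argument for this; the usual approach is to filter genotypes afterwards on allele balance computed from the FORMAT AD field. A sketch, with the threshold and counts taken from the example above:

```python
def alt_fraction(ad_field: str) -> float:
    """Fraction of reads supporting the first alternate allele, from AD."""
    ref_count, alt_count = (int(x) for x in ad_field.split(",")[:2])
    total = ref_count + alt_count
    return alt_count / total if total else 0.0

# 3 alt reads out of 10, as in the IGV view at position 82.
ab = alt_fraction("7,3")
print(ab, ab >= 0.6)  # 0.3 False -> would not be labelled heterozygous
```

A post-processing pass like this (or an equivalent genotype-level filter expression) leaves the calls themselves untouched and only flags genotypes whose allele balance is below the chosen cutoff.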

Annotation modules in HaplotypeCaller and GenotypeGVCFs


I am performing WGS using the GATK best practice guidelines for the '-ERC GVCF' cohort analysis workflow. If I run HaplotypeCaller in default mode (i.e. without specifying particular annotation modules such as -A Coverage, -A FisherStrand, or -A QualByDepth) to generate the GVCF files, do I have the option to add these annotation modules when I run GenotypeGVCFs? I am not sure whether annotation modules requested at the GenotypeGVCFs step also need to be present in each of the individual GVCF files.

Clean version of dbSNP in the GATK resource bundle


Hi, I understand that version 129 of dbSNP is considered clean and does not include data from other projects such as 1000 Genomes.

  • What steps of variant calling in WGS/WES analysis can be affected by using the GATK resource bundle's dbSNP 138 with the sites released after version 129 excluded, versus using all variants in dbSNP version 138?

  • Are there any specific drawbacks to using the latest dbSNP version, 150?


Can I use the indel-realigned BAM file to extract SNPs?


Hi everybody,
I'm following the pipeline for variant calling in RNA-seq and I have some doubts. So far I have done:
1) Split'N'Trim and reassign mapping qualities (output: split.bam)

2) Indel realignment: at this point I create realignment targets (java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fasta -I dedup.bam -o targetintervals.list) and run the realignment (java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T IndelRealigner -R ref -I dedup.bam -targetIntervals targetintervals.list -o realigned.bam)
output: realigned.bam

3) Variant calling: in this step I ran java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I realigned.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -o output.vcf, using realigned.bam as input.
But I've run it only once. My first doubt is at this step: should I use different command lines to generate separate VCFs for indels and SNPs, or can I run it just once and filter later (as follows)?

4)Variant filtering:
Extract SNPs
java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V output.vcf -selectType SNP -o snps.vcf

Extract Indels
java -jar ~/bin/GATK/GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V output.vcf -selectType INDEL -o indels.vcf

In both cases I use the VCF generated at the variant calling step. Is that correct, or should the input VCFs be different?
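For what it's worth, a single HaplotypeCaller run is enough: SelectVariants separates the two classes afterwards, essentially by comparing REF and ALT allele lengths. A sketch of that classification:

```python
def is_snp(ref: str, alts: str) -> bool:
    """A record is a SNP when REF and every ALT allele are single bases."""
    return len(ref) == 1 and all(len(a) == 1 for a in alts.split(","))

print(is_snp("G", "A"))   # True  -> would go to snps.vcf
print(is_snp("GA", "G"))  # False -> deletion, would go to indels.vcf
```

So feeding the same output.vcf to both SelectVariants commands, as in step 4, is the intended usage.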

Thank you very much in advance,

VariantRecalibrator for bacterial genome annotation


Hello! I want to filter out bad variants using VariantRecalibrator, but I'm a bit lost as to which databases I can use as resources, and whether I need to have them on my local machine. My input file is a VCF from HaplotypeCaller. I would really appreciate your support.
Thanks in advance.

Location of documentation for option types


Hi,

Could you provide a pointer to the place in the documentation where types such as

RodBinding[VariantContext]
ArrayList[String]
List[Type]
Set[File]
etc

and their exact usage are described? Despite searching I seem to be unable to find the description.

Best regards,

Bernt

Clock drift error


I'm trying to use GenotypeGVCFs on 350 small gVCF files (from bacterial genomes) and I'm getting this "clock drift" warning:

INFO 13:02:46,006 ProgressMeter - chr1:44001 0.0 3.0 h 15250.3 w 2.0% 6.3 d 6.2 d
INFO 13:06:30,421 ProgressMeter - chr1:44001 0.0 3.0 h 15250.3 w 2.0% 6.4 d 6.3 d
INFO 14:38:28,521 ProgressMeter - chr1:44001 0.0 3.1 h 15250.3 w 2.0% 6.6 d 6.5 d
WARN 16:24:14,924 SimpleTimer - Clock drift of -1,425,286,806,691,525,163 - -1,425,286,778,358,860,637 = 28,332,664,526 nanoseconds detected, vs. max allowable drift of 5,000,000,000. Assuming checkpoint/restart event.
WARN 18:23:08,307 SimpleTimer - Clock drift of -1,425,286,778,358,827,793 - -1,425,286,806,691,525,163 = 28,332,697,370 nanoseconds detected, vs. max allowable drift of 5,000,000,000. Assuming checkpoint/restart event.
INFO 22:55:24,502 ProgressMeter - chr1:44001 0.0 0.0 s 0.0 s 2.0% 0.0 s 0.0 s
INFO 02:46:17,060 ProgressMeter - chr1:44001 0.0 14.1 h 15250.3 w 2.0% 4.3 w 4.2 w

Any ideas what that is?

Missing version label for the downloaded GATK release ("GenomeAnalysisTK-.tar.bz2")


Hello, I am currently developing a pipeline for genome assembly and annotation, in which GATK is one of many dependencies. Since the current version of GATK (3.7) still needs manual registration and downloading, I wrote a step-by-step guide for the users. I noticed that if users download GATK 3.7 from this website, the downloaded file is named "GenomeAnalysisTK-.tar.bz2" instead of "GenomeAnalysisTK-3.7-0.tar.bz2", which is not ideal. I remember the version label was there a month ago, but it seems to have been dropped. Could you please add this label back? I know this is a very minor issue, but it would be good to have it corrected. Thanks in advance!

Best,
Jia-Xing

IllegalArgumentException: samples cannot be empty


I am trying to run HaplotypeCaller on some data that I know is messy and would fail some of the filters, so I have run it both with and without --disableToolDefaultReadFilters. Either way I don't get an output file, but I do get the message "samples cannot be empty". Does this mean that my data is still failing some built-in check, or am I doing something else wrong? I have checked the @SQ lines, and when I run CountReads (with --disableToolDefaultReadFilters) it reports "Tool returned: 24634".

Here's my command:

$ java -jar ~/Downloads/gatk-4.beta.2/gatk-package-4.beta.2-local.jar HaplotypeCaller -R DQA_contig.fasta -ploidy 50 -I IRL-A.bam.sorted.bam -O IRL-A.vcf --disableToolDefaultReadFilters
13:40:49.007 WARN IntelGKLUtils - Error starting process to check for AVX support : grep -i avx /proc/cpuinfo
13:40:49.014 WARN IntelGKLUtils - Error starting process to check for AVX support : grep -i avx /proc/cpuinfo
[July 24, 2017 1:40:48 PM EDT] HaplotypeCaller --sample_ploidy 50 --output IRL-A.vcf --input IRL-A.bam.sorted.bam --reference DQA_contig.fasta --disableToolDefaultReadFilters true --group StandardAnnotation --group StandardHCAnnotation --GVCFGQBands 1 --GVCFGQBands 2 --GVCFGQBands 3 --GVCFGQBands 4 --GVCFGQBands 5 --GVCFGQBands 6 --GVCFGQBands 7 --GVCFGQBands 8 --GVCFGQBands 9 --GVCFGQBands 10 --GVCFGQBands 11 --GVCFGQBands 12 --GVCFGQBands 13 --GVCFGQBands 14 --GVCFGQBands 15 --GVCFGQBands 16 --GVCFGQBands 17 --GVCFGQBands 18 --GVCFGQBands 19 --GVCFGQBands 20 --GVCFGQBands 21 --GVCFGQBands 22 --GVCFGQBands 23 --GVCFGQBands 24 --GVCFGQBands 25 --GVCFGQBands 26 --GVCFGQBands 27 --GVCFGQBands 28 --GVCFGQBands 29 --GVCFGQBands 30 --GVCFGQBands 31 --GVCFGQBands 32 --GVCFGQBands 33 --GVCFGQBands 34 --GVCFGQBands 35 --GVCFGQBands 36 --GVCFGQBands 37 --GVCFGQBands 38 --GVCFGQBands 39 --GVCFGQBands 40 --GVCFGQBands 41 --GVCFGQBands 42 --GVCFGQBands 43 --GVCFGQBands 44 --GVCFGQBands 45 --GVCFGQBands 46 --GVCFGQBands 47 --GVCFGQBands 48 --GVCFGQBands 49 --GVCFGQBands 50 --GVCFGQBands 51 --GVCFGQBands 52 --GVCFGQBands 53 --GVCFGQBands 54 --GVCFGQBands 55 --GVCFGQBands 56 --GVCFGQBands 57 --GVCFGQBands 58 --GVCFGQBands 59 --GVCFGQBands 60 --GVCFGQBands 70 --GVCFGQBands 80 --GVCFGQBands 90 --GVCFGQBands 99 --indelSizeToEliminateInRefModel 10 --useAllelesTrigger false --dontTrimActiveRegions false --maxDiscARExtension 25 --maxGGAARExtension 300 --paddingAroundIndels 150 --paddingAroundSNPs 20 --kmerSize 10 --kmerSize 25 --dontIncreaseKmerSizesForCycles false --allowNonUniqueKmersInRef false --numPruningSamples 1 --recoverDanglingHeads false --doNotRecoverDanglingBranches false --minDanglingBranchLength 4 --consensus false --maxNumHaplotypesInPopulation 128 --errorCorrectKmers false --minPruning 2 --debugGraphTransformations false --kmerLengthForReadErrorCorrection 25 --minObservationsForKmerToBeSolid 20 --likelihoodCalculationEngine PairHMM --base_quality_score_threshold 18 
--gcpHMM 10 --pair_hmm_implementation FASTEST_AVAILABLE --pcr_indel_model CONSERVATIVE --phredScaledGlobalReadMismappingRate 45 --nativePairHmmThreads 4 --useDoublePrecision false --debug false --useFilteredReadsForAnnotations false --emitRefConfidence NONE --bamWriterType CALLED_HAPLOTYPES --disableOptimizations false --justDetermineActiveRegions false --dontGenotype false --dontUseSoftClippedBases false --captureAssemblyFailureBAM false --errorCorrectReads false --doNotRunPhysicalPhasing false --min_base_quality_score 10 --useNewAFCalculator false --annotateNDA false --heterozygosity 0.001 --indel_heterozygosity 1.25E-4 --heterozygosity_stdev 0.01 --standard_min_confidence_threshold_for_calling 10.0 --max_alternate_alleles 6 --max_genotype_count 1024 --genotyping_mode DISCOVERY --contamination_fraction_to_filter 0.0 --output_mode EMIT_VARIANTS_ONLY --allSitePLs false --readShardSize 5000 --readShardPadding 100 --minAssemblyRegionSize 50 --maxAssemblyRegionSize 300 --assemblyRegionPadding 100 --maxReadsPerAlignmentStart 50 --activeProbabilityThreshold 0.002 --maxProbPropagationDistance 50 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 40 --cloudIndexPrefetchBuffer -1 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --minimumMappingQuality 20
[July 24, 2017 1:40:48 PM EDT] Executing as heidi@heidi-HP-Pavilion-dv6-Notebook-PC on Linux 4.10.0-27-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_141-b15; Version: 4.beta.2
13:40:49.017 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 5
13:40:49.017 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:40:49.017 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false
13:40:49.017 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:40:49.017 INFO HaplotypeCaller - Deflater: JdkDeflater
13:40:49.017 INFO HaplotypeCaller - Inflater: JdkInflater
13:40:49.017 INFO HaplotypeCaller - Initializing engine
13:40:49.254 WARN IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
13:40:49.260 WARN IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
13:40:49.896 INFO HaplotypeCaller - Done initializing engine
13:40:49.902 INFO HaplotypeCallerEngine - Currently, physical phasing is only available for diploid samples.
13:40:50.226 WARN PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
13:40:50.503 WARN PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
13:40:50.925 INFO HaplotypeCaller - Shutting down engine
[July 24, 2017 1:40:50 PM EDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=218628096
java.lang.IllegalArgumentException: samples cannot be empty
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:681)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.ReferenceConfidenceModel.<init>(ReferenceConfidenceModel.java:103)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.initialize(HaplotypeCallerEngine.java:165)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.<init>(HaplotypeCallerEngine.java:146)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.onTraversalStart(HaplotypeCaller.java:200)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:836)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:115)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
at org.broadinstitute.hellbender.Main.main(Main.java:230)
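For context, this error usually means the engine found no sample names at all, i.e. the BAM header has no @RG record carrying an SM tag (CountReads succeeds because it never looks at samples). A sketch of the check, with an illustrative header:

```python
def sample_names(header_text: str) -> set:
    """Collect SM values from @RG lines of a SAM-style header."""
    samples = set()
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            for field in line.split("\t")[1:]:
                if field.startswith("SM:"):
                    samples.add(field[3:])
    return samples

# A header with @SQ but no @RG yields no samples: "samples cannot be empty".
header = "@HD\tVN:1.5\n@SQ\tSN:DQA_contig\tLN:5000\n"
print(sample_names(header))  # set()
```

Inspecting the real file with samtools view -H IRL-A.bam.sorted.bam would show whether an @RG line with SM is present; if not, Picard AddOrReplaceReadGroups can add one.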

Truth or control samples - Variant calling


Can we incorporate truth/control samples in addition to dbSNP when calling variants with GVCFs (cohorts) or the traditional way with HaplotypeCaller? There are, for example, situations where the sequences are from Australian, East Asian, or African samples, and we would like to include truth/control samples for those populations, perhaps from 1000 Genomes or another source.

If this is possible, what arguments do we use?


MuTect2 beta --germline_resource for build b37


Hi - I'm looking to run MuTect2 beta with the --germline_resource option. However, I've consistently used the b37 genome build throughout my analysis, while the suggested resource (gnomAD) appears to be available only for the hg19 build. So I'm wondering whether I should:
1. Go ahead and use the gnomAD hg19 files, despite the fact that my whole analysis has used the b37 build?
2. Lift over my existing gnomAD VCFs from hg19 to b37? (In this case I'd need an hg19-to-b37 liftOver file, which I can't find anywhere.)
3. Use another germline resource?

Which option would you recommend? Many thanks for your time.

Base Recalibration


Hi, I am not very familiar with bioinformatics and SNP genotyping. I am trying to identify SNPs in my sample from the SRA database, and this is the pipeline I am following:
I - STAR alignment
II - 2-pass mapping using SJ.out.tab
III - SAM-to-BAM conversion, sorting, and indexing
IV - Mark duplicates
V - SplitNTrim
VI - Base recalibration using a known VCF file
When I use the SplitNTrim output for base recalibration with the VCF file, I get this error: ERROR MESSAGE: The platform (platform) associated with read group GATKSAMReadGroupRecord @RG:id is not a recognized platform. Allowable options are ILLUMINA, SLX, SOLEXA, SOLID, 454, LS454, COMPLETE, PACBIO, IONTORRENT, CAPILLARY, HELICOS, UNKNOWN.
To fix this I have to add a read group to the file, which I have not been able to do; as far as I can tell, the STAR SAM output contains no @RG information.

Second, since according to the SRA database it is a single Illumina run, I used RGPL=illumina RGLB=lib1 RGPU=unit1 RGSM=20 to add a read group to the SplitNTrim output BAM, and got this error:
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Read name SRR2183534.2625035, No real operator (M|I|D|N) in CIGAR
and the output file is smaller than the input file.

Am I doing this correctly? If so, how can I add the read group to the BAM, and where do I get this information, or can I run without it?
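A small pre-check of the intended PL value against the list from the error message can catch the first problem before the BQSR run (the allowed set below is copied from the message above):

```python
# Platforms BQSR accepts, as listed in the GATK error message.
ALLOWED_PLATFORMS = {
    "ILLUMINA", "SLX", "SOLEXA", "SOLID", "454", "LS454", "COMPLETE",
    "PACBIO", "IONTORRENT", "CAPILLARY", "HELICOS", "UNKNOWN",
}

def platform_ok(pl: str) -> bool:
    """Check a read-group PL value against the platforms GATK recognizes."""
    return pl.upper() in ALLOWED_PLATFORMS

print(platform_ok("illumina"))  # True: case-normalized to ILLUMINA
print(platform_ok("platform"))  # False: would trigger the error above
```

RGPL=illumina should therefore be fine; the error quoted earlier suggests the BAM being recalibrated still carried the literal placeholder value "platform" in its @RG line.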

sorry for this long post

GATK workshop (Oxford)


Hi! GATK team,

I remember there was a workshop scheduled in Oxford (9/18-9/20).
Has it been cancelled?

Thanks!

Dada

Tool documentation


Where has the tool documentation gone? I keep getting 404 errors when trying to follow links to specific tools (from Google) or to the general tool documentation area.

Regarding GATK output


Hello,
I want to identify SNPs in my sequencing data. I am new to this field, so I am just following the guidelines you provided. I have paired-end sequencing data (one sample), and following the documentation I ran the programs; in the end I ran:
java -Xmx32g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R genome.fasta -I realigned.bam -nct 14 -stand_call_conf 30 -o DK_new.vcf

After getting the VCF file, I want to filter it on the basis of the allelic depth (AD) described in the FORMAT column. Can you tell me how I can do this?

One more question about the DP value: why is the DP value different in the INFO and FORMAT columns? I read the discussion about this and found that it is because of informative vs. uninformative reads, so it is always possible that allelic depth is less than total read depth at a particular site. It therefore seems better to filter SNPs on the basis of allelic depth than total read depth. I am totally confused about how to filter the SNPs.
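The kind of filter asked about here can be sketched as a pass over the FORMAT-level AD field (per-allele, informative reads only) rather than the INFO-level DP; the GT:AD:DP layout and the threshold are illustrative:

```python
def passes_ad_filter(fmt: str, sample: str, min_alt_depth: int) -> bool:
    """Keep a genotype only if its first-ALT allelic depth meets the cutoff."""
    keys = fmt.split(":")
    vals = sample.split(":")
    ad = vals[keys.index("AD")]          # e.g. "16,25" = ref_depth,alt_depth
    alt_depth = int(ad.split(",")[1])
    return alt_depth >= min_alt_depth

print(passes_ad_filter("GT:AD:DP", "0/1:16,25:41", 10))  # True: 25 alt reads
```

Reading AD via the FORMAT key index, rather than a fixed column position, keeps the filter correct even when different records carry different FORMAT layouts.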

Please give suggestions about that.

Thanks & Regards
