I was in the middle of working on something and got an error about missing commands, but I can't check it because the website is down! I have a deadline, so I was wondering if you know how long it will be down for?
Thanks.
When running GATK 4.0.0.0 (in this case using ApplyBQSR), the notice
11:36:10.430 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 1
appears. A bit of digging led me to the Python code in the newly distributed gatk
program. There, two variables set -Dsamjdk.compression_level=1
by default. I changed the level there to 5, but the output from ApplyBQSR remained the same, and from the file sizes I'm seeing (though I may be wrong), it seems that the compression level is not at 5.
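For reference, the level can apparently also be overridden per invocation instead of editing the wrapper; a minimal sketch, assuming the standard --java-options pass-through (reference, input, and output names are placeholders):
# Hypothetical invocation: override the HTSJDK compression level at run
# time; file names below are placeholders, not from the original report.
gatk --java-options "-Dsamjdk.compression_level=5" ApplyBQSR \
    -R reference.fasta \
    -I input.bam \
    --bqsr-recal-file recal.table \
    -O output.bam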
Thoughts?
Dear Sir/Madam,
I used GenomicsDBImport and GenotypeGVCFs to call SNPs in GATK4. Because the reference genome is large (1 Gb) and the sample size is 100, I want to split the work by chromosome. GenomicsDBImport works for all chromosomes, but GenotypeGVCFs only works for chromosome 1. Could you please give me some suggestions? Below are the commands and log information for chromosome 2.
Look forward to hearing from you soon.
Best regards,
Baosheng
$GATK --java-options "-Xmx24g" \
GenomicsDBImport \
${InputVCF} \
--genomicsdb-workspace-path ${OUTDIR}/chr02 \
-L Qrob_Chr02
$GATK --java-options "-Xmx48g" \
GenotypeGVCFs \
-R ${REF} \
-V gendb://${OUTDIR}/chr02 \
-all-sites \
-O ${OUTDIR}/chr02.vcf
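A hedged variant of the GenotypeGVCFs call that passes the interval explicitly, mirroring the import step; this is an untested sketch, and it may or may not be what the gendb workspace query needs:
# Sketch only: same command as above plus the -L used at import time.
$GATK --java-options "-Xmx48g" \
GenotypeGVCFs \
-R ${REF} \
-V gendb://${OUTDIR}/chr02 \
-L Qrob_Chr02 \
-all-sites \
-O ${OUTDIR}/chr02.vcf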
23:41:40.274 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/WangBS/software/GATK/gatk/build/libs/gatk-package-4.0.11.0-56-g2c0e9b0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
23:42:41.990 INFO GenomicsDBImport - ------------------------------------------------------------
23:42:41.990 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.0.11.0-56-g2c0e9b0-SNAPSHOT
23:42:41.990 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
23:42:41.991 INFO GenomicsDBImport - Executing as WangBS@cu53 on Linux v3.10.0-693.el7.x86_64 amd64
23:42:41.991 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_131-b12
23:42:41.991 INFO GenomicsDBImport - Start Date/Time: January 26, 2019 11:41:40 PM CST
23:42:41.991 INFO GenomicsDBImport - ------------------------------------------------------------
23:42:41.991 INFO GenomicsDBImport - ------------------------------------------------------------
23:42:41.991 INFO GenomicsDBImport - HTSJDK Version: 2.18.1
23:42:41.991 INFO GenomicsDBImport - Picard Version: 2.18.16
23:42:41.992 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:42:41.992 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:42:41.992 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:42:41.992 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:42:41.992 INFO GenomicsDBImport - Deflater: IntelDeflater
23:42:41.992 INFO GenomicsDBImport - Inflater: IntelInflater
23:42:41.992 INFO GenomicsDBImport - GCS max retries/reopens: 20
23:42:41.992 INFO GenomicsDBImport - Requester pays: disabled
23:42:41.992 INFO GenomicsDBImport - Initializing engine
23:42:44.099 INFO IntervalArgumentCollection - Processing 115639695 bp from intervals
23:42:44.102 INFO GenomicsDBImport - Done initializing engine
23:42:44.276 INFO GenomicsDBImport - Vid Map JSON file will be written to /home/WangBS/Analyses/vcf/test/chr02/vidmap.json
23:42:44.276 INFO GenomicsDBImport - Callset Map JSON file will be written to /home/WangBS/Analyses/vcf/test/chr02/callset.json
23:42:44.276 INFO GenomicsDBImport - Complete VCF Header will be written to /home/WangBS/Analyses/vcf/test/chr02/vcfheader.vcf
23:42:44.276 INFO GenomicsDBImport - Importing to array - /home/WangBS/Analyses/vcf/test/chr02/genomicsdb_array
23:42:44.276 INFO ProgressMeter - Starting traversal
23:42:44.276 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
23:42:45.830 INFO GenomicsDBImport - Importing batch 1 with 63 samples
Buffer resized from 37294 bytes to 65464
Buffer resized from 37294 bytes to 65511
Buffer resized from 37293 bytes to 65539
Buffer resized from 37294 bytes to 65447
.....
.....
Buffer resized from 65538 bytes to 65539
Buffer resized from 65538 bytes to 65539
Buffer resized from 65538 bytes to 65539
06:50:14.219 INFO ProgressMeter - Qrob_Chr02:1 427.5 1 0.0
06:50:14.220 INFO GenomicsDBImport - Done importing batch 1/1
06:50:14.221 INFO ProgressMeter - Qrob_Chr02:1 427.5 1 0.0
06:50:14.229 INFO ProgressMeter - Traversal complete. Processed 1 total batches in 427.5 minutes.
06:50:14.236 INFO GenomicsDBImport - Import completed!
06:50:14.236 INFO GenomicsDBImport - Shutting down engine
[January 27, 2019 6:50:14 AM CST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 428.57 minutes.
Runtime.totalMemory()=8988393472
Tool returned:
true
Using GATK jar /home/WangBS/software/GATK/gatk/build/libs/gatk-package-4.0.11.0-56-g2c0e9b0-SNAPSHOT-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx24g -jar /home/WangBS/software/GATK/gatk/build/libs/gatk-package-4.0.11.0-56-g2c0e9b0-SNAPSHOT-local.jar GenotypeGVCFs -R /home/WangBS/Reference/Qrobur/Qrob_PM1N.fa -V gendb:///home/WangBS/Analyses/vcf/test/chr02 -all-sites -O /home/WangBS/Analyses/vcf/test/chr02.vcf
06:50:19.236 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/WangBS/software/GATK/gatk/build/libs/gatk-package-4.0.11.0-56-g2c0e9b0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
06:51:21.116 INFO GenotypeGVCFs - ------------------------------------------------------------
06:51:21.116 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.0.11.0-56-g2c0e9b0-SNAPSHOT
06:51:21.116 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
06:51:21.117 INFO GenotypeGVCFs - Executing as WangBS@cu53 on Linux v3.10.0-693.el7.x86_64 amd64
06:51:21.117 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_131-b12
06:51:21.117 INFO GenotypeGVCFs - Start Date/Time: January 27, 2019 6:50:19 AM CST
06:51:21.117 INFO GenotypeGVCFs - ------------------------------------------------------------
06:51:21.117 INFO GenotypeGVCFs - ------------------------------------------------------------
06:51:21.118 INFO GenotypeGVCFs - HTSJDK Version: 2.18.1
06:51:21.118 INFO GenotypeGVCFs - Picard Version: 2.18.16
06:51:21.118 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
06:51:21.118 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
06:51:21.118 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
06:51:21.118 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
06:51:21.118 INFO GenotypeGVCFs - Deflater: IntelDeflater
06:51:21.118 INFO GenotypeGVCFs - Inflater: IntelInflater
06:51:21.118 INFO GenotypeGVCFs - GCS max retries/reopens: 20
06:51:21.118 INFO GenotypeGVCFs - Requester pays: disabled
06:51:21.118 INFO GenotypeGVCFs - Initializing engine
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
06:51:26.212 INFO GenotypeGVCFs - Done initializing engine
06:51:26.257 INFO ProgressMeter - Starting traversal
06:51:26.257 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
06:51:26.278 INFO GenotypeGVCFs - Shutting down engine
[January 27, 2019 6:51:26 AM CST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 1.12 minutes.
Runtime.totalMemory()=1972371456
java.lang.IllegalStateException: There are no sources based on those query parameters
at com.intel.genomicsdb.reader.GenomicsDBFeatureIterator.<init>(GenomicsDBFeatureIterator.java:131)
at com.intel.genomicsdb.reader.GenomicsDBFeatureReader.query(GenomicsDBFeatureReader.java:144)
at org.broadinstitute.hellbender.engine.FeatureDataSource.refillQueryCache(FeatureDataSource.java:534)
at org.broadinstitute.hellbender.engine.FeatureDataSource.queryAndPrefetch(FeatureDataSource.java:503)
at org.broadinstitute.hellbender.engine.FeatureDataSource.query(FeatureDataSource.java:469)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$traverse$2(VariantLocusWalker.java:144)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.traverse(VariantLocusWalker.java:143)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Hi,
We ran a CombineGVCFs job using the following command, where gvcfs.list contained only 31 gvcf files with 24 samples each:
$GATK --java-options "-Xmx650G" \
CombineGVCFs \
-R $referenceFasta \
-O full_cohort.b37.g.vcf \
--variant gvcfs.list
We tried the extreme memory because CombineGVCFs kept failing; this node has 750G of RAM.
Despite the high memory provided, we get the stack trace below. The total memory reported by GATK is only ~12G, though (Runtime.totalMemory()=12662603776). Am I missing something? I don't understand why GATK is only using 12G of RAM when we provided much more, and then failing with an OutOfMemoryError.
We are currently setting up GenomicsDBImport, but this seems worth reporting.
Really appreciate your help.
18:55:51.944 INFO ProgressMeter - 4:26649295 23.6 18617000 787894.4
18:56:01.988 INFO ProgressMeter - 4:26655758 23.8 18779000 789159.6
18:59:13.407 INFO CombineGVCFs - Shutting down engine
[October 19, 2018 6:59:13 PM CDT] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 27.06 minutes.
Runtime.totalMemory()=12662603776
Exception in thread "main" java.lang.OutOfMemoryError
at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:316)
at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
at java.io.BufferedWriter.close(BufferedWriter.java:266)
at htsjdk.variant.variantcontext.writer.VCFWriter.close(VCFWriter.java:226)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.closeTool(CombineGVCFs.java:461)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:970)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
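A note on the numbers above: Runtime.totalMemory() reports the heap the JVM has currently allocated, not the -Xmx ceiling (that is Runtime.maxMemory()), so ~12G does not mean the 650G limit was ignored. Moreover, an OutOfMemoryError thrown from ByteArrayOutputStream.hugeCapacity typically signals a request for a single Java array beyond the ~2 GiB int-indexed limit, which more heap cannot fix. Purely as a hedged workaround sketch, one could split the combine step by interval to keep any single buffered header or record smaller (the interval and output name here are illustrative, not from the original post):
# Hypothetical per-chromosome run; -L 4 restricts to chromosome 4 in b37.
$GATK --java-options "-Xmx64G" \
    CombineGVCFs \
    -R $referenceFasta \
    -L 4 \
    -O full_cohort.b37.chr4.g.vcf.gz \
    --variant gvcfs.list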
I first extract all mapped reads:
samtools view -h -F 4 xx.hlamap.sam > xx.mapped.sam
then convert them:
java -jar picard.jar SamToFastq I=xx.mapped.sam F=xx.hlatmp.1.fastq F2=xx.hlatmp.2.fastq
but it fails with errors like the following. How should I revise it? Thanks a lot.
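Since the error text is not shown, this is only a guess: SamToFastq generally needs paired reads grouped together by name, so queryname-sorting the mapped reads first may help. An untested sketch:
# Guesswork, not a confirmed fix: group mates adjacently by queryname
# before SamToFastq, which expects pairs to arrive together.
samtools sort -n -O sam -o xx.mapped.qsorted.sam xx.mapped.sam
java -jar picard.jar SamToFastq I=xx.mapped.qsorted.sam \
    F=xx.hlatmp.1.fastq F2=xx.hlatmp.2.fastq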
Hi,
I'm trying to build a PoN with GATK 3.7-0 to use with MuTect2. For that, I've downloaded 80 exome BAM files from the 1000G project (GBR, TSI, IBS and CEU populations).
For most of them, when I try to use artifact_detection_mode, I get a GATK runtime error saying 'READ_MAX_LENGTH must be > 0 but got 0'.
To reproduce, you can, for example, download ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00116/exome_alignment/HG00116.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam
I'm using the b37 reference files from the bundle_gatk_2_8 and the SureSelect6 BED file from Agilent.
Here is the full stack trace and command line:
INFO 09:42:32,040 HelpFormatter - ------------------------------------------------------------------------------------
INFO 09:42:32,045 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 09:42:32,045 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 09:42:32,045 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 09:42:32,045 HelpFormatter - [Thu Apr 06 09:42:32 GMT 2017] Executing on Linux 2.6.18-275.12.1.el5.573g0000 amd64
INFO 09:42:32,045 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_40-b25
INFO 09:42:32,049 HelpFormatter - Program Args: -T MuTect2 -I:tumor /data/misc/mutect2/pon/1000g_bam/HG00116.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam --dbsnp /data/highlander/reference/bundle_gatk_2_8/b37/dbsnp_138.b37.vcf --artifact_detection_mode -L /data/highlander/reference/bundle_gatk_2_8/b37/capture.sureselect6.bed -R /data/highlander/reference/bundle_gatk_2_8/b37/human_g1k_v37.fasta -o /data/misc/mutect2/pon/1000g_vcf_normal/HG00116.vcf.gz
INFO 09:42:32,062 HelpFormatter - Executing as lifescope@n3 on Linux 2.6.18-275.12.1.el5.573g0000 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_40-b25.
INFO 09:42:32,062 HelpFormatter - Date/Time: 2017/04/06 09:42:32
INFO 09:42:32,062 HelpFormatter - ------------------------------------------------------------------------------------
INFO 09:42:32,063 HelpFormatter - ------------------------------------------------------------------------------------
...
java.lang.IllegalArgumentException: READ_MAX_LENGTH must be > 0 but got 0
at org.broadinstitute.gatk.utils.pairhmm.PairHMM.initialize(PairHMM.java:126)
at org.broadinstitute.gatk.utils.pairhmm.N2MemoryPairHMM.initialize(N2MemoryPairHMM.java:60)
at org.broadinstitute.gatk.utils.pairhmm.LoglessPairHMM.initialize(LoglessPairHMM.java:66)
at org.broadinstitute.gatk.utils.pairhmm.PairHMM.initialize(PairHMM.java:159)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.initializePairHMM(PairHMMLikelihoodCalculationEngine.java:267)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:282)
at org.broadinstitute.gatk.tools.walkers.cancer.m2.MuTect2.map(MuTect2.java:644)
at org.broadinstitute.gatk.tools.walkers.cancer.m2.MuTect2.map(MuTect2.java:171)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
Am I doing something wrong or is it some kind of bug :-) ?
Thank you in advance for your help
Raphael
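'READ_MAX_LENGTH must be > 0 but got 0' suggests PairHMM received a read with an empty sequence. Purely as a hypothesis, such records could be pre-filtered before building the PoN; an untested sketch, reusing the BAM name from the post above:
# Hypothetical pre-filter: keep header lines and records whose SEQ field
# (column 10) is present; drop reads stored with SEQ == "*".
samtools view -h HG00116.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam \
    | awk '/^@/ || $10 != "*"' \
    | samtools view -b - > HG00116.filtered.bam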
Hi,
We have re-analyzed a TCGA WES sample by taking the BAM file, running RevertSam, and then putting it through the standard pipeline. We have noticed about a 1% difference in variants when doing it by read group (i.e. if we produce a uBAM per read group and then merge them at the MarkDuplicates step) compared to when we do it without read groups.
Separating by read group is a bit of a nuisance for our pipeline, and we wanted to know whether it is correct not to do so. I take it all the read groups followed the same sequencing protocol. I imagine this may have to do with BWA being read-group aware.
Could you please clarify?
Thanks a lot for a terrific tool.
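On the read-group point, for illustration only (all names here are placeholders): bwa mem is read-group aware in the sense that it tags each lane's alignments with the @RG line passed via -R, which is why per-read-group uBAMs are usually aligned separately:
# Illustration: each lane/read group is aligned on its own and tagged
# with its own @RG line; file names are hypothetical.
bwa mem -R '@RG\tID:lane1\tSM:sample1\tLB:lib1\tPL:ILLUMINA' \
    ref.fasta lane1_R1.fastq.gz lane1_R2.fastq.gz > lane1.sam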
I was wondering how GATK VQSR deals with multi-allelic sites.
I already know that:
i) VQSR treats them the same way as bi-allelic sites (https://gatkforums.broadinstitute.org/gatk/discussion/7754/how-vqsr-deals-with-multiallelic-snps-and-indel);
ii) multi-allelic sites should be split before VQSR (https://gatkforums.broadinstitute.org/gatk/discussion/23559/split-multiallelic-variants-before-vqsr-and-cnnscorevariants-gatk-team-opinion). That discussion mainly covers mixed (SNP + INDEL) multi-allelic sites.
Summary questions:
1. Do you recommend splitting multi-allelic SNPs before VQSR? Would that introduce bias, since site-level information/annotations would be counted multiple times? I got different results with split and NOT split input (the unsplit input performed relatively better).
2. If we don't split multi-allelic SNP sites, then how is the Ti/Tv ratio calculated?
For example:
chr1 123 A T,G
chr2 234 C *,A,T
In the cases above, which allele(s) are used to calculate the Ti/Tv ratio in the tranche file? If VQSR takes the first allele, what should we expect in the second case, where a star allele is in the first position? Or is it better to remove star alleles before VQSR?
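For reference, splitting multi-allelic records before VQSR is commonly done with bcftools norm; a minimal sketch with placeholder file names (whether splitting is advisable is exactly question 1 above):
# Split multi-allelic records into bi-allelic rows (-m-any), then index.
bcftools norm -m-any cohort.vcf.gz -Oz -o cohort.split.vcf.gz
tabix -p vcf cohort.split.vcf.gz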
To whom it may concern,
I have both normal and tumour samples, and I also have parental data (both mother and father) for the patient. I hope to first phase the SNPs and INDELs from HaplotypeCaller using PhaseByTransmission. Thereafter, I want to phase the somatic mutations from MuTect2 using Read-Backed Phasing.
I wanted to ask whether the Read-Backed Phasing method considers both the SNPs and INDELs encompassed within a read, and whether it also uses the information from PhaseByTransmission when phasing the somatic mutations.
Regards,
Sangjin Lee
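For concreteness, a sketch of the two-step workflow described above in GATK3 syntax; the PED file and all file names are placeholders, and the sketch itself does not answer whether the two phasing sources are combined:
# Step 1 (sketch): trio-phase germline calls with PhaseByTransmission.
java -jar GenomeAnalysisTK.jar -T PhaseByTransmission \
    -R ref.fasta -V trio.vcf -ped trio.ped -o trio.pbt.vcf
# Step 2 (sketch): read-backed phasing against the tumour BAM.
java -jar GenomeAnalysisTK.jar -T ReadBackedPhasing \
    -R ref.fasta -I tumour.bam -V trio.pbt.vcf -o trio.rbp.vcf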
Hi Team,
I'm getting `WARN 21:19:30,478 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation` when processing gzipped g.vcf files produced by HaplotypeCaller (via -o foo.g.vcf.gz, as suggested by @Geraldine_VdAuwera in blog post 3893) with GenotypeGVCFs.
This results in dramatic increases in run time (which makes sense if GenotypeGVCFs un-compresses the files) and in memory requirements (why?) for GenotypeGVCFs, compared to processing the GVCFs from the same BAM files when the HC output files are unzipped. Most batches that previously completed with 4x8GB RAM now produce `java.lang.OutOfMemoryError: Java heap space` errors even with 4x64GB!
Could you please advise whether this warning is expected behaviour? If so, what exactly is missing (I can't see much difference between the unzipped and gzipped VCF headers), and can it be added somehow?
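One tentative hypothesis: if the dictionary check looks for contig information the gzipped files are not exposing, Picard's UpdateVcfSequenceDictionary can stamp the reference dictionary into each GVCF header. An untested sketch with placeholder paths:
# Hypothetical workaround: embed the reference .dict into the GVCF header.
java -jar picard.jar UpdateVcfSequenceDictionary \
    I=foo.g.vcf.gz O=foo.withdict.g.vcf.gz SD=reference.dict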
Hi, I have asked this question a while ago and a few times since. I know there is a wonderful WDL platform and FireCloud stuff to run things in parallel and check this and that. But for someone who is so used to a series of simple BASH commands, could you please kindly provide an example script like the one shown here: https://gencore.bio.nyu.edu/variant-calling-pipeline/?
Right after I found the above, I found that it has not been updated for GATK4, and I would hate to use a pipeline based on an outdated version of the engine.
I will say "Thank You So Much, GATK". For almost a year I still could not make GATK run on my own server, although there are a million documents, tutorials, and PPTs to be googled everywhere.
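In that spirit, a bare-bones single-sample GATK4 sketch in plain bash. This is not an official Best Practices pipeline: file names are placeholders, BQSR and multi-sample joint calling are omitted, and every step should be checked against the current docs:
# Minimal single-sample germline sketch (placeholders throughout).
bwa mem -R '@RG\tID:rg1\tSM:s1\tPL:ILLUMINA' ref.fasta r1.fq.gz r2.fq.gz \
    | samtools sort -o s1.sorted.bam -
gatk MarkDuplicates -I s1.sorted.bam -O s1.dedup.bam -M s1.dup_metrics.txt
samtools index s1.dedup.bam
gatk HaplotypeCaller -R ref.fasta -I s1.dedup.bam -O s1.g.vcf.gz -ERC GVCF
gatk GenotypeGVCFs -R ref.fasta -V s1.g.vcf.gz -O s1.vcf.gz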
How can I handle it?
Error message:
A USER ERROR has occurred: The argument: "resource/resource" does not accept tags: "hapmap,known=false,training=true,truth=true,prior=15.0"
Command
java -Xmx60g -jar /UUU/chul/wes/tools/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar VariantRecalibrator -R /UUU/chul/wes/hg19/ucsc.hg19.fasta -input new.vcf -input new.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /UUU/chul/wes/hg19/hapmap_3.3.hg19.sites.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 /UUU/chul/wes/hg19/1000G_omni2.5.hg19.sites.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 /UUU/chul/wes/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /UUU/chul/wes/hg19/dbsnp_138.hg19.vcf -an DP -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -O t4.recalibrate_SNP.recal --tranches-file t4.recalibrate_SNP.tranches --rscript-file t4.recalibrate_SNP_plots.R
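Judging from the 4.0.11.0 command in the next post, which parses and fails only on the file path, this 4.0.x build appears to expect the tag string first, with the file path following a colon. A hedged, unverified rewrite of the failing command in that pattern:
# Sketch: tag,attributes:path form inferred from the 4.0.11.0 example below.
java -Xmx60g -jar /UUU/chul/wes/tools/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar VariantRecalibrator \
    -R /UUU/chul/wes/hg19/ucsc.hg19.fasta \
    -input new.vcf \
    -resource hapmap,known=false,training=true,truth=true,prior=15.0:/UUU/chul/wes/hg19/hapmap_3.3.hg19.sites.vcf \
    -resource omni,known=false,training=true,truth=true,prior=12.0:/UUU/chul/wes/hg19/1000G_omni2.5.hg19.sites.vcf \
    -resource 1000G,known=false,training=true,truth=false,prior=10.0:/UUU/chul/wes/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf \
    -resource dbsnp,known=true,training=false,truth=false,prior=2.0:/UUU/chul/wes/hg19/dbsnp_138.hg19.vcf \
    -an DP -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum \
    -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    -O t4.recalibrate_SNP.recal \
    --tranches-file t4.recalibrate_SNP.tranches \
    --rscript-file t4.recalibrate_SNP_plots.R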
Could someone please help me run VariantRecalibrator in GATK 4.0.11.0?
When running the tool using GATK 4.0.11.0 with the following command line:
time ~/gatk-4.0.11.0/gatk VariantRecalibrator
-R ~/reference/hg19.fa -V ~/MT-1/outname.HC.vcf.gz
--resource hapmap,known=false,training=true,truth=true,prior=15.0:~/reference/hg19/hapmap_3.3.hg19.sites.vcf
--resource omni,known=false,training=true,truth=false,prior=12.0:~/reference/hg19/1000G_omni2.5.hg19.sites.vcf
--resource 1000G,known=false,training=true,truth=false,prior=10.0:~/reference/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf
--resource dbsnp,known=true,training=false,truth=false,prior=6.0:~/reference/hg19/dbsnp_138.hg19.vcf
--use-annotation DP --use-annotation QD --use-annotation FS --use-annotation SOR --use-annotation ReadPosRankSum --use-annotation MQRankSum
--mode SNP
--truth-sensitivity-tranche 100.0 --truth-sensitivity-tranche 99.9 --truth-sensitivity-tranche 99.0 --truth-sensitivity-tranche 95.0 --truth-sensitivity-tranche 90.0
--rscript-file ~/MT-1/outname.HC.snps.plots.R
--tranches-file ~/MT-1/outname.HC.snps.tranches
--output ~/MT-1/outname.HC.snps.recal
I got this error: A USER ERROR has occurred: Couldn't read file file:///home/chenjie1/~/reference/hg19/hapmap_3.3.hg19.sites.vcf. Error was: It doesn't exist
The command syntax follows the same pattern as version 4.0.9.0.
Has the syntax been changed for GATK version 4.0.11.0?
Thanks.
Best regards.
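The failing path in the error above embeds a literal ~ (file:///home/chenjie1/~/reference/...), so this looks like shell tilde expansion rather than a 4.0.11.0 syntax change: bash only expands ~ at the start of a word, not in the middle of an ordinary argument. A quick runnable demonstration; $HOME (or an absolute path) is the workaround:
# The tilde survives unexpanded inside a tag:path argument...
echo hapmap,known=false,prior=15.0:~/reference/hg19/hapmap_3.3.hg19.sites.vcf
# ...while $HOME expands anywhere in the word:
echo hapmap,known=false,prior=15.0:$HOME/reference/hg19/hapmap_3.3.hg19.sites.vcf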
We have developed an SNV calling method for long reads (https://github.com/pjedge/longshot). The false-positive variants produced by our method tend to occur in certain sequence contexts and often carry various signals that could be used together to filter them (including some based on assembled haplotype consistency, etc.). It would be nice to combine these signals (reference sequence context as well as annotations in our VCF) to filter variants using a supervised learning approach. I am interested in using CNNVariantWriteTensors, CNNVariantTrain, and CNNScoreVariants for this task, but I'm not sure it's even possible. Are there design considerations that fundamentally make these tools incompatible with non-Illumina sequencing technologies? Furthermore, our output VCF lacks most of the annotations specified in the GATK Best Practices, and many of those annotations are geared toward Illumina reads; I think a lot of them would not be good features for PacBio reads if I were to just plug my data into VariantAnnotator to fill them in. We would be especially interested in leveraging custom, long-read-specific annotations. Would it be possible for us to define our own annotation set to use with these tools?