Hi, it looks like all the tools documentation pages are down right now (11:30 am EST). See the attached image for the error message I get when I try to visit them.
~Daniel Ence
Hello,
I am merging a large number of gVCFs in batches, and everything looks fine except the DP field. In some samples the DP field is not present in the variant records of the gVCF files (see below), so after merging, those samples have a . (dot) in the DP field. The main problem is that when filtering on DP in later analysis, all the variants with a . (dot) are removed. Is there a way to solve this without remaking all the gVCF files?
Thanks & Regards, Floris
gVCF file 1:
3 148896153 . A <NON_REF> . . END=148896155 GT:DP:GQ:MIN_DP:PL 0/0:33:93:33:0,93,1395
3 148896156 . A <NON_REF> . . END=148896159 GT:DP:GQ:MIN_DP:PL 0/0:36:99:35:0,99,1282
3 148896160 . G <NON_REF> . . END=148896160 GT:DP:GQ:MIN_DP:PL 0/0:37:76:37:0,76,1445
3 148896161 . A <NON_REF> . . END=148896395 GT:DP:GQ:MIN_DP:PL 0/0:81:99:37:0,107,1356
3 148896396 . C G,<NON_REF> 1171.77 . BaseQRankSum=1.677;DP=61;MLEAC=1,0;MLEAF=0.500,0.00;MQ=60.00;MQRankSum=0.719;ReadPosRankSum=0.849 GT:AD:GQ:PL:SB 0/1:27,34,0:99:1200,0,921,1282,1023,2305:4,23,1,33
3 148896397 . C <NON_REF> . . END=148896444 GT:DP:GQ:MIN_DP:PL 0/0:52:99:40:0,105,1575
3 148896445 . C <NON_REF> . . END=148896445 GT:DP:GQ:MIN_DP:PL 0/0:40:86:40:0,86,1515
3 148896446 . C <NON_REF> . . END=148896447 GT:DP:GQ:MIN_DP:PL 0/0:38:99:38:0,99,1485
3 148896448 . T <NON_REF> . . END=148896449 GT:DP:GQ:MIN_DP:PL 0/0:39:90:38:0,90,1350
gVCF file 2:
3 148895575 . T <NON_REF> . . END=148895818 GT:DP:GQ:MIN_DP:PL 0/0:126:99:60:0,120,1800
3 148896150 . A <NON_REF> . . END=148896150 GT:DP:GQ:MIN_DP:PL 0/0:33:96:33:0,96,1440
3 148896151 . A <NON_REF> . . END=148896395 GT:DP:GQ:MIN_DP:PL 0/0:77:99:34:0,99,1485
3 148896396 rs139633388 C G,<NON_REF> 1035.77 . BaseQRankSum=0.571;ClippingRankSum=0.313;DB;DP=68;MLEAC=1,0;MLEAF=0.500,0.00;MQ=60.00;MQRankSum=0.608;ReadPosRankSum=-0.006 GT:AD:DP:GQ:PL:SB 0/1:32,36,0:68:99:1064,0,933,1160,1042,2202:8,24,11,25
3 148896397 . C <NON_REF> . . END=148896432 GT:DP:GQ:MIN_DP:PL 0/0:55:99:38:0,99,1485
3 148896433 . A <NON_REF> . . END=148896434 GT:DP:GQ:MIN_DP:PL 0/0:38:96:38:0,96,1440
3 148896435 . T <NON_REF> . . END=148896435 GT:DP:GQ:MIN_DP:PL 0/0:38:99:38:0,99,1485
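In the excerpt above, the variant record at 148896396 has FORMAT `GT:AD:GQ:PL:SB` (no DP) in file 1 but `GT:AD:DP:GQ:PL:SB` in file 2. Before choosing a workaround, it may help to know how many records are affected. The following is a minimal bash/awk sketch, assuming standard tab-delimited, single-sample (g)VCFs (optionally gzip-compressed); the function name `count_missing_format_dp` is hypothetical, not a GATK tool.

```shell
#!/usr/bin/env bash
# Sketch: count (g)VCF data lines whose FORMAT column (column 9) has no DP key.
# Assumes tab-delimited records; .gz input is decompressed with gzip -dc.

count_missing_format_dp() {
  local reader=cat
  case "$1" in
    *.gz) reader="gzip -dc" ;;
  esac
  $reader "$1" | awk -F'\t' '
    /^#/ { next }                        # skip header lines
    {
      n = split($9, keys, ":")           # FORMAT keys, e.g. GT:AD:GQ:PL:SB
      has_dp = 0
      for (i = 1; i <= n; i++) if (keys[i] == "DP") has_dp = 1
      if (!has_dp) missing++
    }
    END { print missing + 0 }'
}
```

For example, `count_missing_format_dp sample1.g.vcf.gz` (a hypothetical file name) prints a single count per file, which can be compared across samples.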
I have incomplete read groups in some BAM files and tried to update them using the command recommended in the tutorial:
java -jar $PICARD_JAR AddOrReplaceReadGroups \
I=/work/users/ED-Sam.sorted.bam \
O=/work/users/ED-Sam_newRG.sorted.bam \
RGID=IDEDSAM \
RGLB=lib1 \
RGPL=illumina \
RGPU=BH3NTKALXX.8.3 \
RGSM=IDEDSAM \
SORT_ORDER=coordinate \
CREATE_INDEX=true
I get this error message (I have tested two different versions of Picard, 1.139 and 2.10.3, and both give the identical error):
Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line:
@RG ID:EDSAM PL:ILLUMINA; File /work/users/ED-Sam.sorted.bam; Line number 3270
at htsjdk.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:237)
at htsjdk.samtools.SAMTextHeaderCodec.access$200(SAMTextHeaderCodec.java:43)
at htsjdk.samtools.SAMTextHeaderCodec$ParsedHeaderLine.requireTag(SAMTextHeaderCodec.java:319)
at htsjdk.samtools.SAMTextHeaderCodec.parseRGLine(SAMTextHeaderCodec.java:167)
at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:100)
at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:503)
at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:166)
at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:125)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:258)
at picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:94)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Suggestions?
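The trace shows htsjdk failing while parsing the header (`@RG line missing SM tag`), so Picard cannot open the BAM at all and AddOrReplaceReadGroups never gets to replace the read group. One possible workaround, sketched here under the assumption that samtools is available (samtools does not require SM on @RG lines), is to patch the header text first and swap it in with `samtools reheader`, then rerun AddOrReplaceReadGroups on the patched BAM. The function name and output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: add an SM tag to @RG header lines that lack one.
# Reads SAM header text on stdin and writes the patched header to stdout.

add_missing_sm() {
  local sample="$1"
  awk -v sm="$sample" '
    /^@RG/ && $0 !~ /\tSM:/ { print $0 "\tSM:" sm; next }   # patch incomplete @RG
    { print }                                              # pass other lines through
  '
}

# Example (input path from the post; output names are hypothetical):
#   samtools view -H /work/users/ED-Sam.sorted.bam | add_missing_sm IDEDSAM > fixed_header.sam
#   samtools reheader fixed_header.sam /work/users/ED-Sam.sorted.bam > /work/users/ED-Sam.fixedSM.bam
```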
So I have been analyzing my sequence data and trying to do realignment.
After running RealignerTargetCreator, I got the intervals file.
However, the intervals file starts with this as its first line:
There were no warn messages.
When I use this intervals file for realignment,
this ERROR message pops up:
Badly formed genome location: Contig 'There were no warn messages.' does not match any contig.
Is there something wrong with my process?
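The first line of the intervals file ("There were no warn messages.") looks like console output rather than an interval. One plausible cause, which is an assumption since the command is not shown, is that the tool's output was captured by redirecting stdout into the file instead of being written with `-o`; rerunning RealignerTargetCreator with `-o` would be the cleaner fix. As a stopgap, a minimal sketch like the following keeps only lines shaped like `contig`, `contig:pos` or `contig:start-stop`. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep only lines that look like GATK-style intervals
# (contig, contig:pos, or contig:start-stop) and drop stray log text.

clean_intervals() {
  grep -E '^[^[:space:]]+(:[0-9]+(-[0-9]+)?)?$' "$1"
}

# Example (file names are hypothetical):
#   clean_intervals realigner.intervals > realigner.clean.intervals
```

This is a heuristic: a one-word log line such as `Done.` would still pass the filter, so it is worth inspecting the result before using it.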
Hi,
I read the Best Practices (BP) and found that I should do BQSR first, then IndelRealigner.
I don't know what effect the order has. Will it have any big effects on the resulting BAM?
Hi,
Following advice seen elsewhere on the forum, I have performed variant calling on whole-genome resequencing data on a per-scaffold basis. Now I need to merge 30,000 or so individual VCFs, and I am using CatVariants with the following command:
java -Xmx5G -cp GenomeAnalysisTK-3.7-0/GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.CatVariants -R $ref.fasta -out $out.vcf -assumeSorted -V $allvcfs.list
Unfortunately, it appears to be very slow (only about 5,000 regions processed after more than 24 h), so I am wondering whether this is expected behavior and whether there is a way to increase the speed.
Thanks
JM
I'm getting a Tribble loadIndex exception from RealignerTargetCreator. I see this exception has been reported quite a bit, and one cause seems to be an out-of-date index file. I deleted the .bai, recreated it with samtools index, and still got the error. Then I tried deleting it without recreating it, since apparently RealignerTargetCreator will create an index if there is none, but that still gave the error. Help!
INFO 11:13:55,502 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:13:55,615 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 11:13:55,615 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 11:13:55,616 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 11:13:55,616 HelpFormatter - [Wed Jul 26 11:13:55 PDT 2017] Executing on Linux 4.4.0-47-generic amd64
INFO 11:13:55,616 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_72-b15
INFO 11:13:55,621 HelpFormatter - Program Args: --analysis_type RealignerTargetCreator --reference_sequence /share/carvajal-archive/REFERENCE_DATA/genomes/GRCh38_decoy_LCCpanel/Homo_sapiens_assembly38_LCCpanel.fasta --intervals BED/3025671_Covered_hg38_decoyLCCpnl.pad200.bed --input_file DATA/CH-59/N2/CH-59N2.dedup.bam --out DATA/CH-59/N2/CH-59N2.realign.interval_list --validation_strictness STRICT -known /share/carvajal-archive/REFERENCE_DATA/GATK_Bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf
INFO 11:13:55,626 HelpFormatter - Executing as twtoal@carcinos on Linux 4.4.0-47-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_72-b15.
INFO 11:13:55,626 HelpFormatter - Date/Time: 2017/07/26 11:13:55
INFO 11:13:55,627 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:13:55,627 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:13:55,634 GenomeAnalysisEngine - Strictness is STRICT
INFO 11:13:59,404 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 11:13:59,411 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 11:14:00,008 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.60
##### ERROR --
##### ERROR stack trace
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:173)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadFromDisk(RMDTrackBuilder.java:375)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.attemptToLockAndLoadIndexFromDisk(RMDTrackBuilder.java:359)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:319)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:264)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:153)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:208)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:88)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:1052)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:829)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:287)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:169)
... 14 more
Caused by: java.io.EOFException
at htsjdk.tribble.util.LittleEndianInputStream.readFully(LittleEndianInputStream.java:138)
at htsjdk.tribble.util.LittleEndianInputStream.readLong(LittleEndianInputStream.java:80)
at htsjdk.tribble.index.linear.LinearIndex$ChrIndex.read(LinearIndex.java:271)
at htsjdk.tribble.index.AbstractIndex.read(AbstractIndex.java:367)
at htsjdk.tribble.index.linear.LinearIndex.<init>(LinearIndex.java:101)
... 19 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: java.lang.reflect.InvocationTargetException
##### ERROR ------------------------------------------------------------------------------------------
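The failing frames are `RMDTrackBuilder.loadIndex` and `LinearIndex`, reached from `GenomeAnalysisEngine.getReferenceOrderedDataSources`; that path handles Tribble indices of feature files such as the `-known` VCF, not the BAM's `.bai`. An `EOFException` while reading that index points to a truncated `.idx` next to the Mills VCF. A minimal sketch, assuming the index directory is writable so GATK can rebuild the index on the next run, is to move the existing `.idx` aside; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: move a (possibly truncated) Tribble .idx aside so that GATK
# can regenerate it the next time the VCF is used as a -known resource.

reset_tribble_index() {
  local vcf="$1"
  if [ -e "$vcf.idx" ]; then
    mv "$vcf.idx" "$vcf.idx.bak"   # keep a backup instead of deleting
    echo "moved $vcf.idx aside"
  fi
}

# Example (path from the post's -known argument):
#   reset_tribble_index /share/carvajal-archive/REFERENCE_DATA/GATK_Bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf
```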
Hi guys, I tried to run GGA (GENOTYPE_GIVEN_ALLELES) mode in GATK 4.beta.1, but it failed with a NullPointerException.
I'm sure that my input file and parameter settings are OK, because I checked my settings against this post on our forum and the 4.beta.1 docs, and the equivalent parameters work fine in GATK 3.7.
I haven't tested GGA with 4.beta.2 or 4.beta.3, as the release notes show no updates related to this function. I'm wondering whether GGA works in 4.beta or will work in the future general release, or whether I need to change my parameters to get it running. Below are my parameters and error log.
gatk-launch --javaOptions "-Xmx4g" HaplotypeCaller \
-R /reference/BWAIndex/genome.fa \
-I miseq_161113_PE75.bwa.sorted.filtered.recal.bam \
-O miseq_161113_PE75_gatk4_pgkb.vcf \
-L /path/to/my.vcf.gz \
--alleles /path/to/my.vcf.gz \
--genotyping_mode GENOTYPE_GIVEN_ALLELES
[July 14, 2017 2:51:14 PM CST] Executing as jiecui@Neptune on Linux 4.4.0-83-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11; Version: 4.beta.1
14:51:14.936 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
14:51:14.936 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:51:14.936 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:51:14.936 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:51:14.936 INFO HaplotypeCaller - Deflater: IntelDeflater
14:51:14.936 INFO HaplotypeCaller - Inflater: IntelInflater
14:51:14.936 INFO HaplotypeCaller - Initializing engine
......
14:51:15.342 INFO IntervalArgumentCollection - Processing 43 bp from intervals
14:51:15.350 INFO HaplotypeCaller - Done initializing engine
14:51:15.356 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
14:51:15.594 WARN PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
14:51:15.737 WARN PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
14:51:15.964 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/media/home/jiecui/software/gatk/gatk-4.beta.1/gatk-package-4.beta.1-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
[INFO] Available threads: 40
[INFO] Requested threads: 4
[INFO] Using 4 threads
14:51:16.035 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
14:51:16.051 INFO ProgressMeter - Starting traversal
14:51:16.051 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
log4j:WARN No appenders could be found for logger (org.broadinstitute.hellbender.utils.MathUtils$Log10Cache).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14:51:16.424 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.001359444
14:51:16.424 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.004590645
14:51:16.424 INFO HaplotypeCaller - Shutting down engine
[July 14, 2017 2:51:16 PM CST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=1598029824
java.lang.NullPointerException
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerGenotypingEngine.createAlleleMapper(AssemblyBasedCallerGenotypingEngine.java:159)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:128)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:541)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:221)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:244)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:217)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:838)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:115)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
at org.broadinstitute.hellbender.Main.main(Main.java:230)
Java and GATK version:
Java version: openjdk version "1.8.0_131"
GATK version: 4.beta.1
Hello,
I am creating a PoN for Mutect2, following the instructions in the comments of Mutect2.java.
This task seems to take a highly variable amount of time per sample or interval. I also noticed that Mutect2 is not a Spark tool in GATK 4.
Is splitting intervals (and maybe --nativePairHmmThreads) the only way to parallelize this task?
Do you have any advice on parallelizing Mutect2 runs?
Thank you!
chr1 53676448 . G A 1495.77 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=1.48;ClippingRankSum=-5.270e-01;DP=79;ExcessHet=3.0103;FS=4.485;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.775;QD=19.18;ReadPosRankSum=0.403;SOR=0.581;set=variant2 GT:AD:DP:GQ:PL:CGIANN_VARNAME:CGIANN_1000GAF:CGIANN_ESP6500AF 0/1:29,49:78:99:1524,0,780:-,NM_000098.2(CPT2) c.1102G>A (p.V368I):-,0.5:-,0.456405
chr1 53676986 . C . . . END=53678942;NT GT ./.
chr1 53679264 . T . . . END=53680317;NT GT ./.
chr1 53680529 . a . . . END=53681541;NT GT ./.
chr1 53681771 . G . . . END=53682332;NT GT ./.
chr1 53682540 . G . . . END=53683699;NT GT ./.
I only see one mutation with substantial information, plus several other lines without any information. How come I have CPT2 deficiency?
Hi GATK team,
Does using the --dontUseSoftClippedBases option in HaplotypeCaller (HC) affect SNP/indel calling in any way? Can you please elaborate a little on how enabling this parameter affects variant calling?
Thanks,
Teja.
Does this mean this region was not sequenced?
chr1 53668182 . T . . NT END=53675682;NT GT ./.
chr1 53675849 . C . . NT END=53675853;NT GT ./.
Please review the questions I posted at https://gatkforums.broadinstitute.org/gatk/discussion/10041/varianteval-error-message#latest.
They appear as "Answered" but are in fact not.
Thanks.
Hi there,
I would like to run GATK SNP calling on the TACC Stampede2 machine (https://portal.tacc.utexas.edu/user-guides/stampede2), which has KNL nodes equipped with the Intel Xeon Phi 7250. This CPU is somewhat special because "Stampede2's KNL nodes have 68 cores, each with 4 hardware threads".
Right now, the command I use is:
java -Djava.io.tmpdir=/tmp -jar $TACC_GATK_DIR/GenomeAnalysisTK.jar -nct 136 -R assembly_selfref_v2.fa -T UnifiedGenotyper ......
Basically, TACC support told me that I should use 136 threads because "In most cases it's best to specify no more than 64-68 MPI tasks or independent processes per node, and 1‑2 threads/core."
However, I feel this would waste resources because, supposedly, one KNL node has 272 (68 x 4) hardware threads.
Does GATK have special parameters I can use to make full use of such a machine, for example some combination of -nct and -nt?
Thanks very much in advance!
Hello
I am trying to generate base recalibration plots using AnalyzeCovariates.
My command is:
java -jar GenomeAnalysisTK.jar \
-T AnalyzeCovariates -R GRCh37-lite.fa \
-before test_data/realigned/SA495-Tumor.sorted.realigned.grp \
-after test_data/realigned/SA495-Tumor.sorted.post_recal.grp2 \
-plots recal_plots.pdf
and this gives me an error:
INFO 17:01:06,050 HelpFormatter - Date/Time: 2014/05/16 17:01:06
INFO 17:01:06,050 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:01:06,050 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:01:06,962 GenomeAnalysisEngine - Strictness is SILENT
INFO 17:01:07,193 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 17:01:07,317 GenomeAnalysisEngine - Preparing for traversal
INFO 17:01:07,339 GenomeAnalysisEngine - Done preparing for traversal
INFO 17:01:07,340 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 17:01:07,340 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 17:01:08,293 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 17:01:08,537 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 17:01:08,592 AnalyzeCovariates - Generating csv file '/tmp/AnalyzeCovariates3565832248324656361.csv'
INFO 17:01:09,077 AnalyzeCovariates - Generating plots file 'recal_plots.pdf'
INFO 17:01:18,598 GATKRunReport - Uploaded run statistics report to AWS S3
ERROR ------------------------------------------------------------------------------------------
ERROR stack trace
org.broadinstitute.sting.utils.R.RScriptExecutorException: RScript exited with 1. Run with -l DEBUG for more info.
at org.broadinstitute.sting.utils.R.RScriptExecutor.exec(RScriptExecutor.java:174)
at org.broadinstitute.sting.utils.recalibration.RecalUtils.generatePlots(RecalUtils.java:548)
at org.broadinstitute.sting.gatk.walkers.bqsr.AnalyzeCovariates.generatePlots(AnalyzeCovariates.java:380)
at org.broadinstitute.sting.gatk.walkers.bqsr.AnalyzeCovariates.initialize(AnalyzeCovariates.java:394)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:107)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: RScript exited with 1. Run with -l DEBUG for more info.
ERROR ------------------------------------------------------------------------------------------
Ideas?
Thanks
Hi everyone,
While running VariantEval on chromosome 3 of a gVCF file, using the following command:
java -Xmx32G -jar /home/d/GenomeAnalysisTK.jar -T VariantEval -R /home/d/Human_Reference/hg19ref.fa -eval Rise493.rescaled.g.vcf -D /home/d/Human_Reference/dbsnp_138.hg19.vcf -noEV -EV CompOverlap -EV IndelSummary -EV TiTvVariantEvaluator -EV CountVariants -EV MultiallelicSummary -L 3 -nt 8 -o SampleVariants_Rise493.eval.grp
I got the following error message:
I am using the hg19 reference and the dbSNP file from your resource bundle. The message leads me to believe that dbSNP and hg19 in your resource bundle use different chromosome naming conventions (chr1 vs 1), but I could be wrong.
To save time, I did not align the FASTQ myself this time; instead I downloaded the BAM from ENA. I checked which GATK tools had previously been used on this BAM with this command:
/home/d/Samtools/samtools view -H Rise493.bam | grep '@PG'
@PG ID:MarkDuplicates PN:MarkDuplicates VN:1.110(1752) CL:net.sf.picard.sam.MarkDuplicates INPUT=[alignment/lib_RISE493_MA860_L1.flt.sort.bam] OUTPUT=alignment/lib_RISE493_MA860_L1.flt.sort.bam.rmdup.bam METRICS_FILE=alignment/lib_RISE493_MA860_L1.flt.sort.bam.metrics.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true TMP_DIR=[/panvol1/simon/tmp] VALIDATION_STRINGENCY=LENIENT PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
@PG ID:GATK IndelRealigner VN:2.2-3-gde33222 CL:knownAlleles=[] targetIntervals=alignment/RISE493.hg19.flt.sort.rmdup.bam.intervals LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPGTags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null
Can you think of a quick workaround?
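One way to test the naming hypothesis is to compare the contig names in the reference index (`.fai`) with the `##contig` header lines of the dbSNP VCF. The `-L 3` argument also has to follow the reference's naming: if the reference uses chr-prefixed names, it would need to be `-L chr3`. A minimal bash sketch, assuming a plain-text VCF whose header carries `##contig` lines; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list contig names present in only one of a reference .fai and a VCF header.

vcf_contigs() {
  # Read ##contig=<ID=...> lines and stop at the #CHROM line (plain-text VCF assumed).
  sed -n -e '/^#CHROM/q' -e 's/^##contig=<ID=\([^,>]*\).*/\1/p' "$1" | LC_ALL=C sort
}

fai_contigs() {
  cut -f1 "$1" | LC_ALL=C sort
}

contig_mismatch() {
  # Unindented: only in the reference (.fai); tab-indented: only in the VCF.
  LC_ALL=C comm -3 <(fai_contigs "$1") <(vcf_contigs "$2")
}

# Example (paths from the post):
#   contig_mismatch /home/d/Human_Reference/hg19ref.fa.fai /home/d/Human_Reference/dbsnp_138.hg19.vcf
```

Empty output means the two files agree on contig names; the same check can be run against the eval gVCF.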
I want to run Mutect2 with the latest COSMIC file, but I can't find where to download it.
I searched the forum and found that others advise downloading the COSMIC file from "ftp://ngs.sanger.ac.uk/production/cosmic", but I can't find anything on that site.
Could you help me download the latest COSMIC VCF file? Thanks,
In GATK4, the GenotypeGVCFs tool can only take a single input, so if you have GVCFs from multiple samples (which is usually the case) you will need to combine them before feeding them to GenotypeGVCFs. Although there are several tools in the GATK and Picard toolkits that provide some type of VCF or GVCF merging functionality, for this use case there is only one valid way to do it: with GenomicsDBImport.
The GenomicsDBImport tool takes in one or more single-sample GVCFs, imports data over a single interval, and outputs a directory containing a GenomicsDB datastore with combined multi-sample data. GenotypeGVCFs can then read from the created GenomicsDB directly and output a VCF.
Here are example commands to use it:
gatk-launch GenomicsDBImport \
-V data/gvcfs/mother.g.vcf \
-V data/gvcfs/father.g.vcf \
-V data/gvcfs/son.g.vcf \
--genomicsDBWorkspace my_database \
--intervals 20
That generates a directory called my_database containing the combined GVCF data.
Then you run joint genotyping; note the gendb:// prefix to the database input directory path.
gatk-launch GenotypeGVCFs \
-R data/ref/ref.fasta \
-V gendb://my_database \
-G StandardAnnotation -newQual \
-O test_output.vcf
And that's all there is to it.
There are three caveats:
You can't add data to an existing database; you have to keep the original GVCFs around and reimport them all together when you get new samples. For very large numbers of samples, there are some batching options.
At the moment you can only run GenomicsDBImport on a single genomic interval (i.e., at most one contig). This will probably change, because we'd like to enable running on more than one interval in one go, but for now you need to run on each interval separately. We recommend scripting this, of course.
At the moment GenomicsDB only supports diploid data. The developers of GenomicsDB are working on implementing support for non-diploid data.
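Because each GenomicsDBImport run covers a single interval, a per-interval loop is a natural way to script it. Below is a minimal bash sketch, assuming one workspace per interval named `my_database_<interval>` (a hypothetical naming scheme) and the same 4.beta-style arguments used in the commands above. By default it only prints the commands (`DRY_RUN=1`); setting `DRY_RUN=0` executes them.

```shell
#!/usr/bin/env bash
# Sketch: one GenomicsDBImport run, and one workspace, per interval.
# Workspace naming (my_database_<interval>) is a hypothetical convention.

import_per_interval() {
  # Usage: import_per_interval "<interval> [<interval> ...]" <gvcf> [<gvcf> ...]
  local intervals="$1"; shift
  local interval g
  local -a vcf_args=() cmd
  for g in "$@"; do
    vcf_args+=(-V "$g")
  done
  for interval in $intervals; do
    cmd=(gatk-launch GenomicsDBImport "${vcf_args[@]}"
         --genomicsDBWorkspace "my_database_${interval}"
         --intervals "$interval")
    if [ "${DRY_RUN:-1}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"      # dry run: print the command only
    else
      "${cmd[@]}" || return 1        # stop at the first failing interval
    fi
  done
}

# Example:
#   import_per_interval "20 21 22" data/gvcfs/mother.g.vcf data/gvcfs/father.g.vcf data/gvcfs/son.g.vcf
```

Each workspace is then genotyped separately (for example `-V gendb://my_database_20`). Since the database encodes its absolute path, it is best to create the workspaces where they will stay.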
If you want to generate a flat multisample GVCF file from the GenomicsDB you created, you can do so with SelectVariants as follows:
gatk-launch SelectVariants \
-R data/ref/ref.fasta \
-V gendb://my_database \
-O combined.g.vcf
Currently the GenomicsDB internal code uses the absolute path of the location of the database as part of the data encoding. As a consequence, you cannot move the database to a different location before running GenotypeGVCFs on it. If you do, it will no longer work. This is obviously not desirable, and the development team is looking at options to remediate this.
Hello,
When I genotype a large and diverse cohort, I often see a warning about too many alleles at one position, which is fine. What puzzles me is why the sites in question are not reported in order; for example, here position 1023160 (about 1.02 Mb) is reported after 5019574 (about 5.02 Mb):
INFO 20:56:15,037 ProgressMeter - 0101:9012801 0.0 30.0 s 50.1 w 2.4% 21.2 m 20.7 m
WARN 20:56:26,448 GenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location 0101:5019574
WARN 20:56:43,879 GenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location 0101:1023160
As far as I can tell the sites are ordered in the gVCFs. Is this an unimportant idiosyncrasy of the Progress Meter output, or does it point to a problem with my input files?
Many thanks,
Hi - I'm looking to run MuTect2 beta using the --germline_resource option. However, I cannot find af-only-gnomad.vcf for build hg19.
How can I find this vcf for hg19?