Tools Documentation Pages are down

July 25, 2017, 8:31 am

Hi, it looks like all the pages for tools documentation are down right now (11:30am EST). See the attached image for the error message I get when I try to visit those pages.

~Daniel Ence

↧

DP field seems off after GenotypeGVCFs

July 25, 2017, 11:32 am

Hello,

I am merging a large number of gVCFs in batches and everything looks fine except the DP field. In some samples the DP field is missing from certain gVCF records (see below), so after merging those samples have a . (dot) in the DP field. The main problem is that when I filter on DP in later analysis, all the variants with a . (dot) are removed. Is there a way to solve this problem without remaking all the gVCF files?

Thanks & Regards, Floris

gVCF file 1:

3   148896153   .   A   <NON_REF>   .   .   END=148896155   GT:DP:GQ:MIN_DP:PL  0/0:33:93:33:0,93,1395
3   148896156   .   A   <NON_REF>   .   .   END=148896159   GT:DP:GQ:MIN_DP:PL  0/0:36:99:35:0,99,1282
3   148896160   .   G   <NON_REF>   .   .   END=148896160   GT:DP:GQ:MIN_DP:PL  0/0:37:76:37:0,76,1445
3   148896161   .   A   <NON_REF>   .   .   END=148896395   GT:DP:GQ:MIN_DP:PL  0/0:81:99:37:0,107,1356
3   148896396   .   C   G,<NON_REF> 1171.77 .   BaseQRankSum=1.677;DP=61;MLEAC=1,0;MLEAF=0.500,0.00;MQ=60.00;MQRankSum=0.719;ReadPosRankSum=0.849   GT:AD:GQ:PL:SB  0/1:27,34,0:99:1200,0,921,1282,1023,2305:4,23,1,33
3   148896397   .   C   <NON_REF>   .   .   END=148896444   GT:DP:GQ:MIN_DP:PL  0/0:52:99:40:0,105,1575
3   148896445   .   C   <NON_REF>   .   .   END=148896445   GT:DP:GQ:MIN_DP:PL  0/0:40:86:40:0,86,1515
3   148896446   .   C   <NON_REF>   .   .   END=148896447   GT:DP:GQ:MIN_DP:PL  0/0:38:99:38:0,99,1485
3   148896448   .   T   <NON_REF>   .   .   END=148896449   GT:DP:GQ:MIN_DP:PL  0/0:39:90:38:0,90,1350

gVCF file 2:

3   148895575   .   T   <NON_REF>   .   .   END=148895818   GT:DP:GQ:MIN_DP:PL  0/0:126:99:60:0,120,1800
3   148896150   .   A   <NON_REF>   .   .   END=148896150   GT:DP:GQ:MIN_DP:PL  0/0:33:96:33:0,96,1440
3   148896151   .   A   <NON_REF>   .   .   END=148896395   GT:DP:GQ:MIN_DP:PL  0/0:77:99:34:0,99,1485
3   148896396   rs139633388 C   G,<NON_REF> 1035.77 .   BaseQRankSum=0.571;ClippingRankSum=0.313;DB;DP=68;MLEAC=1,0;MLEAF=0.500,0.00;MQ=60.00;MQRankSum=0.608;ReadPosRankSum=-0.006 GT:AD:DP:GQ:PL:SB   0/1:32,36,0:68:99:1064,0,933,1160,1042,2202:8,24,11,25
3   148896397   .   C   <NON_REF>   .   .   END=148896432   GT:DP:GQ:MIN_DP:PL  0/0:55:99:38:0,99,1485
3   148896433   .   A   <NON_REF>   .   .   END=148896434   GT:DP:GQ:MIN_DP:PL  0/0:38:96:38:0,96,1440
3   148896435   .   T   <NON_REF>   .   .   END=148896435   GT:DP:GQ:MIN_DP:PL  0/0:38:99:38:0,99,1485
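To get a feel for how many records are affected before deciding on a fix, something like the following could be used. It is only a minimal Python sketch, not a GATK tool: the function name is made up for illustration, and it assumes uncompressed, tab-delimited gVCF text. It reports records whose FORMAT column has no DP key, as in the variant record at 3:148896396 of gVCF file 1 above.

```python
def records_missing_format_key(lines, key="DP"):
    """Yield (chrom, pos) for records whose FORMAT column lacks `key`."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this line
        format_keys = fields[8].split(":")
        if key not in format_keys:
            yield fields[0], int(fields[1])


if __name__ == "__main__":
    # Two records modelled on gVCF file 1 (INFO shortened for the variant record).
    example = [
        "3\t148896161\t.\tA\t<NON_REF>\t.\t.\tEND=148896395\t"
        "GT:DP:GQ:MIN_DP:PL\t0/0:81:99:37:0,107,1356",
        "3\t148896396\t.\tC\tG,<NON_REF>\t1171.77\t.\tDP=61\t"
        "GT:AD:GQ:PL:SB\t0/1:27,34,0:99:1200,0,921,1282,1023,2305:4,23,1,33",
    ]
    for chrom, pos in records_missing_format_key(example):
        print(f"{chrom}:{pos} has no DP in FORMAT")
```

Note that the affected record still carries DP in its INFO column and per-allele depths in AD, so the information is not lost from the file; it is only absent from the per-sample FORMAT field.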

↧

AddOrReplaceReadGroups complains about missing Read Group

July 25, 2017, 11:53 am

I have incomplete read groups in some BAM files, and tried to update them using the recommended command from the tutorial:
java -jar $PICARD_JAR AddOrReplaceReadGroups \
I=/work/users/ED-Sam.sorted.bam \
O=/work/users/ED-Sam_newRG.sorted.bam \
RGID=IDEDSAM \
RGLB=lib1 \
RGPL=illumina \
RGPU=BH3NTKALXX.8.3 \
RGSM=IDEDSAM \
SORT_ORDER=coordinate \
CREATE_INDEX=true

I get this error message (I have tested two different versions of Picard, 1.139 and 2.10.3, and both give the identical error):
Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line:
@RG ID:EDSAM PL:ILLUMINA; File /work/users/ED-Sam.sorted.bam; Line number 3270
at htsjdk.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:237)
at htsjdk.samtools.SAMTextHeaderCodec.access$200(SAMTextHeaderCodec.java:43)
at htsjdk.samtools.SAMTextHeaderCodec$ParsedHeaderLine.requireTag(SAMTextHeaderCodec.java:319)
at htsjdk.samtools.SAMTextHeaderCodec.parseRGLine(SAMTextHeaderCodec.java:167)
at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:100)
at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:503)
at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:166)
at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:125)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:258)
at picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:94)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

Suggestions?
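Going by the trace, the exception is thrown while the BAM header is being read (BAMFileReader.readHeader), so the existing @RG line without an SM tag is rejected before any read group gets replaced. As a way to see which header lines are affected, here is a small Python sketch. It assumes the header text has been dumped to a string first (for example with samtools view -H); the function name is hypothetical.

```python
def rg_lines_missing_tag(header_text, tag="SM"):
    """Return (line_number, line) pairs for @RG header lines without `tag`."""
    missing = []
    for number, line in enumerate(header_text.splitlines(), start=1):
        fields = line.split("\t")
        if fields[0] != "@RG":
            continue  # only read group lines are of interest
        # Each field after the record type looks like TAG:value.
        tags = {field.split(":", 1)[0] for field in fields[1:]}
        if tag not in tags:
            missing.append((number, line))
    return missing


if __name__ == "__main__":
    header = (
        "@HD\tVN:1.4\tSO:coordinate\n"
        "@RG\tID:EDSAM\tPL:ILLUMINA\n"
        "@RG\tID:OTHER\tSM:OTHER\tPL:ILLUMINA\n"
    )
    for number, line in rg_lines_missing_tag(header):
        print(f"header line {number} lacks SM: {line!r}")
```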

↧

Why do I get log info in my intervals file?

July 25, 2017, 7:05 pm

So I have been analyzing my sequence data and am trying to do realignment.
After I ran the RealignerTargetCreator command I got the intervals file.
However, the first line of the intervals file is:
There were no warn messages.
And when I use this intervals file for realignment, an ERROR message pops up saying:
Badly formed genome location: Contig 'There were no warn messages.' does not match any contig.

Is there something wrong with my process?
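However the stray line got into the file, one quick check is to separate the lines that look like intervals from those that do not. The Python sketch below assumes the contig:start or contig:start-stop form used in .intervals files; the function name is hypothetical and the pattern is deliberately simple.

```python
import re

# contig:start or contig:start-stop (no whitespace allowed in the contig name)
INTERVAL_RE = re.compile(r"^(?P<contig>\S+):(?P<start>\d+)(?:-(?P<stop>\d+))?$")


def partition_interval_lines(lines):
    """Split lines into (interval-like, everything else), skipping blanks."""
    good, bad = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        (good if INTERVAL_RE.match(text) else bad).append(text)
    return good, bad


if __name__ == "__main__":
    good, bad = partition_interval_lines(
        ["There were no warn messages.", "chr1:100-200", "chr2:300"]
    )
    print("intervals:", good)
    print("not intervals:", bad)
```

Writing only the interval-like lines back out would give a file the realignment step can parse, but it is still worth finding out why the log line was written there in the first place.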

↧

What's the difference if I do IndelRealigner first, then BQSR?

July 26, 2017, 2:46 am

Hi,
I read the Best Practices and found that BQSR should be done first, then IndelRealigner.
I do not know what this order affects. Does it have any big effect on the resulting BAM?

↧

Speeding up CatVariants to merge 1000s VCFs

July 26, 2017, 7:41 am

Hi,

Following the advice seen elsewhere on the forum I have performed variant calling with whole-genome resequencing data on a per-scaffold basis. Now, I need to merge 30000 or so individual VCFs and I am using CatVariants for that using the following command:

java -Xmx5G -cp GenomeAnalysisTK-3.7-0/GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.CatVariants -R $ref.fasta -out $out.vcf -assumeSorted -V $allvcfs.list

Unfortunately, it appears to be very slow (only about 5000 regions processed after >24h), so I am wondering if this is the expected behavior and if there is a way to increase the speed.

Thanks

↧

RealignerTargetCreator htsjdk.tribble.index.IndexFactory.loadIndex exception

July 26, 2017, 11:27 am

I'm getting a tribble loadIndex exception from RealignerTargetCreator. I see this exception has been reported quite a bit, and one cause seems to be an out-of-date index file. I deleted the .bai then recreated it with samtools index, and still got the error. Then I tried deleting it without recreating it, since apparently RealignerTargetCreator will create an index if there is none. But that still gave the error. Help!

INFO  11:13:55,502 HelpFormatter - --------------------------------------------------------------------------------
INFO  11:13:55,615 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO  11:13:55,615 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO  11:13:55,616 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO  11:13:55,616 HelpFormatter - [Wed Jul 26 11:13:55 PDT 2017] Executing on Linux 4.4.0-47-generic amd64
INFO  11:13:55,616 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_72-b15
INFO  11:13:55,621 HelpFormatter - Program Args: --analysis_type RealignerTargetCreator --reference_sequence /share/carvajal-archive/REFERENCE_DATA/genomes/GRCh38_decoy_LCCpanel/Homo_sapiens_assembly38_LCCpanel.fasta --intervals BED/3025671_Covered_hg38_decoyLCCpnl.pad200.bed --input_file DATA/CH-59/N2/CH-59N2.dedup.bam --out DATA/CH-59/N2/CH-59N2.realign.interval_list --validation_strictness STRICT -known /share/carvajal-archive/REFERENCE_DATA/GATK_Bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf
INFO  11:13:55,626 HelpFormatter - Executing as twtoal@carcinos on Linux 4.4.0-47-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_72-b15.
INFO  11:13:55,626 HelpFormatter - Date/Time: 2017/07/26 11:13:55
INFO  11:13:55,627 HelpFormatter - --------------------------------------------------------------------------------
INFO  11:13:55,627 HelpFormatter - --------------------------------------------------------------------------------
INFO  11:13:55,634 GenomeAnalysisEngine - Strictness is STRICT
INFO  11:13:59,404 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO  11:13:59,411 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  11:14:00,008 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.60
##### ERROR --
##### ERROR stack trace
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:173)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadFromDisk(RMDTrackBuilder.java:375)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.attemptToLockAndLoadIndexFromDisk(RMDTrackBuilder.java:359)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:319)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:264)
        at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:153)
        at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:208)
        at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:88)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:1052)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:829)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:287)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:169)
        ... 14 more
Caused by: java.io.EOFException
        at htsjdk.tribble.util.LittleEndianInputStream.readFully(LittleEndianInputStream.java:138)
        at htsjdk.tribble.util.LittleEndianInputStream.readLong(LittleEndianInputStream.java:80)
        at htsjdk.tribble.index.linear.LinearIndex$ChrIndex.read(LinearIndex.java:271)
        at htsjdk.tribble.index.AbstractIndex.read(AbstractIndex.java:367)
        at htsjdk.tribble.index.linear.LinearIndex.<init>(LinearIndex.java:101)
        ... 19 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: java.lang.reflect.InvocationTargetException
##### ERROR ------------------------------------------------------------------------------------------
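Reading the trace, the index that fails to load goes through RMDTrackBuilder and ReferenceOrderedDataSource, which suggests a Tribble index for a reference-ordered input (here that would be the -known VCF) rather than the BAM's .bai. Assuming the index sits next to the VCF as <file>.idx, a truncated or stale one could be spotted with a rough Python sketch like this. The function name is hypothetical, and an EOFException can have other causes than the ones checked here.

```python
import os


def describe_tribble_index(feature_file):
    """Describe the state of `<feature_file>.idx` relative to its data file."""
    index_path = feature_file + ".idx"
    if not os.path.exists(index_path):
        return "missing"
    if os.path.getsize(index_path) == 0:
        return "empty"
    if os.path.getmtime(index_path) < os.path.getmtime(feature_file):
        return "older than data file"
    return "present"


if __name__ == "__main__":
    import sys

    # Pass one or more VCF paths on the command line.
    for path in sys.argv[1:]:
        print(path, "->", describe_tribble_index(path))
```

If the index turns out to be empty or stale, removing the .idx file (not the .bai) so that it gets regenerated would be the thing to try.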

↧

GENOTYPE_GIVEN_ALLELES mode does not work in GATK4 beta

July 26, 2017, 8:00 pm

Hi guys, I tried to run GGA (GENOTYPE_GIVEN_ALLELES) mode in GATK 4.beta.1, but it failed with a NullPointerException. I'm sure that my input file and parameter settings are OK, because I have checked my settings against this post on our forum and the 4.beta.1 docs, and the equivalent parameters work fine with GATK 3.7.

I haven't tested GGA with 4.beta.2 or 4.beta.3, as the release notes show no updates related to this function. I'm wondering whether GGA works in 4.beta or will work in the future general release, or whether I need to change my parameters to get it running. Below are my parameters and error log.

gatk-launch --javaOptions "-Xmx4g" HaplotypeCaller  \
   -R /reference/BWAIndex/genome.fa \
   -I  miseq_161113_PE75.bwa.sorted.filtered.recal.bam \
   -O miseq_161113_PE75_gatk4_pgkb.vcf \
   -L  /path/to/my.vcf.gz \
   --alleles /path/to/my.vcf.gz \
   --genotyping_mode GENOTYPE_GIVEN_ALLELES

[July 14, 2017 2:51:14 PM CST] Executing as jiecui@Neptune on Linux 4.4.0-83-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11; Version: 4.beta.1
14:51:14.936 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
14:51:14.936 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:51:14.936 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:51:14.936 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:51:14.936 INFO  HaplotypeCaller - Deflater: IntelDeflater
14:51:14.936 INFO  HaplotypeCaller - Inflater: IntelInflater
14:51:14.936 INFO  HaplotypeCaller - Initializing engine
......
14:51:15.342 INFO  IntervalArgumentCollection - Processing 43 bp from intervals
14:51:15.350 INFO  HaplotypeCaller - Done initializing engine
14:51:15.356 INFO  HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
14:51:15.594 WARN  PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
14:51:15.737 WARN  PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
14:51:15.964 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/media/home/jiecui/software/gatk/gatk-4.beta.1/gatk-package-4.beta.1-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
[INFO] Available threads: 40
[INFO] Requested threads: 4
[INFO] Using 4 threads
14:51:16.035 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
14:51:16.051 INFO  ProgressMeter - Starting traversal
14:51:16.051 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
log4j:WARN No appenders could be found for logger (org.broadinstitute.hellbender.utils.MathUtils$Log10Cache).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14:51:16.424 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.001359444
14:51:16.424 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.004590645
14:51:16.424 INFO  HaplotypeCaller - Shutting down engine
[July 14, 2017 2:51:16 PM CST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=1598029824
java.lang.NullPointerException
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerGenotypingEngine.createAlleleMapper(AssemblyBasedCallerGenotypingEngine.java:159)
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:128)
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:541)
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:221)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:244)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:217)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:838)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:115)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
        at org.broadinstitute.hellbender.Main.main(Main.java:230)

Java and GATK version:

Java version: openjdk version "1.8.0_131"
GATK version: 4.beta.1

↧

How can Mutect2 GATK 4 beta be parallelized?

June 28, 2017, 9:57 pm

Hello,

I am creating a PoN for Mutect2, following the instructions in the comments of Mutect2.java:

gatk-launch --javaOptions "-Xmx4g" Mutect2 \
-R ref_fasta.fa \
-I normal1.bam \
-tumor normal1_sample_name \
--germline_resource af-only-gnomad.vcf.gz \
-L intervals.list \
-O normal1_for_pon.vcf.gz

This task seems to take a highly variable amount of time per sample or interval. I also realized that Mutect2 is not a Spark tool in GATK 4.
Is splitting intervals (and maybe --nativePairHmmThreads) the only way to parallelize this task?
I wonder if you have any advice on parallelizing Mutect2 runs.

Thank you!
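For the interval-splitting route mentioned in the question, the scatter step could look roughly like the Python sketch below. It greedily balances shards by total base pairs; each shard would then be written to its own intervals file and passed to a separate Mutect2 run via -L. The function name and the tuple layout are assumptions for illustration, and balancing by size does not guarantee balanced runtime, since time per interval is reported to vary a lot.

```python
def scatter_intervals(intervals, n_shards):
    """Assign (contig, start, stop) intervals to n_shards lists.

    Intervals are taken largest first and each one goes to the shard that
    currently covers the fewest bases (coordinates are 1-based, inclusive).
    """
    shards = [[] for _ in range(n_shards)]
    sizes = [0] * n_shards
    by_size = sorted(intervals, key=lambda iv: iv[2] - iv[1] + 1, reverse=True)
    for interval in by_size:
        smallest = sizes.index(min(sizes))
        shards[smallest].append(interval)
        sizes[smallest] += interval[2] - interval[1] + 1
    return shards


if __name__ == "__main__":
    demo = [("1", 1, 100), ("2", 1, 50), ("3", 1, 50)]
    for i, shard in enumerate(scatter_intervals(demo, 2)):
        print(f"shard {i}:", [f"{c}:{s}-{e}" for c, s, e in shard])
```

The per-shard VCFs would still need to be merged afterwards, and the order of intervals within a shard is by size here, so it may need re-sorting before being written out.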

↧

Please help me to interpret this line. How come I have this disease?

July 26, 2017, 10:52 pm

chr1 53676448 . G A 1495.77 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=1.48;ClippingRankSum=-5.270e-01;DP=79;ExcessHet=3.0103;FS=4.485;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.775;QD=19.18;ReadPosRankSum=0.403;SOR=0.581;set=variant2 GT:AD:DP:GQ:PL:CGIANN_VARNAME:CGIANN_1000GAF:CGIANN_ESP6500AF 0/1:29,49:78:99:1524,0,780:-,NM_000098.2(CPT2) c.1102G>A (p.V368I):-,0.5:-,0.456405
chr1 53676986 . C . . . END=53678942;NT GT ./.
chr1 53679264 . T . . . END=53680317;NT GT ./.
chr1 53680529 . a . . . END=53681541;NT GT ./.
chr1 53681771 . G . . . END=53682332;NT GT ./.
chr1 53682540 . G . . . END=53683699;NT GT ./.

I only see one variant with a fair amount of information, and several other lines without information. How come I have a CPT2 deficiency?
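To read the genotype part of the first record, the FORMAT keys can be paired with the sample values. The Python sketch below does this for the standard fields only (GT:AD:DP:GQ:PL); the CGIANN_* fields come from a third-party annotation and are left out. It says nothing about clinical interpretation; it only shows what the caller reported at this site.

```python
def parse_sample(format_col, sample_col):
    """Pair colon-separated FORMAT keys with the sample's values."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


# Standard fields copied from the chr1:53676448 record above.
call = parse_sample("GT:AD:DP:GQ:PL", "0/1:29,49:78:99:1524,0,780")

# AD lists read counts per allele: reference first, then the alternate.
ref_reads, alt_reads = (int(x) for x in call["AD"].split(","))
alt_fraction = alt_reads / (ref_reads + alt_reads)

print("genotype:", call["GT"])  # 0/1 = one reference and one alternate allele
print("ref reads:", ref_reads, "alt reads:", alt_reads)
print("alt fraction:", round(alt_fraction, 2))
```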

↧

--dontUseSoftClippedBases In HaplotypeCaller

June 18, 2015, 2:36 pm

Hi GATK team,

Does using the --dontUseSoftClippedBases option in HC affect SNP/indel calling in any way? Can you please elaborate a little bit on how enabling this parameter affects variant calling?

Thanks,
Teja.

↧

What does "the region is not targeted by the enrichment chip" mean?

July 27, 2017, 5:26 am

Does it mean this region is not sequenced?

chr1 53668182 . T . . NT END=53675682;NT GT ./.
chr1 53675849 . C . . NT END=53675853;NT GT ./.

↧

CollectVariantCallingMetrics varies depending on version of b37 DbSNP

July 27, 2017, 6:54 am

Please review the questions I posted at https://gatkforums.broadinstitute.org/gatk/discussion/10041/varianteval-error-message#latest.

They appear as "Answered" but are in fact not.

Thanks.

↧

Running GATK SNP calling on a KNL compute node (Intel Xeon Phi 7250)

June 21, 2017, 10:14 am

Hi there,

I would like to run GATK SNP calling on the TACC Stampede2 machine (https://portal.tacc.utexas.edu/user-guides/stampede2), which has KNL nodes equipped with the Intel Xeon Phi 7250. This CPU is somewhat special because "Stampede2's KNL nodes have 68 cores, each with 4 hardware threads".

Right now, the command I used is:
java -Djava.io.tmpdir=/tmp -jar $TACC_GATK_DIR/GenomeAnalysisTK.jar -nct 136 -R assembly_selfref_v2.fa -T UnifiedGenotyper ......

Basically, TACC support people told me that I should use 136 threads because "In most cases it's best to specify no more than 64-68 MPI tasks or independent processes per node, and 1‑2 threads/core."

However, I feel this would be a waste of resources because, supposedly, one KNL node would have 272 (68 x 4) hardware threads.

Does GATK have special parameters I can use to make full use of such a machine? Some combination of -nct and -nt?

Thanks very much in advance!

↧

AnalyzeCovariates error (R)

May 16, 2014, 5:05 pm

Hello

I am trying to generate base recalibration plots using AnalyzeCovariates.

My command is:

java -jar GenomeAnalysisTK.jar \
-T AnalyzeCovariates -R GRCh37-lite.fa \
-before test_data/realigned/SA495-Tumor.sorted.realigned.grp \
-after test_data/realigned/SA495-Tumor.sorted.post_recal.grp2 \
-plots recal_plots.pdf

and this gives me an error:

INFO  17:01:06,050 HelpFormatter - Date/Time: 2014/05/16 17:01:06
INFO  17:01:06,050 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:01:06,050 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:01:06,962 GenomeAnalysisEngine - Strictness is SILENT
INFO  17:01:07,193 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO  17:01:07,317 GenomeAnalysisEngine - Preparing for traversal
INFO  17:01:07,339 GenomeAnalysisEngine - Done preparing for traversal
INFO  17:01:07,340 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  17:01:07,340 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining
INFO  17:01:08,293 ContextCovariate -       Context sizes: base substitution model 2, indel substitution model 3
INFO  17:01:08,537 ContextCovariate -       Context sizes: base substitution model 2, indel substitution model 3
INFO  17:01:08,592 AnalyzeCovariates - Generating csv file '/tmp/AnalyzeCovariates3565832248324656361.csv'
INFO  17:01:09,077 AnalyzeCovariates - Generating plots file 'recal_plots.pdf'
INFO  17:01:18,598 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
org.broadinstitute.sting.utils.R.RScriptExecutorException: RScript exited with 1. Run with -l DEBUG for more info.
    at org.broadinstitute.sting.utils.R.RScriptExecutor.exec(RScriptExecutor.java:174)
    at org.broadinstitute.sting.utils.recalibration.RecalUtils.generatePlots(RecalUtils.java:548)
    at org.broadinstitute.sting.gatk.walkers.bqsr.AnalyzeCovariates.generatePlots(AnalyzeCovariates.java:380)
    at org.broadinstitute.sting.gatk.walkers.bqsr.AnalyzeCovariates.initialize(AnalyzeCovariates.java:394)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:107)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: RScript exited with 1. Run with -l DEBUG for more info.
##### ERROR ------------------------------------------------------------------------------------------

Ideas ?
Thanks

↧

VariantEval error message

July 26, 2017, 5:52 am

Hi everyone,

While running VariantEval on chr3 of a gVCF file, using the following command:

java -Xmx32G -jar /home/d/GenomeAnalysisTK.jar -T VariantEval -R /home/d/Human_Reference/hg19ref.fa -eval Rise493.rescaled.g.vcf -D /home/d/Human_Reference/dbsnp_138.hg19.vcf -noEV -EV CompOverlap -EV IndelSummary -EV TiTvVariantEvaluator -EV CountVariants -EV MultiallelicSummary -L 3 -nt 8 -o SampleVariants_Rise493.eval.grp

I got the following error message:

ERROR MESSAGE: Input files dbsnp and reference have incompatible contigs. Please see https://www.broadinstitute.org/gatk/guide/article?id=63 for more information. Error details: No overlapping contigs found.

ERROR dbsnp contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]

ERROR reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1, GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1, GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1, GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1, GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1, GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1, GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1, GL000194.1, GL000225.1, GL000192.1]

I am using the hg19 reference and the dbSNP file from your resource bundle. The message leads me to believe that dbSNP and hg19 in your resource bundle use different chromosome naming conventions, chr1 vs 1, but I could be wrong.

To save time, I did not align the FASTQ myself this time and instead downloaded the BAM from ENA. I checked which GATK tools had previously been used on this BAM with this command:

/home/d/Samtools/samtools view -H Rise493.bam | grep '@PG'

@PG ID:MarkDuplicates PN:MarkDuplicates VN:1.110(1752) CL:net.sf.picard.sam.MarkDuplicates INPUT=[alignment/lib_RISE493_MA860_L1.flt.sort.bam] OUTPUT=alignment/lib_RISE493_MA860_L1.flt.sort.bam.rmdup.bam METRICS_FILE=alignment/lib_RISE493_MA860_L1.flt.sort.bam.metrics.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true TMP_DIR=[/panvol1/simon/tmp] VALIDATION_STRINGENCY=LENIENT PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
@PG ID:GATK IndelRealigner VN:2.2-3-gde33222 CL:knownAlleles=[] targetIntervals=alignment/RISE493.hg19.flt.sort.rmdup.bam.intervals LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPGTags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null

Can you think of a quick workaround?
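The two contig lists in the error can be compared directly to confirm the naming mismatch. The Python sketch below (function name hypothetical) reports contigs that match exactly and those that would match only after dropping a leading "chr". It is a diagnostic only: chrM vs MT and the unplaced contigs (chrUn_gl000211 vs GL000211.1) do not line up by simple prefix stripping, and matching names does not by itself guarantee matching sequences.

```python
def compare_contig_names(a_contigs, b_contigs):
    """Compare two contig name lists, exactly and after stripping 'chr' from the first."""
    a, b = set(a_contigs), set(b_contigs)
    exact = a & b
    stripped = {name[3:] for name in a if name.startswith("chr")} & b
    return {"exact": sorted(exact), "after_stripping_chr": sorted(stripped)}


if __name__ == "__main__":
    # Short excerpts of the lists printed in the error message above.
    dbsnp = ["chrM", "chr1", "chr2", "chr3", "chrX", "chrY", "chrUn_gl000211"]
    reference = ["1", "2", "3", "X", "Y", "MT", "GL000211.1"]
    print(compare_contig_names(dbsnp, reference))
```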

↧

How to download the latest COSMIC VCF file

August 9, 2017, 7:26 pm

I want to run Mutect2 with the latest COSMIC file, but I can't find where to download it.
I searched the forum and found that others advise downloading the COSMIC file from "ftp://ngs.sanger.ac.uk/production/cosmic", but I can't find anything at this site.
Could you help me download the latest COSMIC VCF file? Thanks,

↧

Using GenomicsDBImport to prepare GVCFs for input to GenotypeGVCFs in GATK4

July 28, 2017, 6:09 pm

In GATK4, the GenotypeGVCFs tool can only take a single input, so if you have GVCFs from multiple samples (which is usually the case) you will need to combine them before feeding them to GenotypeGVCFs. Although there are several tools in the GATK and Picard toolkits that provide some type of VCF or GVCF merging functionality, for this use case there is only one valid way to do it: with GenomicsDBImport.

The GenomicsDBImport tool takes in one or more single-sample GVCFs and imports data over a single interval, and outputs a directory containing a GenomicsDB datastore with combined multi-sample data. GenotypeGVCFs can then read from the created GenomicsDB directly and output a VCF.

Here are example commands to use it:

gatk-launch GenomicsDBImport \
    -V data/gvcfs/mother.g.vcf \
    -V data/gvcfs/father.g.vcf \
    -V data/gvcfs/son.g.vcf \
    --genomicsDBWorkspace my_database \
    --intervals 20

That generates a directory called my_database containing the combined gvcf data.

Then you run joint genotyping; note the gendb:// prefix to the database input directory path.

gatk-launch GenotypeGVCFs \
    -R data/ref/ref.fasta \
    -V gendb://my_database \
    -G StandardAnnotation -newQual \
    -O test_output.vcf

And that's all there is to it.

There are three caveats:

You can't add data to an existing database; you have to keep the original GVCFs around and reimport them all together when you get new samples. For very large numbers of samples, there are some batching options.
At the moment you can only run GenomicsDBImport on a single genomic interval (i.e. at most one contig). This will probably change because we'd like to enable running on multiple intervals in one go, but for now you need to run on each interval separately. We recommend scripting this of course.
At the moment GenomicsDB only supports diploid data. The developers of GenomicsDB are working on implementing support for non-diploid data.
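Since each interval currently needs its own import run, the scripting recommended in the second caveat could be sketched as follows. This Python snippet only builds one GenomicsDBImport command per interval from the arguments shown in the example above; the function name and the per-interval workspace naming (my_database_<interval>) are assumptions made for illustration, not part of the tool.

```python
def genomicsdb_import_commands(gvcfs, intervals, workspace_prefix="my_database"):
    """Build one GenomicsDBImport argument list per interval."""
    commands = []
    for interval in intervals:
        cmd = ["gatk-launch", "GenomicsDBImport"]
        for gvcf in gvcfs:
            cmd += ["-V", gvcf]
        cmd += [
            # One workspace per interval (hypothetical naming scheme).
            "--genomicsDBWorkspace", f"{workspace_prefix}_{interval}",
            "--intervals", str(interval),
        ]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    gvcfs = [
        "data/gvcfs/mother.g.vcf",
        "data/gvcfs/father.g.vcf",
        "data/gvcfs/son.g.vcf",
    ]
    for cmd in genomicsdb_import_commands(gvcfs, ["20", "21"]):
        print(" ".join(cmd))
```

Each argument list can be handed to subprocess.run(cmd, check=True) or submitted to a scheduler; GenotypeGVCFs would then be run once per workspace with the matching gendb:// path.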

Addendum: extracting data from the GenomicsDB

If you want to generate a flat multisample GVCF file from the GenomicsDB you created, you can do so with SelectVariants as follows:

gatk-launch SelectVariants \
    -R data/ref/ref.fasta \
    -V gendb://my_database \
    -O combined.g.vcf

Caveat: cannot move database after creation

Currently the GenomicsDB internal code uses the absolute path of the location of the database as part of the data encoding. As a consequence, you cannot move the database to a different location before running GenotypeGVCFs on it. If you do, it will no longer work. This is obviously not desirable, and the development team is looking at options to remediate this.

↧

Order of reporting sites by GenotypeGVCFs

August 9, 2017, 7:01 pm

Hello,

When I genotype a large and diverse cohort I often see an error about too many alleles at one position, which is fine. What puzzles me is why the sites in question are not reported in order, e.g. here position 1023160 is reported after position 5019574:

INFO 20:56:15,037 ProgressMeter - 0101:9012801 0.0 30.0 s 50.1 w 2.4% 21.2 m 20.7 m
WARN 20:56:26,448 GenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location 0101:5019574
WARN 20:56:43,879 GenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location 0101:1023160

As far as I can tell the sites are ordered in the gVCFs. Is this an unimportant idiosyncrasy of the Progress Meter output, or does it point to a problem with my input files?

Many thanks,

↧

MuTect2 beta --germline_resource for build hg19, af-only-gnomad.vcf

August 9, 2017, 7:46 pm

Hi - I'm looking to run MuTect2 beta using the --germline_resource option. However, I cannot find af-only-gnomad.vcf for build hg19.
Where can I find this VCF for hg19?

↧