I was in the middle of working on something and got an error about missing commands, but I can't check it because the website is down! I have a deadline, so I was wondering if you know how long it will be down for?
Thanks.
When running GATK 4.0.0.0 (in this case using ApplyBQSR), the notice
11:36:10.430 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 1
appears. A bit of digging led me to the Python code in the newly distributed gatk
program. There, two variables set -Dsamjdk.compression_level=1
by default. I changed the level there to 5, but the output from ApplyBQSR remained the same, and from the file sizes I'm seeing (though I may be wrong), it seems that the compression level is not at 5.
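For reference, the level can apparently also be overridden per invocation instead of editing the wrapper; a minimal sketch, assuming the standard --java-options pass-through (reference, input, and output names are placeholders):
# Hypothetical invocation: override the HTSJDK compression level at run
# time; file names below are placeholders, not from the original report.
gatk --java-options "-Dsamjdk.compression_level=5" ApplyBQSR \
    -R reference.fasta \
    -I input.bam \
    --bqsr-recal-file recal.table \
    -O output.bam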
Thoughts?
Dear Sir/Madam,
I used GenomicsDBImport and GenotypeGVCFs to call SNPs in GATK4. Because the reference genome is large (1 Gb) and the sample size is 100, I want to split the work by chromosome. GenomicsDBImport works for all chromosomes, but GenotypeGVCFs only works for chromosome 1. Could you please give me some suggestions? Below are the commands and log information for chromosome 2.
Look forward to hearing from you soon.
Best regards,
Baosheng
$GATK --java-options "-Xmx24g" \
GenomicsDBImport \
${InputVCF} \
--genomicsdb-workspace-path ${OUTDIR}/chr02 \
-L Qrob_Chr02
$GATK --java-options "-Xmx48g" \
GenotypeGVCFs \
-R ${REF} \
-V gendb://${OUTDIR}/chr02 \
-all-sites \
-O ${OUTDIR}/chr02.vcf
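A hedged variant of the GenotypeGVCFs call that passes the interval explicitly, mirroring the import step; this is an untested sketch, and it may or may not be what the gendb workspace query needs:
# Sketch only: same command as above plus the -L used at import time.
$GATK --java-options "-Xmx48g" \
GenotypeGVCFs \
-R ${REF} \
-V gendb://${OUTDIR}/chr02 \
-L Qrob_Chr02 \
-all-sites \
-O ${OUTDIR}/chr02.vcf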
23:41:40.274 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/WangBS/software/GATK/gatk/build/libs/gatk-package-4.0.11.0-56-g2c0e9b0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
23:42:41.990 INFO GenomicsDBImport - ------------------------------------------------------------
23:42:41.990 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.0.11.0-56-g2c0e9b0-SNAPSHOT
23:42:41.990 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
23:42:41.991 INFO GenomicsDBImport - Executing as WangBS@cu53 on Linux v3.10.0-693.el7.x86_64 amd64
23:42:41.991 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_131-b12
23:42:41.991 INFO GenomicsDBImport - Start Date/Time: January 26, 2019 11:41:40 PM CST
23:42:41.991 INFO GenomicsDBImport - ------------------------------------------------------------
23:42:41.991 INFO GenomicsDBImport - ------------------------------------------------------------
23:42:41.991 INFO GenomicsDBImport - HTSJDK Version: 2.18.1
23:42:41.991 INFO GenomicsDBImport - Picard Version: 2.18.16
23:42:41.992 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:42:41.992 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:42:41.992 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:42:41.992 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:42:41.992 INFO GenomicsDBImport - Deflater: IntelDeflater
23:42:41.992 INFO GenomicsDBImport - Inflater: IntelInflater
23:42:41.992 INFO GenomicsDBImport - GCS max retries/reopens: 20
23:42:41.992 INFO GenomicsDBImport - Requester pays: disabled
23:42:41.992 INFO GenomicsDBImport - Initializing engine
23:42:44.099 INFO IntervalArgumentCollection - Processing 115639695 bp from intervals
23:42:44.102 INFO GenomicsDBImport - Done initializing engine
23:42:44.276 INFO GenomicsDBImport - Vid Map JSON file will be written to /home/WangBS/Analyses/vcf/test/chr02/vidmap.json
23:42:44.276 INFO GenomicsDBImport - Callset Map JSON file will be written to /home/WangBS/Analyses/vcf/test/chr02/callset.json
23:42:44.276 INFO GenomicsDBImport - Complete VCF Header will be written to /home/WangBS/Analyses/vcf/test/chr02/vcfheader.vcf
23:42:44.276 INFO GenomicsDBImport - Importing to array - /home/WangBS/Analyses/vcf/test/chr02/genomicsdb_array
23:42:44.276 INFO ProgressMeter - Starting traversal
23:42:44.276 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
23:42:45.830 INFO GenomicsDBImport - Importing batch 1 with 63 samples
Buffer resized from 37294 bytes to 65464
Buffer resized from 37294 bytes to 65511
Buffer resized from 37293 bytes to 65539
Buffer resized from 37294 bytes to 65447
.....
.....
Buffer resized from 65538 bytes to 65539
Buffer resized from 65538 bytes to 65539
Buffer resized from 65538 bytes to 65539
06:50:14.219 INFO ProgressMeter - Qrob_Chr02:1 427.5 1 0.0
06:50:14.220 INFO GenomicsDBImport - Done importing batch 1/1
06:50:14.221 INFO ProgressMeter - Qrob_Chr02:1 427.5 1 0.0
06:50:14.229 INFO ProgressMeter - Traversal complete. Processed 1 total batches in 427.5 minutes.
06:50:14.236 INFO GenomicsDBImport - Import completed!
06:50:14.236 INFO GenomicsDBImport - Shutting down engine
[January 27, 2019 6:50:14 AM CST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 428.57 minutes.
Runtime.totalMemory()=8988393472
Tool returned:
true
Using GATK jar /home/WangBS/software/GATK/gatk/build/libs/gatk-package-4.0.11.0-56-g2c0e9b0-SNAPSHOT-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx24g -jar /home/WangBS/software/GATK/gatk/build/libs/gatk-package-4.0.11.0-56-g2c0e9b0-SNAPSHOT-local.jar GenotypeGVCFs -R /home/WangBS/Reference/Qrobur/Qrob_PM1N.fa -V gendb:///home/WangBS/Analyses/vcf/test/chr02 -all-sites -O /home/WangBS/Analyses/vcf/test/chr02.vcf
06:50:19.236 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/WangBS/software/GATK/gatk/build/libs/gatk-package-4.0.11.0-56-g2c0e9b0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
06:51:21.116 INFO GenotypeGVCFs - ------------------------------------------------------------
06:51:21.116 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.0.11.0-56-g2c0e9b0-SNAPSHOT
06:51:21.116 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
06:51:21.117 INFO GenotypeGVCFs - Executing as WangBS@cu53 on Linux v3.10.0-693.el7.x86_64 amd64
06:51:21.117 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_131-b12
06:51:21.117 INFO GenotypeGVCFs - Start Date/Time: January 27, 2019 6:50:19 AM CST
06:51:21.117 INFO GenotypeGVCFs - ------------------------------------------------------------
06:51:21.117 INFO GenotypeGVCFs - ------------------------------------------------------------
06:51:21.118 INFO GenotypeGVCFs - HTSJDK Version: 2.18.1
06:51:21.118 INFO GenotypeGVCFs - Picard Version: 2.18.16
06:51:21.118 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
06:51:21.118 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
06:51:21.118 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
06:51:21.118 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
06:51:21.118 INFO GenotypeGVCFs - Deflater: IntelDeflater
06:51:21.118 INFO GenotypeGVCFs - Inflater: IntelInflater
06:51:21.118 INFO GenotypeGVCFs - GCS max retries/reopens: 20
06:51:21.118 INFO GenotypeGVCFs - Requester pays: disabled
06:51:21.118 INFO GenotypeGVCFs - Initializing engine
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
06:51:26.212 INFO GenotypeGVCFs - Done initializing engine
06:51:26.257 INFO ProgressMeter - Starting traversal
06:51:26.257 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
06:51:26.278 INFO GenotypeGVCFs - Shutting down engine
[January 27, 2019 6:51:26 AM CST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 1.12 minutes.
Runtime.totalMemory()=1972371456
java.lang.IllegalStateException: There are no sources based on those query parameters
at com.intel.genomicsdb.reader.GenomicsDBFeatureIterator.<init>(GenomicsDBFeatureIterator.java:131)
at com.intel.genomicsdb.reader.GenomicsDBFeatureReader.query(GenomicsDBFeatureReader.java:144)
at org.broadinstitute.hellbender.engine.FeatureDataSource.refillQueryCache(FeatureDataSource.java:534)
at org.broadinstitute.hellbender.engine.FeatureDataSource.queryAndPrefetch(FeatureDataSource.java:503)
at org.broadinstitute.hellbender.engine.FeatureDataSource.query(FeatureDataSource.java:469)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$traverse$2(VariantLocusWalker.java:144)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.traverse(VariantLocusWalker.java:143)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Hi,
We ran a CombineGVCFs job using the following command, where gvcfs.list contained only 31 gvcf files with 24 samples each:
$GATK --java-options "-Xmx650G" \
CombineGVCFs \
-R $referenceFasta \
-O full_cohort.b37.g.vcf \
--variant gvcfs.list
We tried the extreme memory because CombineGVCFs kept failing; this node has 750G of RAM.
Despite the high memory provided, we get the stack trace below. The total memory reported by GATK is only ~12G, though (Runtime.totalMemory()=12662603776). Am I missing something? I don't understand why GATK is only using 12G of RAM when we provided much more, and then failing with an OutOfMemoryError.
We are currently setting up GenomicsDBImport, but this seems worth reporting.
Really appreciate your help.
18:55:51.944 INFO ProgressMeter - 4:26649295 23.6 18617000 787894.4
18:56:01.988 INFO ProgressMeter - 4:26655758 23.8 18779000 789159.6
18:59:13.407 INFO CombineGVCFs - Shutting down engine
[October 19, 2018 6:59:13 PM CDT] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 27.06 minutes.
Runtime.totalMemory()=12662603776
Exception in thread "main" java.lang.OutOfMemoryError
at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:316)
at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
at java.io.BufferedWriter.close(BufferedWriter.java:266)
at htsjdk.variant.variantcontext.writer.VCFWriter.close(VCFWriter.java:226)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.closeTool(CombineGVCFs.java:461)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:970)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
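A note on the numbers above: Runtime.totalMemory() reports the heap the JVM has currently allocated, not the -Xmx ceiling (that is Runtime.maxMemory()), so ~12G does not mean the 650G limit was ignored. Moreover, an OutOfMemoryError thrown from ByteArrayOutputStream.hugeCapacity typically signals a request for a single Java array beyond the ~2 GiB int-indexed limit, which more heap cannot fix. Purely as a hedged workaround sketch, one could split the combine step by interval to keep any single buffered header or record smaller (the interval and output name here are illustrative, not from the original post):
# Hypothetical per-chromosome run; -L 4 restricts to chromosome 4 in b37.
$GATK --java-options "-Xmx64G" \
    CombineGVCFs \
    -R $referenceFasta \
    -L 4 \
    -O full_cohort.b37.chr4.g.vcf.gz \
    --variant gvcfs.list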
I first extract all mapped reads:
samtools view -h -F 4 xx.hlamap.sam > xx.mapped.sam
then convert them:
java -jar picard.jar SamToFastq I=xx.mapped.sam F=xx.hlatmp.1.fastq F2=xx.hlatmp.2.fastq
but it fails with errors like the following. How should I revise it? Thanks a lot.
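Since the error text is not shown, this is only a guess: SamToFastq generally needs paired reads grouped together by name, so queryname-sorting the mapped reads first may help. An untested sketch:
# Guesswork, not a confirmed fix: group mates adjacently by queryname
# before SamToFastq, which expects pairs to arrive together.
samtools sort -n -O sam -o xx.mapped.qsorted.sam xx.mapped.sam
java -jar picard.jar SamToFastq I=xx.mapped.qsorted.sam \
    F=xx.hlatmp.1.fastq F2=xx.hlatmp.2.fastq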
Hi,
I'm trying to build a PoN with GATK 3.7-0 to use with MuTect2. For that, I've downloaded 80 exome BAM files from the 1000G project (GBR, TSI, IBS and CEU populations).
For most of them, when I try to use artifact_detection_mode, I get a GATK runtime error saying 'READ_MAX_LENGTH must be > 0 but got 0'.
To reproduce, you can, for example, download ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00116/exome_alignment/HG00116.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam
I'm using the b37 reference files from the bundle_gatk_2_8 and the SureSelect6 BED file from Agilent.
Here is the full stack trace and command line:
INFO 09:42:32,040 HelpFormatter - ------------------------------------------------------------------------------------
INFO 09:42:32,045 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 09:42:32,045 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 09:42:32,045 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 09:42:32,045 HelpFormatter - [Thu Apr 06 09:42:32 GMT 2017] Executing on Linux 2.6.18-275.12.1.el5.573g0000 amd64
INFO 09:42:32,045 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_40-b25
INFO 09:42:32,049 HelpFormatter - Program Args: -T MuTect2 -I:tumor /data/misc/mutect2/pon/1000g_bam/HG00116.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam --dbsnp /data/highlander/reference/bundle_gatk_2_8/b37/dbsnp_138.b37.vcf --artifact_detection_mode -L /data/highlander/reference/bundle_gatk_2_8/b37/capture.sureselect6.bed -R /data/highlander/reference/bundle_gatk_2_8/b37/human_g1k_v37.fasta -o /data/misc/mutect2/pon/1000g_vcf_normal/HG00116.vcf.gz
INFO 09:42:32,062 HelpFormatter - Executing as lifescope@n3 on Linux 2.6.18-275.12.1.el5.573g0000 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_40-b25.
INFO 09:42:32,062 HelpFormatter - Date/Time: 2017/04/06 09:42:32
INFO 09:42:32,062 HelpFormatter - ------------------------------------------------------------------------------------
INFO 09:42:32,063 HelpFormatter - ------------------------------------------------------------------------------------
...
java.lang.IllegalArgumentException: READ_MAX_LENGTH must be > 0 but got 0
at org.broadinstitute.gatk.utils.pairhmm.PairHMM.initialize(PairHMM.java:126)
at org.broadinstitute.gatk.utils.pairhmm.N2MemoryPairHMM.initialize(N2MemoryPairHMM.java:60)
at org.broadinstitute.gatk.utils.pairhmm.LoglessPairHMM.initialize(LoglessPairHMM.java:66)
at org.broadinstitute.gatk.utils.pairhmm.PairHMM.initialize(PairHMM.java:159)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.initializePairHMM(PairHMMLikelihoodCalculationEngine.java:267)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:282)
at org.broadinstitute.gatk.tools.walkers.cancer.m2.MuTect2.map(MuTect2.java:644)
at org.broadinstitute.gatk.tools.walkers.cancer.m2.MuTect2.map(MuTect2.java:171)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
Am I doing something wrong or is it some kind of bug :-) ?
Thank you in advance for your help
Raphael
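'READ_MAX_LENGTH must be > 0 but got 0' suggests PairHMM received a read with an empty sequence. Purely as a hypothesis, such records could be pre-filtered before building the PoN; an untested sketch, reusing the BAM name from the post above:
# Hypothetical pre-filter: keep header lines and records whose SEQ field
# (column 10) is present; drop reads stored with SEQ == "*".
samtools view -h HG00116.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam \
    | awk '/^@/ || $10 != "*"' \
    | samtools view -b - > HG00116.filtered.bam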
Hi,
We have re-analyzed a TCGA WES sample by taking the BAM file, running RevertSam, and then putting it through the standard pipeline. We have noticed about a 1% difference in variants when doing it by read group (i.e. if we produce a uBAM per read group and then merge them at the MarkDuplicates step) compared to when we do it without read groups.
Separating by read group is a bit of a nuisance for our pipeline, and we wanted to know whether it is correct not to do so. I take it all the read groups followed the same sequencing protocol. I imagine this may have to do with BWA being read-group aware.
Could you please clarify?
Thanks a lot for a terrific tool.
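On the read-group point, for illustration only (all names here are placeholders): bwa mem is read-group aware in the sense that it tags each lane's alignments with the @RG line passed via -R, which is why per-read-group uBAMs are usually aligned separately:
# Illustration: each lane/read group is aligned on its own and tagged
# with its own @RG line; file names are hypothetical.
bwa mem -R '@RG\tID:lane1\tSM:sample1\tLB:lib1\tPL:ILLUMINA' \
    ref.fasta lane1_R1.fastq.gz lane1_R2.fastq.gz > lane1.sam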
I was wondering how GATK VQSR deals with multi-allelic sites.
I already know that:
i) VQSR treats them the same way as bi-allelic sites (https://gatkforums.broadinstitute.org/gatk/discussion/7754/how-vqsr-deals-with-multiallelic-snps-and-indel);
ii) multi-allelic sites should be split before VQSR (https://gatkforums.broadinstitute.org/gatk/discussion/23559/split-multiallelic-variants-before-vqsr-and-cnnscorevariants-gatk-team-opinion). That discussion mainly covers mixed (SNP + INDEL) multi-allelic sites.
Summary questions:
1. Do you recommend splitting multi-allelic SNPs before VQSR? Would that introduce bias, since site-level information/annotations would be counted multiple times? I got different results with split and NOT split input (the unsplit input performed relatively better).
2. If we don't split multi-allelic SNP sites, then how is the Ti/Tv ratio calculated?
For example:
chr1 123 A T,G
chr2 234 C *,A,T
In the cases above, which allele(s) are used to calculate the Ti/Tv ratio in the tranche file? If VQSR takes the first allele, what should we expect in the second case, where a star allele is in the first position? Or is it better to remove star alleles before VQSR?
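For reference, splitting multi-allelic records before VQSR is commonly done with bcftools norm; a minimal sketch with placeholder file names (whether splitting is advisable is exactly question 1 above):
# Split multi-allelic records into bi-allelic rows (-m-any), then index.
bcftools norm -m-any cohort.vcf.gz -Oz -o cohort.split.vcf.gz
tabix -p vcf cohort.split.vcf.gz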
To whom it may concern,
I have both normal and tumour samples, and I also have parental data (both mother and father) for the patient. I hope to first phase the SNPs and INDELs from HaplotypeCaller using PhaseByTransmission. Thereafter, I want to phase the somatic mutations from MuTect2 using Read-Backed Phasing.
I wanted to ask whether the Read-Backed Phasing method considers both the SNPs and INDELs encompassed within a read, and whether it also uses the information from PhaseByTransmission when phasing the somatic mutations.
Regards,
Sangjin Lee
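For concreteness, a sketch of the two-step workflow described above in GATK3 syntax; the PED file and all file names are placeholders, and the sketch itself does not answer whether the two phasing sources are combined:
# Step 1 (sketch): trio-phase germline calls with PhaseByTransmission.
java -jar GenomeAnalysisTK.jar -T PhaseByTransmission \
    -R ref.fasta -V trio.vcf -ped trio.ped -o trio.pbt.vcf
# Step 2 (sketch): read-backed phasing against the tumour BAM.
java -jar GenomeAnalysisTK.jar -T ReadBackedPhasing \
    -R ref.fasta -I tumour.bam -V trio.pbt.vcf -o trio.rbp.vcf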
Hi Team,
I'm getting `WARN 21:19:30,478 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation` when processing gzipped g.vcf files produced by HaplotypeCaller (via -o foo.g.vcf.gz, as suggested by @Geraldine_VdAuwera in blog post 3893) with GenotypeGVCFs.
This results in dramatic increases in run time (which makes sense if GenotypeGVCFs un-compresses the files) and in memory requirements (why?) for GenotypeGVCFs, compared to processing the GVCFs from the same BAM files when the HC output files are unzipped. Most batches that previously completed with 4x8GB RAM now produce `java.lang.OutOfMemoryError: Java heap space` errors even with 4x64GB!
Could you please advise whether this warning is expected behaviour? If so, what exactly is missing (I can't see much difference between the unzipped and gzipped VCF headers), and can it be added somehow?
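One tentative hypothesis: if the dictionary check looks for contig information the gzipped files are not exposing, Picard's UpdateVcfSequenceDictionary can stamp the reference dictionary into each GVCF header. An untested sketch with placeholder paths:
# Hypothetical workaround: embed the reference .dict into the GVCF header.
java -jar picard.jar UpdateVcfSequenceDictionary \
    I=foo.g.vcf.gz O=foo.withdict.g.vcf.gz SD=reference.dict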
Hi, I have asked this question a while ago and a few times since. I know there is a wonderful WDL platform and FireCloud stuff to run things in parallel and check this and that. But for someone who is so used to a series of simple BASH commands, could you please kindly provide an example script like the one shown here: https://gencore.bio.nyu.edu/variant-calling-pipeline/?
Right after I found the above, I found that it has not been updated for GATK4, and I would hate to use a pipeline based on an outdated version of the engine.
I will say "Thank You So Much, GATK". For almost a year I still could not make GATK run on my own server, although there are a million documents, tutorials, and PPTs to be googled everywhere.
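In that spirit, a bare-bones single-sample GATK4 sketch in plain bash. This is not an official Best Practices pipeline: file names are placeholders, BQSR and multi-sample joint calling are omitted, and every step should be checked against the current docs:
# Minimal single-sample germline sketch (placeholders throughout).
bwa mem -R '@RG\tID:rg1\tSM:s1\tPL:ILLUMINA' ref.fasta r1.fq.gz r2.fq.gz \
    | samtools sort -o s1.sorted.bam -
gatk MarkDuplicates -I s1.sorted.bam -O s1.dedup.bam -M s1.dup_metrics.txt
samtools index s1.dedup.bam
gatk HaplotypeCaller -R ref.fasta -I s1.dedup.bam -O s1.g.vcf.gz -ERC GVCF
gatk GenotypeGVCFs -R ref.fasta -V s1.g.vcf.gz -O s1.vcf.gz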
How can I handle it?
Error message:
A USER ERROR has occurred: The argument: "resource/resource" does not accept tags: "hapmap,known=false,training=true,truth=true,prior=15.0"
Command
java -Xmx60g -jar /UUU/chul/wes/tools/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar VariantRecalibrator -R /UUU/chul/wes/hg19/ucsc.hg19.fasta -input new.vcf -input new.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /UUU/chul/wes/hg19/hapmap_3.3.hg19.sites.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 /UUU/chul/wes/hg19/1000G_omni2.5.hg19.sites.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 /UUU/chul/wes/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /UUU/chul/wes/hg19/dbsnp_138.hg19.vcf -an DP -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -O t4.recalibrate_SNP.recal --tranches-file t4.recalibrate_SNP.tranches --rscript-file t4.recalibrate_SNP_plots.R
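Judging from the 4.0.11.0 command in the next post, which parses and fails only on the file path, this 4.0.x build appears to expect the tag string first, with the file path following a colon. A hedged, unverified rewrite of the failing command in that pattern:
# Sketch: tag,attributes:path form inferred from the 4.0.11.0 example below.
java -Xmx60g -jar /UUU/chul/wes/tools/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar VariantRecalibrator \
    -R /UUU/chul/wes/hg19/ucsc.hg19.fasta \
    -input new.vcf \
    -resource hapmap,known=false,training=true,truth=true,prior=15.0:/UUU/chul/wes/hg19/hapmap_3.3.hg19.sites.vcf \
    -resource omni,known=false,training=true,truth=true,prior=12.0:/UUU/chul/wes/hg19/1000G_omni2.5.hg19.sites.vcf \
    -resource 1000G,known=false,training=true,truth=false,prior=10.0:/UUU/chul/wes/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf \
    -resource dbsnp,known=true,training=false,truth=false,prior=2.0:/UUU/chul/wes/hg19/dbsnp_138.hg19.vcf \
    -an DP -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum \
    -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    -O t4.recalibrate_SNP.recal \
    --tranches-file t4.recalibrate_SNP.tranches \
    --rscript-file t4.recalibrate_SNP_plots.R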
Could someone please help me run VariantRecalibrator in GATK 4.0.11.0?
When running the tool using GATK 4.0.11.0 with the following command line:
time ~/gatk-4.0.11.0/gatk VariantRecalibrator
-R ~/reference/hg19.fa -V ~/MT-1/outname.HC.vcf.gz
--resource hapmap,known=false,training=true,truth=true,prior=15.0:~/reference/hg19/hapmap_3.3.hg19.sites.vcf
--resource omni,known=false,training=true,truth=false,prior=12.0:~/reference/hg19/1000G_omni2.5.hg19.sites.vcf
--resource 1000G,known=false,training=true,truth=false,prior=10.0:~/reference/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf
--resource dbsnp,known=true,training=false,truth=false,prior=6.0:~/reference/hg19/dbsnp_138.hg19.vcf
--use-annotation DP --use-annotation QD --use-annotation FS --use-annotation SOR --use-annotation ReadPosRankSum --use-annotation MQRankSum
--mode SNP
--truth-sensitivity-tranche 100.0 --truth-sensitivity-tranche 99.9 --truth-sensitivity-tranche 99.0 --truth-sensitivity-tranche 95.0 --truth-sensitivity-tranche 90.0
--rscript-file ~/MT-1/outname.HC.snps.plots.R
--tranches-file ~/MT-1/outname.HC.snps.tranches
--output ~/MT-1/outname.HC.snps.recal
I got this error: A USER ERROR has occurred: Couldn't read file file:///home/chenjie1/~/reference/hg19/hapmap_3.3.hg19.sites.vcf. Error was: It doesn't exist
The command syntax follows the same pattern as version 4.0.9.0.
Has the syntax been changed for GATK version 4.0.11.0?
Thanks.
Best regards.
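The failing path in the error above embeds a literal ~ (file:///home/chenjie1/~/reference/...), so this looks like shell tilde expansion rather than a 4.0.11.0 syntax change: bash only expands ~ at the start of a word, not in the middle of an ordinary argument. A quick runnable demonstration; $HOME (or an absolute path) is the workaround:
# The tilde survives unexpanded inside a tag:path argument...
echo hapmap,known=false,prior=15.0:~/reference/hg19/hapmap_3.3.hg19.sites.vcf
# ...while $HOME expands anywhere in the word:
echo hapmap,known=false,prior=15.0:$HOME/reference/hg19/hapmap_3.3.hg19.sites.vcf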
We have developed an SNV calling method for long reads (https://github.com/pjedge/longshot). The false-positive variants produced by our method tend to occur in certain sequence contexts and often carry various signals that could be used together to filter them (including some based on assembled haplotype consistency, etc.). It would be nice to combine these signals (reference sequence context as well as annotations in our VCF) to filter variants using a supervised learning approach. I am interested in using CNNVariantWriteTensors, CNNVariantTrain, and CNNScoreVariants for this task, but I'm not sure it's even possible. Are there design considerations that fundamentally make these tools incompatible with non-Illumina sequencing technologies? Furthermore, our output VCF lacks most of the annotations specified in the GATK Best Practices, and many of those annotations are geared toward Illumina reads; I think a lot of them would not be good features for PacBio reads if I were to just plug my data into VariantAnnotator to fill them in. We would be especially interested in leveraging custom, long-read-specific annotations. Would it be possible for us to define our own annotation set to use with these tools?