Channel: Recent Discussions — GATK-Forum

Java FileNotFoundError while executing gatk4


I have installed GATK on Ubuntu 16.04 LTS (64-bit) following the steps mentioned at this link, and I have checked that the installation is good (everything checked through step 4 of the linked guide).

Now, when I run gatk --list, I encounter the error message below:

(gatk) bioinfo@bioinfo-pc:~/gatk-4.1.0.0$ gatk --list
Using GATK jar /home/bioinfo/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/bioinfo/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar --help
Traceback (most recent call last):
  File "/home/bioinfo/gatk-4.1.0.0/gatk", line 477, in <module>
    main(sys.argv[1:])
  File "/home/bioinfo/gatk-4.1.0.0/gatk", line 152, in main
    runGATK(sparkRunner, sparkSubmitCommand, dryRun, gatkArgs, sparkArgs, javaOptions)
  File "/home/bioinfo/gatk-4.1.0.0/gatk", line 328, in runGATK
    runCommand(cmd, dryrun)
  File "/home/bioinfo/gatk-4.1.0.0/gatk", line 382, in runCommand
    check_call(cmd)
  File "/home/bioinfo/miniconda2/envs/gatk/lib/python3.6/subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/home/bioinfo/miniconda2/envs/gatk/lib/python3.6/subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/home/bioinfo/miniconda2/envs/gatk/lib/python3.6/subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "/home/bioinfo/miniconda2/envs/gatk/lib/python3.6/subprocess.py", line 1333, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'java'
(gatk) bioinfo@bioinfo-pc:~/gatk-4.1.0.0$ java -version
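The traceback ends in a FileNotFoundError for 'java', which suggests the gatk Python wrapper could not find a java executable on the PATH of the active environment. A minimal diagnostic sketch (the OpenJDK suggestion in the comment is an assumption, not a requirement of the wrapper):

```shell
# Check whether a java executable is visible on PATH, which is what the
# gatk wrapper's subprocess call needs.
if command -v java >/dev/null 2>&1; then
    echo "java found at: $(command -v java)"
    java -version
else
    echo "java is missing from PATH; install a JDK (e.g. OpenJDK 8) and retry"
fi
```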

How to combine 2000 samples in multi-sample SNP calling


Dear GATK Team,

I read the paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4966644/) and do not fully understand the statement "Variant calling was performed on all 2,073 BAM files using the GATK UnifiedGenotyper". In this paper, 2,073 mice at ~0.6X coverage were sequenced, and UnifiedGenotyper was employed to call variants. The method is described as: "Variant calling was performed on all 2,073 BAM files using the GATK UnifiedGenotyper with thresholds -stand_call_conf 30 and -stand_emit_conf 30, as well as the following options for building variant quality recalibration tables: -A QualByDepth -A HaplotypeScore -A BaseQualityRankSumTest -A ReadPosRankSumTest -A MappingQualityRankSumTest -A RMSMappingQuality -A DepthOfCoverage -A FisherStrand -A HardyWeinberg -A HomopolymerRun. Raw VCF files from the variant calling step for all chromosomes except the Y chromosome were pooled together for VQSR using the GATK VariantRecalibrator under SNP mode. Training, known and true sets for building the positive model are the SNPs that segregate among the classical laboratory strains of the MGP (2011 release REL-1211) on all chromosomes except the Y chromosome."
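For context, multi-sample calling with UnifiedGenotyper simply takes all BAMs in a single invocation. A sketch of what the paper's setup might have looked like (GATK3 syntax; file names are placeholders, and the annotation/VQSR options from the quote are omitted):

```shell
# Hypothetical sketch: UnifiedGenotyper accepts repeated -I arguments, or a
# .list file with one BAM path per line, so all 2,073 BAMs can be passed
# to one joint-calling run.
ls bams/*.bam > all_samples.bam.list
java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper \
    -R reference.fasta \
    -I all_samples.bam.list \
    -stand_call_conf 30 -stand_emit_conf 30 \
    -o all_samples.raw.vcf
```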

In the UnifiedGenotyper step, I have only ever seen up to 250 individuals used in a multi-sample SNP-calling setting. How can 2,000 samples be combined in this step? Or am I misunderstanding something?

Thank you!

Yuzhe


Issues on FilterMutectCalls: Log10-probability must be 0 or less


When I use FilterMutectCalls to filter VCF results called by Mutect2, I get a "Log10-probability must be 0 or less" error on some of the VCFs (not all of them; maybe 10 out of 50). It seems the filtering process completes, since I see the "org.broadinstitute.hellbender.tools.walkers.mutect.FilterMutectCalls done. Elapsed time: 0.54 minutes" message before the error, but there is no filtered VCF in my output directory.

Additional info: the samples whose VCFs have issues in FilterMutectCalls were produced with the exact same experimental and bioinformatics approach as the samples of the same batch on which FilterMutectCalls works perfectly normally.

The detailed log is like this (PS: My GATK version is 4.1.0.0):

19:15:30.504 INFO FilterMutectCalls - Done initializing engine
19:15:30.573 INFO ProgressMeter - Starting traversal
19:15:30.574 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
19:15:30.574 INFO FilterMutectCalls - Starting first pass through the variants
19:15:30.606 INFO FilterMutectCalls - Shutting down engine
[March 11, 2019 7:15:30 PM CST] org.broadinstitute.hellbender.tools.walkers.mutect.FilterMutectCalls done. Elapsed time: 0.54 minutes.
Runtime.totalMemory()=2032140288
java.lang.IllegalArgumentException: log10p: Log10-probability must be 0 or less
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
at org.broadinstitute.hellbender.utils.MathUtils.log10BinomialProbability(MathUtils.java:1031)
at org.broadinstitute.hellbender.utils.MathUtils.binomialProbability(MathUtils.java:1024)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2FilteringEngine.applyContaminationFilter(Mutect2FilteringEngine.java:68)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2FilteringEngine.calculateFilters(Mutect2FilteringEngine.java:518)
at org.broadinstitute.hellbender.tools.walkers.mutect.FilterMutectCalls.firstPassApply(FilterMutectCalls.java:130)
at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.lambda$traverseVariants$0(TwoPassVariantWalker.java:76)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.traverseVariants(TwoPassVariantWalker.java:74)
at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.traverse(TwoPassVariantWalker.java:27)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)

Could anyone please help me verify this?

MarkDuplicatesSpark is slower than normal MarkDuplicates


Hi,
I was happy to hear that MarkDuplicatesSpark is out of beta now in version 4.1, so I tested it on one of our WES samples. Unfortunately, it took twice as long as the normal MarkDuplicates. Here are the two commands I used:

gatk MarkDuplicatesSpark -I '/media/Berechnungen/0028-19.recal.bam' -O '/media/Berechnungen/0028-19.dedup.bam' --metrics-file '/media/Berechnungen/0028-19.metrics' --spark-master local[4] --verbosity ERROR --tmp-dir /media/Ergebnisse/picardtmp/

gatk MarkDuplicates -I '/media/Berechnungen/0028-19.recal.bam' -O '/media/Berechnungen/0028-19.dedup.bam' -M '/media/Berechnungen/0028-19.metrics' --TMP_DIR /media/Ergebnisse/picardtmp/
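For comparison, one variant worth timing (an assumption on my part, based on the tool docs noting that MarkDuplicatesSpark is optimized for queryname-grouped input, so a coordinate-sorted BAM may force an extra internal sort):

```shell
# Sketch (assumes the input is coordinate-sorted): give Spark all local
# cores and feed MarkDuplicatesSpark queryname-sorted input.
gatk SortSamSpark -I '/media/Berechnungen/0028-19.recal.bam' \
    -O '/media/Berechnungen/0028-19.qsorted.bam' --sort-order queryname
gatk MarkDuplicatesSpark -I '/media/Berechnungen/0028-19.qsorted.bam' \
    -O '/media/Berechnungen/0028-19.dedup.bam' \
    --metrics-file '/media/Berechnungen/0028-19.metrics' \
    --spark-master 'local[*]' --tmp-dir /media/Ergebnisse/picardtmp/
```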

Did I do something wrong with the parallelization?

Bests Stefan

Having Cromwell service a Unix socket


I am trying to have the Cromwell web service listen on a Unix socket (a file) rather than the loopback interface (e.g. the default port 8000).

However, I am unsure how to set up the Cromwell config. I naively tried setting webservice.port to the socket path, with the obvious result:

Caused by: com.typesafe.config.ConfigException$WrongType: system properties: port has type STRING rather than NUMBER
    at com.typesafe.config.impl.SimpleConfig.findKeyOrNull(SimpleConfig.java:163)
    at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:174)
[...]

Any suggestions or help would be very welcome.

Context: we are running instances of the Cromwell server on a shared compute cluster. Having the server listen on a port on the host loopback interface (i.e. localhost:8000) is a security risk, whereas using a Unix socket would allow us to set file permissions.
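For reference, this is the stock binding those keys control (HOCON); the port value must be a number, which is why the string path fails, and I could not find a documented Unix-socket listener, so a local reverse proxy in front of a loopback port may be the only workaround:

```
webservice {
  interface = "127.0.0.1"   # bind loopback only
  port = 8000               # must be a NUMBER, hence the WrongType error
}
```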

Kind regards,
Chris

P.S. I am running Cromwell v37 on Mac OS Sierra (10.12.6).

How to merge the sample_X_genotyped_intervals.vcf files created by PostprocessGermlineCNVCalls?


How to merge the sample_X_genotyped_intervals.vcf files created by PostprocessGermlineCNVCalls to a multi-sample VCF file?

The files all have the same bins/records, so it should be easy to create a multi-sample VCF from these files.

I normally use bcftools to merge VCF files. bcftools merge gives the following error when trying to merge the (bgzipped, tabix-indexed) sample_X_genotyped_intervals.vcf files created by PostprocessGermlineCNVCalls:

Incorrect number of FORMAT/CNLP values at Chr_01:1001, cannot merge. The tag is defined as Number=A, but found
6 values and 3 alleles. See also http://samtools.github.io/bcftools/howtos/FAQ.html#incorrect-nfields

Could you check whether the FORMAT declaration of CNLP is correct?

And could you advise whether there is a GATK tool to merge single-sample VCFs (created by PostprocessGermlineCNVCalls) into a multi-sample VCF file?

For the time being, I wrote my own Python text-parsing script to create the multi-sample VCF file, but this seems like something that should be possible with GATK or bcftools.
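In the meantime, an untested workaround sketch: relax the CNLP declaration in each header so bcftools stops enforcing Number=A, then re-index and merge (file names are placeholders):

```shell
# Rewrite the header's CNLP FORMAT line from Number=A to Number=. so that
# bcftools merge no longer checks the value count against the allele count.
for f in sample_*_genotyped_intervals.vcf.gz; do
    zcat "$f" | sed 's/ID=CNLP,Number=A/ID=CNLP,Number=./' \
        | bgzip > "fixed_${f}"
    tabix -p vcf "fixed_${f}"
done
bcftools merge fixed_sample_*_genotyped_intervals.vcf.gz -O z -o merged.vcf.gz
```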

Thank you.

Germline copy number variant discovery (CNVs)


Purpose

Identify germline copy number variants.


Diagram is not available


Reference implementation is not available


This workflow is in development; detailed documentation will be made available when the workflow is considered fully released.


Examining coverage differences between groups of samples

(If this is not the right board to ask this question on, I apologize!)
I'm wondering if there is a GATK tool to help me systematically look at differences in coverage. More detail: I have exome capture data from 10 individuals from one population and 10 individuals from another. I want to see whether there are any regions that have consistently better coverage in one population than the other, for example to see whether the exome capture probes work better for one population. Are there existing tools that could help me summarize this?
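Not a GATK-specific answer, but one starting point outside GATK might be (a sketch; assumes samtools and a BED of the capture targets, with placeholder file names): compute per-target coverage for every sample at once, then compare the two populations downstream.

```shell
# samtools bedcov reports, for each target interval, the summed per-base
# depth for every BAM given: one column per sample, ready for a per-group
# comparison (e.g. per-target mean ratios) in R or Python.
samtools bedcov exome_targets.bed \
    pop1_sample1.bam pop1_sample2.bam \
    pop2_sample1.bam pop2_sample2.bam \
    > per_target_coverage.tsv
```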

BaseRecalibrator - java IllegalArgumentException fromIndex toIndex


Hi everyone,

I'm facing a similar issue with GATK v4.1.0.0 (HTSJDK v2.18.2 and Picard v2.18.25). I'm using GATK Docker image broadinstitute/gatk:4.1.0.0.

Following what I read here, I checked the bam file and everything seems fine:
gatk ValidateSamFile --INPUT sorted.bam --MODE SUMMARY

Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.0.0-local.jar ValidateSamFile --INPUT CQ-NEQAS-2018.ILLUMINA.library.000000000-BCFDC.1.1.sorted.bam --MODE SUMMARY
16:08:17.382 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Mar 07 16:08:17 UTC 2019] ValidateSamFile  --INPUT CQ-NEQAS-2018.ILLUMINA.library.000000000-BCFDC.1.1.sorted.bam --MODE SUMMARY  --MAX_OUTPUT 100 --IGNORE_WARNINGS false --VALIDATE_INDEX true --INDEX_VALIDATION_STRINGENCY EXHAUSTIVE --IS_BISULFITE_SEQUENCED false --MAX_OPEN_TEMP_FILES 8000 --SKIP_MATE_VALIDATION false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Thu Mar 07 16:08:24 UTC 2019] Executing as mpmachado@lx-bioinfo02 on Linux 2.6.32-696.23.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.0.0
WARNING 2019-03-07 16:08:24     ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
INFO    2019-03-07 16:10:25     SamFileValidator        Validated Read    10,000,000 records.  Elapsed time: 00:02:00s.  Time for last 10,000,000:  120s.  Last read position: chr9:32,633,613
INFO    2019-03-07 16:12:22     SamFileValidator        Validated Read    20,000,000 records.  Elapsed time: 00:03:58s.  Time for last 10,000,000:  117s.  Last read position: chrM:11,340
No errors found
[Thu Mar 07 16:13:05 UTC 2019] picard.sam.ValidateSamFile done. Elapsed time: 4.79 minutes.
Runtime.totalMemory()=2602041344
Tool returned:
0

But when I run BaseRecalibrator, I get the fromIndex/toIndex error:
gatk BaseRecalibrator --input sorted.bam --output sorted.baserecalibrator_report.txt --reference GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index.fasta --use-original-qualities true --known-sites snp151common_tablebrowser.bed.bgz --known-sites snp151flagged_tablebrowser.bed.bgz

ERROR: return code 3
STDERR:
15:46:35.795 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
15:46:42.808 INFO  BaseRecalibrator - ------------------------------------------------------------
15:46:42.810 INFO  BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.0.0
15:46:42.810 INFO  BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
15:46:42.813 INFO  BaseRecalibrator - Executing as mpmachado@lx-bioinfo02 on Linux v2.6.32-696.23.1.el6.x86_64 amd64
15:46:42.814 INFO  BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
15:46:42.814 INFO  BaseRecalibrator - Start Date/Time: March 7, 2019 3:46:35 PM UTC
15:46:42.815 INFO  BaseRecalibrator - ------------------------------------------------------------
15:46:42.815 INFO  BaseRecalibrator - ------------------------------------------------------------
15:46:42.817 INFO  BaseRecalibrator - HTSJDK Version: 2.18.2
15:46:42.817 INFO  BaseRecalibrator - Picard Version: 2.18.25
15:46:42.817 INFO  BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:46:42.818 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:46:42.818 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:46:42.818 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:46:42.819 INFO  BaseRecalibrator - Deflater: IntelDeflater
15:46:42.819 INFO  BaseRecalibrator - Inflater: IntelInflater
15:46:42.819 INFO  BaseRecalibrator - GCS max retries/reopens: 20
15:46:42.819 INFO  BaseRecalibrator - Requester pays: disabled
15:46:42.820 INFO  BaseRecalibrator - Initializing engine
15:46:43.760 INFO  FeatureManager - Using codec BEDCodec to read file file:///snp151common_tablebrowser.bed.bgz
15:46:44.016 INFO  FeatureManager - Using codec BEDCodec to read file file:///snp151flagged_tablebrowser.bed.bgz
15:46:44.076 WARN  IndexUtils - Feature file "snp151common_tablebrowser.bed.bgz" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
15:46:44.500 WARN  IndexUtils - Feature file "snp151flagged_tablebrowser.bed.bgz" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
15:46:44.798 INFO  BaseRecalibrator - Done initializing engine
15:46:44.936 INFO  BaseRecalibrationEngine - The covariates being used here:
15:46:44.936 INFO  BaseRecalibrationEngine -    ReadGroupCovariate
15:46:44.937 INFO  BaseRecalibrationEngine -    QualityScoreCovariate
15:46:44.937 INFO  BaseRecalibrationEngine -    ContextCovariate
15:46:44.937 INFO  BaseRecalibrationEngine -    CycleCovariate
15:46:44.953 INFO  ProgressMeter - Starting traversal
15:46:44.953 INFO  ProgressMeter -        Current Locus  Elapsed Minutes       Reads Processed     Reads/Minute
15:46:45.866 INFO  BaseRecalibrator - Shutting down engine
[March 7, 2019 3:46:45 PM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.17 minutes.
Runtime.totalMemory()=731381760
java.lang.IllegalArgumentException: fromIndex(64) > toIndex(62)
    at java.util.Arrays.rangeCheck(Arrays.java:113)
    at java.util.Arrays.fill(Arrays.java:3044)
    at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.calculateKnownSites(BaseRecalibrationEngine.java:354)
    at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.calculateSkipArray(BaseRecalibrationEngine.java:322)
    at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.processRead(BaseRecalibrationEngine.java:137)
    at org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator.apply(BaseRecalibrator.java:185)
    at org.broadinstitute.hellbender.engine.ReadWalker.lambda$traverse$0(ReadWalker.java:91)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:89)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.0.0-local.jar BaseRecalibrator --input sorted.bam --output sorted.baserecalibrator_report.txt --reference GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index.fasta --use-original-qualities true --known-sites snp151common_tablebrowser.bed.bgz --known-sites snp151flagged_tablebrowser.bed.bgz

I downsampled the fastq files and got similar results.
However, when giving only the reduced known-sites file (--known-sites snp151flagged_tablebrowser.bed.bgz) and specifying two intervals (--intervals chr22 --intervals chrY), it worked.
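Since the error comes from calculateKnownSites, one local check that might narrow it down (an assumption on my part: a malformed known-sites record, e.g. an end coordinate before the start, could throw off the skip-array indices):

```shell
# Scan each known-sites BED for records whose end precedes its start.
for f in snp151common_tablebrowser.bed.bgz snp151flagged_tablebrowser.bed.bgz; do
    echo "== $f"
    zcat "$f" | awk -F'\t' '$3 < $2 {print; n++} END {print (n+0) " suspicious records"}'
done
```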

I attached the downsampled bam file and the reduced known-sites file, and the reference file can be found here.

I hope you can help me understand what is going on and how to fix it.

Thank you in advance.

Best regards,

Miguel

SplitNCigarReads exception


My script:

java -jar ~/bin/gatk-3.2-2/GenomeAnalysisTK.jar -T SplitNCigarReads -R Gmax.fa -I NPB18L_mark.bam -o NPB18L_snc.bam -U ALLOW_N_CIGAR_READS -fixNDN

When I use -fixNDN, I get:

java.lang.UnsupportedOperationException
at java.util.AbstractList.add(AbstractList.java:148)
at java.util.AbstractList.add(AbstractList.java:108)
at org.broadinstitute.gatk.tools.walkers.rnaseq.SplitNCigarReads.initialize(SplitNCigarReads.java:150)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.2-2-gec30cee):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)

But when I don't use -fixNDN, I get:

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.2-2-gec30cee):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Bad input: Cannot split this read (might be an empty section between Ns, for example 1N1D1N): 94M407N1D1033N6M

How can I fix it?
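One possible stopgap, offered as an assumption rather than an official workaround: if reads with this CIGAR pattern are rare, drop them before SplitNCigarReads. The error names the pattern (an N run split by a deletion, e.g. 94M407N1D1033N6M):

```shell
# Remove reads whose CIGAR contains the N-D-N pattern SplitNCigarReads
# cannot split, keeping the header and all other reads.
samtools view -h NPB18L_mark.bam \
    | awk '$0 ~ /^@/ || $6 !~ /[0-9]+N[0-9]+D[0-9]+N/' \
    | samtools view -b -o NPB18L_filtered.bam -
```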

single-sample GVCF calling on DNAseq with allele-specific annotations

Thanks a lot. I read the argument documentation, but I still cannot understand:

-G Standard -G AS_Standard

Why are these needed, and when should I add them? Thanks a lot.
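As I understand it (an assumption on my part), the -G flags select whole annotation groups: Standard requests the usual site-level annotations, and AS_Standard adds allele-specific versions of them, which matter if allele-specific filtering (AS VQSR) is planned downstream. A sketch of where they go in a GVCF call (GATK3 syntax; file names are placeholders):

```shell
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R reference.fasta -I sample.bam \
    -ERC GVCF \
    -G Standard -G AS_Standard \
    -o sample.g.vcf
```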

A chemotherapy-relevant site does not appear in the VCF or bam-out BAM but appears in sorted.dedup.bam?

Thanks a lot. I have an important question I want to confirm with you:
a very important chemotherapy-related site does not appear in the VCF or in the bam-out BAM, but it does appear in sorted.dedup.bam, as the figure shows.

The gatk.bam is the output of the --bam-out argument of HaplotypeCaller, and sorted.dedup.bam is the BAM from the usual preceding preprocessing steps.

You can see there are 532 reads here: 228 reads support the indel and 17 support the deletion in sorted.dedup.bam.

I know the --bam-out BAM is a reassembled BAM that stores the variants the GATK model trusts based on its statistics.

You can also see that this is a poly region with many TA repeats; does this make it harder for the GATK model to make a decision?

I want to know how I should report this site, because 6TA/7TA, 7TA/7TA, and 6TA/6TA stand for different chemotherapy toxicities.

Which genotype should I report: 6TA/7TA, 7TA/7TA, or 6TA/6TA?

Thanks a lot.

Hom-ref calls in HaplotypeCaller version 4.0.12.0 in non-genotyping single-sample calling

$
0
0

Hi.
We're running GATK HaplotypeCaller on a single sample, in discovery mode (non-genotyping), and for some reason we're getting some variants that have a hom-ref genotype (0/0). In a whole-exome calling run, we get about ~250 such variants, and they are most often, if not always, small deletions.

Here's an example of such vcf entry:
chr7 150783922 . T . 2376.73 . AN=2;DP=49;MMQ=0;MQ=59.74 GT:AD:DP 0/0:49:49

The command we run is very basic. The only parameters are the reference, ploidy=2, the input BAM, callable regions, and the interval set rule.

Why do we see these variants? Is this a bug?
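In case it helps, a sketch of the symptomatic cleanup we are considering (this removes the records rather than explaining them; the flag is from GATK4 SelectVariants):

```shell
# Drop records where no sample has a variant genotype, i.e. the
# unexpected 0/0-only sites, from the final callset.
gatk SelectVariants -V calls.vcf.gz --exclude-non-variants -O cleaned.vcf.gz
```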

Thanks.

Problem with BaseRecalibrator in v2.2-8-gec077cd


Hello dear GATK People,

I'm failing with BaseRecalibrator from the new GATK version; my pipeline worked with 2.1-11. Below is my error message.
Any quick fix, or should I stick to the old version?

Ania

ERROR stack trace

java.lang.IllegalArgumentException: fromIndex(402) > toIndex(101)
at java.util.Arrays.rangeCheck(Unknown Source)
at java.util.Arrays.fill(Unknown Source)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateKnownSites(BaseRecalibrator.java:280)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateSkipArray(BaseRecalibrator.java:259)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:239)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:287)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:252)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:91)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:55)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.2-8-gec077cd):

.....

ERROR MESSAGE: fromIndex(402) > toIndex(101)
ERROR ------------------------------------------------------------------------------------------

GenomicsDBImport does not support GVCFs with MNPs; GATK (v4.1.0.0)


Hello!

I am running the GATK (v4.1.0.0) best practices pipeline on FireCloud with 12 pooled WGS samples; one pooled sample contains ~48 individual fish (I am using a ploidy of 20 throughout the pipeline). Though I have 24 linkage groups I also have 8286 very small scaffolds that my reads are aligned to, which has caused some issues with using scatter/gather and running the tasks by interval with -L (though that is not my main issue here). Lately I have run into a problem at the JointGenotyping stage.

I have one GVCF for each pool from HaplotypeCaller, and I tried to combine them all using CombineGVCFs. Because of the ploidy of 20, I thought I could not use GenomicsDBImport. I had the same error using CombineGVCFs as the person in this thread: gatkforums.broadinstitute.org/gatk/discussion/13430/gatk-v4-0-10-1-combinegvcfs-failing-with-java-lang-outofmemoryerror-not-using-memory-provided. No matter how much memory I allowed the task, it failed every time.

But following @shlee's advice and reading this: github.com/broadinstitute/gatk/issues/5383 I decided to give GenomicsDBImport a try. I just used my 24 linkage groups, so my interval list has only those 24 listed.

I am stumped by the error I got for many of the linkage groups:

***********************************************************************

A USER ERROR has occurred: Bad input: GenomicsDBImport does not support GVCFs with MNPs. MNP found at LG07:4616323 in VCF /6942d818-1ae4-4c81-a4be-0f27ec47ec16/HaplotypeCallerGVCF_halfScatter_GATK4/3a4a3acc-2f06-44dc-ab6d-2617b06f3f46/call-MergeGVCFs/301508.merged.matefixed.sorted.markeddups.recal.g.vcf.gz

***********************************************************************

What is the best way to address this? I didn't see anything in the GenomicsDB documentation about flagging or ignoring the MNPs. I was thinking of removing the MNPs using SelectVariants before importing the GVCFs into GenomicsDB, but how do you get SelectVariants to output a GVCF, which is needed for joint genotyping?

What would you recommend I do to get past this MNP hurdle?
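To make the SelectVariants idea concrete, a hedged sketch (whether the output remains a joint-genotyping-ready GVCF is exactly what I am unsure about; file names are placeholders):

```shell
# Exclude MNP records from each per-pool GVCF before GenomicsDBImport.
for g in pool_*.g.vcf.gz; do
    gatk SelectVariants -V "$g" \
        --select-type-to-exclude MNP \
        -O "noMNP_${g}"
done
```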

A hom-ref site is called "./." in GVCF mode and has a "LowQual" tag in native mode

Dear GATK team:

I'm currently working on calling variants in our amplicon sequencing data, targeting regions of around 300 bp. The data is generated on an Illumina MiSeq with 250 bp paired-end reads, so the forward and reverse reads of a pair overlap.

The original BAM file is first processed by setting the initial 23 bases of each read in a pair to N, to remove primer influence on variant calling. Then I used GATK HaplotypeCaller from GATK-3.6-0-g89b7209 in GVCF mode, followed by GenotypeGVCFs for joint genotyping.
In the joint-genotyping result, I observed a lot of "./." calls with high AD and DP coverage. They seem to occur only at hom-ref sites, and they do not show up in every sample, only in specific ones.

One example from the vcf result is shown below:

chr1 18722933 rs56176731 G C 3760.6 . AC=2;AF=0.019;AN=104;BaseQRankSum=7.94;ClippingRankSum=0.00;DB;DP=311240;ExcessHet=4.4395;FS=0.000;InbreedingCoeff=-0.0558;MLEAC=2;MLEAF=0.019;MQ=13.65;MQRankSum=0.00;QD=1.96;ReadPosRankSum=1.72;SOR=0.150 GT:AD:DP:GQ:PL:SAC 0/0:10893,0:10893:99:0,120,1800 0/0:3095,0:3095:99:0,120,1800 0/0:7109,0:7109:99:0,120,1800 0/0:6159,0:6159:99:0,120,1800 ./.:9187,0:9187:.:0,0,0 0/0:9921,0:9921:99:0,120,1800 0/0:7886,0:7886:99:0,120,1800 0/0:9599,0:9599:99:0,120,1800 0/0:3568,0:3568:99:0,120,1800 0/0:3587,0:3587:99:0,120,1800 ./.:10150,0:10150:.:0,0,0 0/0:10063,0:10063:99:0,120,1800 0/0:7977,0:7977:99:0,120,1800 0/0:6701,0:6701:99:0,120,1800 0/0:9992,0:9992:99:0,120,1800 0/0:8268,0:8268:99:0,120,1800 0/0:7164,0:7164:99:0,120,1800 0/0:3071,0:3071:99:0,120,1800 0/0:3744,0:3744:99:0,120,1800 0/0:4276,0:4276:99:0,120,1800 0/0:2209,0:2209:0:0,0,2796 0/0:2951,0:2951:0:0,0,3425 0/0:9073,0:9073:99:0,120,1800 0/0:2828,0:2828:99:0,120,1800 ./.:3450,0:3450:.:0,0,0 0/0:2960,0:2960:99:0,120,1800 0/0:4119,0:4119:99:0,120,1800 0/0:5063,0:5063:99:0,120,1800 0/1:1305,331:1645:99:858,0,62435:0,1305,0,331 0/0:4505,0:4505:99:0,120,1800 0/0:2868,0:2868:99:0,120,1800 0/0:6611,0:6611:0:0,0,876 0/0:7709,0:7709:0:0,0,3260 0/0:4767,0:4767:99:0,120,1800 0/0:4956,0:4956:99:0,120,1800 0/0:6305,0:6305:99:0,120,1800 0/0:1866,0:1866:99:0,120,1800 0/1:73,214:287:99:2943,0,1029:0,73,0,214 0/0:8021,0:8021:0:0,0,9405 0/0:2878,0:2878:99:0,120,1800 0/0:8165,0:8165:99:0,120,1800 0/0:3005,0:3005:99:0,120,1800 ./.:3688,0:3688:.:0,0,0 0/0:7725,0:7725:0:0,0,11872 0/0:8611,0:8611:99:0,120,1800 0/0:3994,0:3994:99:0,120,1800 0/0:4031,0:4031:0:0,0,3673 0/0:5476,0:5476:99:0,120,1800 0/0:8891,0:8891:99:0,120,1800 0/0:2868,0:2868:99:0,120,1800 ./.:3637,0:3637:.:0,0,0 ./.:7150,0:7150:.:0,0,0 0/0:1092,0:1092:99:0,120,1800 0/0:924,0:924:99:0,120,1800 ./.:4579,0:4579:.:0,0,0 0/0:1658,0:1658:99:0,120,1800 0/0:2684,0:2684:99:0,120,1800 0/0:1396,0:1396:99:0,120,1800 0/0:1328,0:1328:99:0,120,1800 ./.:2372,0:2372:.:0,0,0

The HaplotypeCaller command line I used (for one sample):

java -Xmx4g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -dt NONE --genotyping_mode DISCOVERY -A StrandAlleleCountsBySample -A StrandBiasBySample -R ucsc.hg19_nohap_v3_fixed.fasta -I JK0812_m_clipped.bam -o JK0812_m_clipped.g.vcf --dbsnp dbsnp_138.hg19.vcf -L target_region.bed -ERC GVCF --variant_index_type LINEAR --variant_index_parameter 128000 --maxReadsInRegionPerSample 200000

In the command line, I turned downsampling off and set the maximum number of reads in one active region per sample to 200,000 to account for the nature of the targeted sequencing data.

I then picked a single sample that had a "./." call and ran it in HaplotypeCaller's native mode.

The genomic interval is set to chr1:18722675-18722950; the command line used is as follows:

java -Xmx4g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -dt NONE --genotyping_mode DISCOVERY -I SK1466_m_clipped.bam -R ucsc.hg19_nohap_v3_fixed.fasta --dbsnp dbsnp_138.hg19.vcf -L chr1_target_region.bed -o SK1466_m_clipped.vcf -allSitePLs --maxReadsInRegionPerSample 200000 -out_mode EMIT_ALL_SITES

Some of the results are shown below:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SK1466
chr1 18722701 . C . 0 LowQual AN=2;DP=6432;MQ=60.00 GT:AD:DP 0/0:6432:6432
chr1 18722702 . T . 0 LowQual AN=2;DP=6432;MQ=60.00 GT:AD:DP 0/0:6432:6432
chr1 18722703 . A . 0 LowQual AN=2;DP=6432;MQ=60.00 GT:AD:DP 0/0:6432:6432
chr1 18722704 . G . 0 LowQual AN=2;DP=6432;MQ=60.00 GT:AD:DP 0/0:6432:6432
chr1 18722706 . T . 0 LowQual AN=2;DP=6432;MQ=60.00 GT:AD:DP 0/0:6432:6432
chr1 18722707 . A . 0 LowQual AN=2;DP=6432;MQ=60.00 GT:AD:DP 0/0:6432:6432
chr1 18722709 . T . 0 LowQual AN=2;DP=6432;MQ=60.00 GT:AD:DP 0/0:6432:6432
chr1 18722710 . T . 0 LowQual AN=2;DP=6432;MQ=60.00 GT:AD:DP 0/0:6432:6432
chr1 18722711 . C . 0 LowQual AN=2;DP=6432;MQ=60.00 GT:AD:DP 0/0:6432:6432
chr1 18722713 rs1336130 T C 107857.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.647;ClippingRankSum=-0.000;DB;DP=6432;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=-0.000;QD=16.78;ReadPosRankSum=-0.109;SOR=0.532 GT:AD:DP:GQ:PL 0/1:3497,2930:6427:99:107886,0,133316

Each homozygous-reference site is labeled LowQual with QUAL=0, and no PL is calculated. Could anyone let me know what is happening with this odd discrepancy between the GVCF mode and the native mode of HaplotypeCaller?

I have generated the corresponding bamout file and found that many haplotypes are generated because of the nature of amplicon sequencing. Please let me know how to upload it, or how to insert the comparison figure from IGV.

One similar thread: gatkforums.broadinstitute.org/gatk/discussion/8783/homozygous-reference-genotype-is-called-in-native-mode-but-uncalled-in-haplotypecaller-erc-modes

Help on this will be greatly appreciated.

Best

Out of order read after MarkDuplicatesSpark + BaseRecalibrator/ApplyBQSR


Hi,

I am building a workflow for discovery of somatic snvs + indels that is pretty much the Broad's Best Practice but incorporating MarkDuplicatesSpark and a couple of other minor changes. Today I was running a normal-tumor pair of samples from WES experiments in GCP, and everything was going great until the workflow failed during Mutect2. In one of the shards (I am scattering the M2 step through 12 splits of the exome bedfile) I got this error:

    13:53:46.994 INFO  ProgressMeter -       chr19:18926479             20.2                 22440           1112.1
    13:53:51.138 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.589863008
    13:53:51.145 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 415.78724766500005
    13:53:51.147 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 82.56 sec
    13:53:52.161 INFO  Mutect2 - Shutting down engine
    [February 19, 2019 1:53:52 PM UTC] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 20.68 minutes.
    Runtime.totalMemory()=1132453888
    java.lang.IllegalArgumentException: Attempting to add a read to ActiveRegion out of order w.r.t. other reads: lastRead SRR3270880.37535587 chr19:19227104-19227253 at 19227104 attempting to add SRR3270880.23592400 chr19:19226999-19227148 at 19226999
        at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:730)
        at org.broadinstitute.hellbender.engine.AssemblyRegion.add(AssemblyRegion.java:338)
        at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.fillNextAssemblyRegionWithReads(AssemblyRegionIterator.java:230)
        at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.loadNextAssemblyRegion(AssemblyRegionIterator.java:194)
        at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.next(AssemblyRegionIterator.java:135)
        at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.next(AssemblyRegionIterator.java:34)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:286)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:267)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar

The other 11 shards finished without errors and produced the expected output.

I checked the bam from the tumor sample and indeed the read mentioned in the error is out of order. It is the second read from the end in the following snippet (pasting here only the first 9 columns from the bam file):

    SRR3270880.37535587 163 chr19   19227104    60  150M    =   19227395    441
    SRR3270880.46694860 147 chr19   19227106    60  150M    =   19226772    -484
    SRR3270880.60287639 1171    chr19   19227106    60  150M    =   19226772    -484
    SRR3270880.68448188 83  chr19   19227106    60  150M    =   19226611    -645
    SRR3270880.70212050 1171    chr19   19227106    60  150M    =   19226772    -484
    SRR3270880.23592400 163 chr19   19226999    60  150M    =   19227232    383
    SRR3270880.21876644 1171    chr19   19227001    60  150M    =   19226793    -358

The read does not have any bad quality flags and it appears twice in the bam, being in the correct order in its first occurrence (second read in the following snippet):

    SRR3270880.61849825 147 chr19   19226995    60  150M    =   19226895    -250
    SRR3270880.23592400 163 chr19   19226999    60  150M    =   19227232    383
    SRR3270880.21876644 1171    chr19   19227001    60  150M    =   19226793    -358
    SRR3270880.47062210 147 chr19   19227001    60  150M    =   19226625    -526

The workflow does not include SortSam after MarkDuplicatesSpark as MDSpark's output is supposed to be coordinate sorted. From the bam's header: @HD VN:1.6 GO:none SO:coordinate
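Despite the header claiming SO:coordinate, the records themselves can be checked directly. A quick sanity check (a sketch, not part of the workflow; in practice the input would be `samtools view merged.bam chr19 | cut -f1-4`, but here the three reads from the snippet above are inlined) is to flag any record whose POS goes backwards, which is exactly the condition AssemblyRegion rejects:

```shell
# Flag out-of-order records: POS is column 4 of a SAM line.
msg=$(printf '%s\n' \
  'SRR3270880.37535587 163 chr19 19227104' \
  'SRR3270880.46694860 147 chr19 19227106' \
  'SRR3270880.23592400 163 chr19 19226999' |
  awk '($4 + 0) < prev { print "out of order: " $1 " at " $4 }
       { prev = $4 + 0 }')
echo "$msg"
```

On this input it reports the offending read, SRR3270880.23592400 at 19226999. The same one-liner over the full merged bam would also reveal whether that read is the only duplicate/out-of-order record or whether every split boundary is affected.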

Previous to Mutect2, BaseRecalibrator-GatherBqsrReport-ApplyBQSR-GatherBamFiles (non-Spark versions) finished without any errors. These steps are also scattered through interval splits of the exome bedfile.

Strikingly, the start and end positions of this out-of-order read span from the last interval of split 6 into the first interval of split 7. Maybe the read was included in two contiguous splits of the bam file at the same time, and that is why it appears twice in the bam after the merge done by GatherBamFiles. (Last interval of split 6: chr19 19226311 19227116; first interval of split 7: chr19 19227145 19228774.)

Intervals in my workflow are split by the SplitIntervals tool (GATK 4.1.0.0). I am currently including the argument --subdivision-mode BALANCING_WITHOUT_INTERVAL_SUBDIVISION and suspect this could be related to the error.

Any ideas of how this issue can be solved?

Thank you in advance

GATK4 Mutect2: should I use the newer gnomAD r2.1 as the germline resource, and if so, how?

Dear GATK Team,

We have a project to detect somatic mutations in tumor-vs-normal sample pairs. We have read the Mutect2 guide (the Mutect2 best-practices post, GATK post #11136), but I am still unsure how to proceed.

gnomAD has released a newer version, r2.1, but the GATK bundle holds an older one; in particular, the b37 file is dated 2017.

We are not sure whether we should use r2.1 as the germline resource, given that the newer version contains more allele frequencies.

If we want to use the newer gnomAD as a resource, what should we do to make an 'af-only-gnomad_hg19.vcf' (we use hg19, not GRCh38)? The full gnomAD resource is very large and hard to handle all at once. By the way, we wish to detect mutations across the whole genome.
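For what it's worth, one hedged sketch of the "af-only" reduction (the bcftools call and the record below are illustrative assumptions, not an official recipe). Note that gnomAD r2.1 is natively on GRCh37, so for hg19 the main remaining step is renaming contigs to the "chr" style, e.g. with `bcftools annotate --rename-chrs`:

```shell
# With a recent bcftools installed, the whole INFO reduction is one command
# ('^INFO/AF' means "drop every INFO tag except AF"):
#   bcftools annotate -x '^INFO/AF' gnomad.r2.1.sites.vcf.bgz -Oz -o af-only-gnomad_hg19.vcf.gz
# The awk below shows the same idea on a single made-up record in VCF layout:
rec="$(printf 'chr1\t10108\t.\tC\tT\t.\tPASS\tAC=5;AF=2.0308e-04;AN=24622')"
af_only=$(printf '%s\n' "$rec" | awk -F'\t' -v OFS='\t' '{
  n = split($8, kv, ";"); info = "."          # default when no AF is present
  for (i = 1; i <= n; i++) if (kv[i] ~ /^AF=/) info = kv[i]
  $8 = info; print                            # keep only the AF key in INFO
}')
echo "$af_only"
```

After this the INFO column of the record holds only AF=2.0308e-04, which is the shape Mutect2 expects from a --germline-resource file.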
We would appreciate your advice.
Thank you.

What is the --intervals argument when using GenomicsDBImport in the germline short variant discovery pipeline?

Can you provide the script for running germline short variant discovery?
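As a hedged sketch only: `--intervals` (`-L`) tells GenomicsDBImport which genomic region(s) to import into the workspace, and scattering is typically done by running one workspace per interval. An outline of the germline short variant joint-calling steps under those assumptions (file names and the chr20 interval are placeholders, not an official script) might look like:

```shell
# Hypothetical outline; substitute your own reference, bams, and intervals.
gatk HaplotypeCaller -R ref.fasta -I sample1.bam -O sample1.g.vcf.gz -ERC GVCF
gatk GenomicsDBImport \
    -V sample1.g.vcf.gz -V sample2.g.vcf.gz \
    --genomicsdb-workspace-path my_database \
    --intervals chr20            # the region this workspace will cover
gatk GenotypeGVCFs -R ref.fasta -V gendb://my_database -O joint.vcf.gz
```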