Channel: Recent Discussions — GATK-Forum

GATK version

What is the main difference between the latest GATK and GATK version 3.5? :)

When I use Cromwell to run a WDL, I get an error like "Docker lookup failed"


Here is my command:

java -Dconfig.file=myCromwell.conf -jar cromwell-34.jar run processing-for-variant-discovery-gatk4.wdl --inputs input.json

I got the following errors:

[warn] BackendPreparationActor_for_70084e86:PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem:3:1 [70084e86]: Docker lookup failed
java.lang.Exception: Failed to get docker hash for broadinstitute/genomes-in-the-cloud:2.3.1-1512499786 Connection failed.
at cromwell.engine.workflow.WorkflowDockerLookupActor.cromwell$engine$workflow$WorkflowDockerLookupActor$$handleLookupFailure(WorkflowDockerLookupActor.scala:196)
at cromwell.engine.workflow.WorkflowDockerLookupActor$$anonfun$3.applyOrElse(WorkflowDockerLookupActor.scala:94)
at cromwell.engine.workflow.WorkflowDockerLookupActor$$anonfun$3.applyOrElse(WorkflowDockerLookupActor.scala:78)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
at akka.actor.FSM.processEvent(FSM.scala:670)
at akka.actor.FSM.processEvent$(FSM.scala:667)
at cromwell.engine.workflow.WorkflowDockerLookupActor.akka$actor$LoggingFSM$$super$processEvent(WorkflowDockerLookupActor.scala:39)
at akka.actor.LoggingFSM.processEvent(FSM.scala:806)
at akka.actor.LoggingFSM.processEvent$(FSM.scala:788)
at cromwell.engine.workflow.WorkflowDockerLookupActor.processEvent(WorkflowDockerLookupActor.scala:39)
at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:664)
at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:658)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
at cromwell.docker.DockerClientHelper$$anonfun$dockerResponseReceive$1.applyOrElse(DockerClientHelper.scala:16)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:517)
at akka.actor.Actor.aroundReceive$(Actor.scala:515)
at cromwell.engine.workflow.WorkflowDockerLookupActor.aroundReceive(WorkflowDockerLookupActor.scala:39)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
at akka.actor.ActorCell.invoke(ActorCell.scala:557)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
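
If it helps anyone hitting the same thing: the exception is raised while Cromwell tries to resolve the image tag broadinstitute/genomes-in-the-cloud:2.3.1-1512499786 to a digest over the network, so a connection problem at that moment fails the task before it starts. A minimal workaround sketch, assuming your Cromwell release exposes the docker.hash-lookup setting (check the reference.conf shipped with your version), is to disable the remote lookup in myCromwell.conf:

docker {
  hash-lookup {
    # Assumption: this key exists in your Cromwell version's reference.conf.
    # Skips resolving image tags to digests over the network.
    enabled = false
  }
}

This trades run-by-digest reproducibility for the ability to run without registry access, so re-enable it once the network issue is solved.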

Has anyone reported a bug in CalculateContamination in the new release 4.0.12.0?


[Image: result of GATK 4.0.12.0]

[Image: result of GATK 4.0.11.0]

The bug may also cause an error in the subsequent filtering step.

CombineVariants in GATK4


Is it planned to add the CombineVariants tool to the GATK4.0 toolkit (it existed in previous GATK versions)? The only similar tool currently available in GATK4.0 Beta is GatherVcfs, which has very limited functionality and cannot concatenate unsorted VCFs or merge differing INFO fields correctly.
Thanks! :)
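
In the meantime, a partial workaround sketch using the Picard tools bundled in GATK4 (this covers only the sort-then-concatenate case, not CombineVariants-style merging of INFO fields, and assumes all inputs carry the same sample columns; file names are placeholders):

gatk SortVcf -I a.unsorted.vcf -O a.sorted.vcf
gatk SortVcf -I b.unsorted.vcf -O b.sorted.vcf
gatk MergeVcfs -I a.sorted.vcf -I b.sorted.vcf -O combined.vcf

Unlike GatherVcfs, MergeVcfs does not require the inputs to cover disjoint, ordered regions.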

HaplotypeCaller Incompatible Contigs DNASeq

I'm using GATK 4.0.11 and I'm getting the following error message when I run HaplotypeCaller on DNAseq data:

10:19:17.089 INFO HaplotypeCaller - ------------------------------------------------------------

10:19:17.089 INFO HaplotypeCaller - ------------------------------------------------------------

10:19:17.090 INFO HaplotypeCaller - HTSJDK Version: 2.16.1

10:19:17.090 INFO HaplotypeCaller - Picard Version: 2.18.13

10:19:17.091 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2

10:19:17.091 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

10:19:17.091 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

10:19:17.091 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

10:19:17.091 INFO HaplotypeCaller - Deflater: IntelDeflater

10:19:17.091 INFO HaplotypeCaller - Inflater: IntelInflater

10:19:17.091 INFO HaplotypeCaller - GCS max retries/reopens: 20

10:19:17.092 INFO HaplotypeCaller - Requester pays: disabled

10:19:17.092 INFO HaplotypeCaller - Initializing engine

10:19:17.536 INFO HaplotypeCaller - Shutting down engine

[January 5, 2019 10:19:17 AM EST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.12 minutes.

Runtime.totalMemory()=311427072

A USER ERROR has occurred: Input files reference and reads have incompatible contigs: No overlapping contigs found.

reference contigs = [chr17:c43125483-43044295]

reads contigs = []

I then tried another file from NCBI:

A USER ERROR has occurred: Input files reference and reads have incompatible contigs: No overlapping contigs found.

reference contigs = [chr17:c43125483-43044295]

reads contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]

The preceding steps were FastqToSam, BWA, and MarkDuplicates.

Any suggestions?
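
A quick way to see such a mismatch yourself is to compare the sequence names on both sides (a hedged sketch, assuming samtools is installed; file names are placeholders):

samtools view -H aligned.bam | grep '^@SQ'   # contigs the reads were aligned against
grep '^@SQ' reference.dict                   # contigs in the reference's sequence dictionary

Two things stand out in the errors above: in the first run the reads contigs list is empty, which suggests the BAM header carries no @SQ lines at all (as in an unmapped BAM fresh out of FastqToSam), and in both runs the reference has a single contig literally named chr17:c43125483-43044295, i.e. a one-region FASTA rather than a whole-genome build, so its name can never match chromosome-named reads. Aligning and calling against the same full reference FASTA should make the dictionaries compatible.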

GATK docs


Hello,
I would like to request two quality-of-life improvements for this website.

The tool documentation page: https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.0.0/index

Each tool is hidden inside a category. This makes finding the docs for the tool I want tedious. I want to just use my browser's find function (Ctrl-F), but I actually have to first open each category or use the built-in whole-site search feature (which is far, far slower). It also stops me from easily trying to guess a tool name by quickly searching for words like "index" or "gather".

Secondly, the version-selection dropdown is sorted alphabetically instead of by version number.

GATK BaseRecalibrator Error

After the MarkDuplicates step with Picard, I added read groups to the BAM file, and now I am trying to run BaseRecalibrator with GATK. I am getting the following error (the beginning of the message is cut off):

chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.validateDictionaries(SequenceDictionaryUtils.java:169)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.validateDictionaries(SequenceDictionaryUtils.java:98)
at org.broadinstitute.hellbender.engine.GATKTool.validateSequenceDictionaries(GATKTool.java:709)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:643)
at org.broadinstitute.hellbender.engine.ReadWalker.onStartup(ReadWalker.java:50)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)


This is my command:
gatk BaseRecalibrator -R hg38.fa -I input.bam -known-sites dbsnp_138.hg19.vcf -O output.bam

Could you please help me resolve this error? Thank you in advance.
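
For illustration, a hedged sketch of a self-consistent invocation. Two things look off above: the known-sites resource is an hg19 file while the reference is hg38 (exactly the kind of dictionary mismatch the error reports), and BaseRecalibrator emits a recalibration table rather than a BAM. The hg38 dbSNP path below is a hypothetical placeholder; substitute whatever hg38-matched resource you use:

gatk BaseRecalibrator \
    -R hg38.fa \
    -I input.bam \
    --known-sites dbsnp_138.hg38.vcf \
    -O recal_data.table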

Germline CNV: the correct order of the involved tools


Dear all,

I know that the tutorial for germline CNV is not available yet and the workflow is still in beta, but could anyone write out the order of the involved tools as numbered points? To my understanding it is something like this:
1- CollectReadCounts
2- DetermineGermlineContigPloidy
3- GermlineCNVCaller
4- post-processing to get the VCF file

Are those steps enough, or are there any extra beneficial steps that might be added in between? (See the command sketch below.)

Many thanks for the help
Nawar
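
A minimal sketch of the corresponding commands in cohort mode, assuming recent GATK 4 tool and flag names (verify against your version; all file names are placeholders, and interval preparation is added as a step 0):

gatk PreprocessIntervals \
    -R ref.fa -L targets.interval_list \
    --interval-merging-rule OVERLAPPING_ONLY \
    -O preprocessed.interval_list

gatk CollectReadCounts \
    -I sample1.bam -L preprocessed.interval_list \
    --interval-merging-rule OVERLAPPING_ONLY \
    -O sample1.counts.hdf5

gatk DetermineGermlineContigPloidy \
    -I sample1.counts.hdf5 -I sample2.counts.hdf5 \
    --contig-ploidy-priors ploidy_priors.tsv \
    --output ploidy --output-prefix cohort

gatk GermlineCNVCaller \
    --run-mode COHORT \
    -I sample1.counts.hdf5 -I sample2.counts.hdf5 \
    --contig-ploidy-calls ploidy/cohort-calls \
    -L preprocessed.interval_list \
    --output cnv --output-prefix cohort

gatk PostprocessGermlineCNVCalls \
    --calls-shard-path cnv/cohort-calls \
    --model-shard-path cnv/cohort-model \
    --contig-ploidy-calls ploidy/cohort-calls \
    --sample-index 0 \
    --output-genotyped-intervals sample1_intervals.vcf.gz \
    --output-genotyped-segments sample1_segments.vcf.gz

So the four numbered points match the intended order, with PostprocessGermlineCNVCalls being the step that produces per-sample VCFs.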


GenotypeGVCFs, -ploidy


GATK 4.0.11.0, human WES

Hi,

I'm using HaplotypeCaller with the -ploidy option for the sex chromosomes:

Male: chr1-22 = ploidy 2
Male: chrX,Y = ploidy 1

Female: chr1-22,X = ploidy 2

With MergeVcfs I merge the two male GVCFs into one, then I join my cohort GVCFs (male and female samples) with GenomicsDBImport, and at the end I'm going to use GenotypeGVCFs.

I can't figure out how to calculate the ploidy value to set for GenotypeGVCFs. What is the meaning of "pool" in my case? My male GVCFs have mixed ploidy. Do I have to change strategy? I have 50 male and 55 female samples.

Many thanks

> ----------------------------
> --sample_ploidy / -ploidy
> Ploidy per sample. For pooled data, set to (Number of samples in each pool * Sample Ploidy).
> Sample ploidy - equivalent to number of chromosome copies per pool. For pooled experiments this should be set to the number of samples in pool multiplied by individual sample ploidy.
> ----------------------------
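
For what it's worth, a hedged sketch of the joint-genotyping call with no -ploidy at all, under the assumption (worth confirming on a small test interval) that GenotypeGVCFs takes each sample's ploidy from the genotypes already recorded in the input GVCFs rather than from the command line:

gatk GenotypeGVCFs \
    -R ref.fa \
    -V gendb://my_genomicsdb \
    -O cohort.vcf.gz

Note that the quoted documentation's "pool" refers to pooled sequencing, where a single sample column represents N individuals; 50 male plus 55 female individually sequenced samples are not a pool in that sense.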

understanding ValidateVariants output


Hi,

I am trying to understand what the number of records means in ValidateVariants output. Notice that in the output below for the VCF file it says it checked 3814206 records, whereas for the GVCF it says 1 record. I am using GATK version v3.7-0-gcfedb67. Thanks!

Output when I run ValidateVariants on a VCF (multi-sample):

Successfully validated the input file. Checked 3814206 records with no failures.
Done. There were 1 WARN messages, the first 1 are repeated below.
WARN 17:01:10,487 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation

Output when running on a GVCF (multi-sample) file.

Successfully validated the input file. Checked 1 records with no failures. There were 2 WARN messages, the first 2 are repeated below.
WARN 22:00:24,094 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 22:00:24,159 ValidateVariants - GVCF format is currently incompatible with allele validation. Not validating Alleles.
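
A quick sanity check of what a "record" is, for anyone comparing these numbers: count the non-header lines in each file and see which output they track (a hedged sketch, assuming uncompressed files; use zgrep for .gz):

grep -vc '^#' multi_sample.vcf     # expected to be near 3814206 for the VCF above
grep -vc '^#' multi_sample.g.vcf   # if this is far above 1, the GVCF count is not per-line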

How to output AD ref and alt in separate columns with VariantsToTable


I am using VariantsToTable (GATK 3.8) to convert my MuTect2 .vcf to .tsv. The command looks like this:

gatk.sh -T VariantsToTable \
    -R "${ref_fasta}" \
    -V "${output_vcf}" \
    -F CHROM -F POS -F ID -F REF -F ALT -F FILTER -F QUAL -F AC -F AN -F NLOD -F TLOD \
    -GF AD -GF DP -GF AF \
    -o "${output_tsv}"

The relevant output columns in the output look like this:

CHROM   POS ID  REF ALT NORMAL.AD   TUMOR.AD
chr1    3102936 .   C   G   496,2   392,0

Here, column NORMAL.AD has values that look like 496,2, and TUMOR.AD has values in the same format, 392,0. It's my understanding that these values are the ref and alt allelic depths, respectively, for each sample.

I would like to instead have the fields output as separate columns. This would make the table much easier to use downstream, and especially to parse, e.g. in Excel. For example, if there is a value such as 23,390 in this column, Excel thinks it is the number 23,390, and it is extremely difficult to get it to perform string operations on it in order to pull out the ref and alt AD values. I end up having to write a custom script just to parse this single column and output a whole new .tsv table with the correctly split values.

It would be much easier if the values could just be output from VariantsToTable as separate fields in the first place. Is this possible?
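
Until the tool grows such an option, splitting downstream is a one-liner. A hedged awk sketch, assuming NORMAL.AD and TUMOR.AD land in hypothetical columns 12 and 13 of the .tsv (adjust the indices to your actual header):

awk 'BEGIN { FS = OFS = "\t" }
     NR == 1 { print $0, "NORMAL.AD.ref", "NORMAL.AD.alt", "TUMOR.AD.ref", "TUMOR.AD.alt"; next }
     { split($12, n, ","); split($13, t, ",");
       print $0, n[1], n[2], t[1], t[2] }' output.tsv > output.split.tsv

This appends the four split values as new columns rather than replacing the originals, so rows with an unexpected AD format remain inspectable.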

Looking for Picard PER_TARGET_COVERAGE output field definitions


Hello, I ran Picard CollectHsMetrics with the optional PER_TARGET_COVERAGE output and I am looking for the definitions of the following fields: "%GC", "mean_coverage", "normalized_coverage" (the definition for this field was already posted), "min_normalized_coverage", "max_normalized_coverage", "min_coverage", "max_coverage", "pct_0x", and "read_count".
Cheers

Alternate Alleles in VCF are more than 1 base


Hi there,

I've removed indels from a multi-sample VCF from HaplotypeCaller using SelectVariants. However, some ALT 'SNPs' are more than a single-nucleotide substitution, e.g.

TTTTTTGTTTTTTGTTTT,GTTTTTGTTTT,G
TTTTTTTA,*
TTTTTTTAG,*
TTTTTTTATTTTTCATTTA,*
TTTTTGTTTTTTTA,TC,*

Q1) What is the meaning of the * symbol?
Q2) Is it to be expected that these SNPs are more than a single nucleotide substitution?

Thanks,
Tom
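
On Q1, the * is the "spanning deletion" allele: it marks genotypes at sites that fall inside an upstream deletion. On Q2, one hedged explanation is that records whose ALT column mixes substitution and indel alleles are typed MIXED rather than INDEL, so excluding indels alone tends to leave them in. A stricter selection sketch, using GATK 3 SelectVariants flags (verify spellings against your version's docs; file names are placeholders):

java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R ref.fa \
    -V input.vcf \
    -selectType SNP \
    -restrictAllelesTo BIALLELIC \
    -o snps_biallelic.vcf

Restricting to biallelic sites also drops the *-bearing records shown above.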

GATK 4 still has some problems with asterisks in VCF files


Dear GATK team,

I had used GATK 4 and GATK 3 best proctices to run WES data. BAM->gvcf->joint variant detection, VCF file->VQSR and so on.
I noticed that in the GATK offical website, you indeed had explained what * asterisk standed for. Right now, many people had tried to perform SelectVariants --selectTypeToExclude SYMBOLIC --excludeNonVariants --removeUnusedAlternates to remove * sign. I also had tried both GATK 4 and GATK 3. The above commands did not work at all.
For example:

chr1 1296543 rs113960113 AACAC A,* 59000.80 . . GT:AD:DP:GQ:PL 0/0 0/2 0/0 0/1

My concern is that GATK treats this variant as multiallelic, so it gets dropped in downstream analyses, since we cannot tell exactly what genotype these samples have. Normally, if an unknown variant or spanning deletion/insertion occurs at a locus, we should not trust the variant. However, sometimes these kinds of variants have pretty high quality, and it would be a pity to remove them or be unable to retrieve them. Do you have any suggestions about this? Many thanks.

Wenjuan

Badly formed genome location: Parameters to GenomeLocParser are incorrect (CombineVariants)

Hello,

I am using GATK version 3.8.1 (3.8-1-0-gf15c1c3ef) and want to merge 4 VCF files using the CombineVariants command.
The command I am using is:

java -jar GenomeAnalysisTK.jar \
-T CombineVariants -R genome.fa \
-nt 20 \
--variant a.vcf \
--variant b.vcf \
--variant c.vcf \
--variant d.vcf \
-o Combined.vcf \
-genotypeMergeOptions UNIQUIFY

However, after running for a while I get this error:
##### ERROR MESSAGE: Badly formed genome location: Parameters to GenomeLocParser are incorrect:The stop position 136765 is less than start 199815 in contig 19

When I check the file (d.vcf), there is indeed such a variant line:
19 199815 BND00000924 C C[1:136765[ . LowQual PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.7.9;CHR2=1;END=136765;PE=2;MAPQ=24;CT=3to5;CIPOS=-33,33;CIEND=-33,33;INSLEN=0;HOMLEN=34;SR=10;SRQ=0.953271;CONSENSUS=GGGTGTGAGGCAAGGGGCTCACGCTGACCTCTGTCCGCGTGGGAGGGGCCGGTGTGAGGCAAGGGGCTCGGGCTGACCTCTCTCAGCGTGGGAGGGGGCGGGG;CE=1.77046 GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV 0/0:0,-11.8581,-97.1159:119:PASS:43:174:379:1:61:0:41:0 0/0:0,-34.4186,-340.549:10000:PASS:195:1022:1304:1:228:0:139:4


The problem, I think, is that the merging tool looks at the INFO/END tag to determine the end location of the translocation, but ignores the INFO/CHR2 tag, which indicates that the translocation spans different chromosomes.

Is there something wrong with the way my VCF file is organised (it is VCF v4.2)?

Is there a way to avoid this error? Thank you very much for your help/support!
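
One pragmatic workaround sketch (hedged: it assumes the literal string SVTYPE=BND appears only in the INFO column of BND records, which is worth checking against your header lines): set the translocations aside before merging and handle them separately.

awk '/^#/ || $0 !~ /SVTYPE=BND/' d.vcf > d.noBND.vcf
awk '/^#/ || /SVTYPE=BND/' d.vcf > d.onlyBND.vcf

Your reading of the cause is consistent with the error text: a parser that builds an interval from POS and INFO/END without consulting INFO/CHR2 will see start 199815 and stop 136765 on contig 19 and reject the location.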

Mutect on mm10


Hello,

I am trying to run MuTect on mouse data and am getting the following error:

ERROR MESSAGE: Unable to parse header with error: Your input file has a malformed header: VCFv4.2 is not a supported version, for input source: /cromwell_root/fc-f36b3dc8-85f7-4d7f-bc99-a4610229d66a/broadinstitute/reference/mm10/129S1_SvImJ.mgp.v5.snps.dbSNP142.with_chrom.sorted.vcf

The VCF is version 4.2. Is this really not supported? Is there an easy fix? Thanks!
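
If the tool doing the parsing only understands up to VCFv4.1 (reportedly the case for older MuTect/GATK builds), a frequently suggested hack is to rewrite the fileformat line; hedged, because any 4.2-only constructs in the body may still trip the parser:

sed '1 s/^##fileformat=VCFv4.2$/##fileformat=VCFv4.1/' \
    129S1_SvImJ.mgp.v5.snps.dbSNP142.with_chrom.sorted.vcf \
    > 129S1_SvImJ.dbSNP142.v41.vcf

The safer fix, if available, is a build of the tool that accepts VCFv4.2 headers natively.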

Can the CNV workflow be used for WES data?

Hi Team,
May I ask if the CNV workflow can be used on WES data?
Thanks!

all mutations in tumor filtered out as 'artifact_in_normal'


Hello

I am trying to run mutect2-gatk4 (Snapshot 13) with the parameters set as in the featured "help-gatk/Somatic-SNVs-Indels-GATK4" FireCloud workspace (including "run_orientation_bias_filter" set to "False") on a previously published WGS cohort of tumor/normal pairs (Bender, Sebastian, et al., Nature Medicine (2016)). To create the PON, I used the "mutect2_pon" method from the same workspace.

In the resulting VCF files, all mutations found are filtered out as 'artifact_in_normal'. I also ran the same method in tumor-only mode on the normals and tumors separately (with a PON created from WGS normals of a different cohort), and there I do get a number of mutations that differ between the tumor and its paired normal. Do you know what's going on? Is this a format issue, another filter I should deactivate, or something else I'm missing?

Thanks,
F

IntelDeflater problem after kernel update


Dear members of the GATK Team,

we ran into a curious problem after updating the kernel of our machines from 4.4 to 4.12.
Our pool of computing nodes has Xeon E5-2697 v4 and Xeon Gold 6148 CPUs.

On the machines equipped with the latter CPU (Xeon Gold 6148), GATK HaplotypeCaller and other tools using IntelDeflater stopped working, hanging just before printing the message:

INFO 15:41:08,160 GenomeAnalysisEngine - Deflater: IntelDeflater
INFO 15:41:08,161 GenomeAnalysisEngine - Inflater: IntelInflater

If I set use-jdk-inflater (and deflater), the tools go through, but terribly slowly. On the machines with the E5-2697 (the older CPU), everything works fine.

This problem occurs with GATK 3.8 but does not appear in GATK4. Is there a way to fix it other than updating? For now we wanted to stay consistent with already-analysed data.

Thanks for the support!

Riccardo

Getting the absolute (not %) of 'USABLE_BASES_ON_TARGET'


I am using Picard CollectHsMetrics. I would like to get the total number of bases on target that are usable (not duplicates) for variant calling. I can output this as a percentage (PCT_USABLE_BASES_ON_TARGET), but I am not sure how to convert it into an absolute number of bases (there are so many metrics output by this tool).
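
A hedged back-of-the-envelope: if, per your Picard version's metric definitions, PCT_USABLE_BASES_ON_TARGET is the fraction of PF bases that are aligned, de-duplicated, and on target, then the absolute count is that fraction times PF_BASES from the same metrics row:

awk -v pct=0.5234 -v pf_bases=9876543210 \
    'BEGIN { printf "usable on-target bases: %.0f\n", pct * pf_bases }'

The pct and pf_bases values above are made-up placeholders; read both from your CollectHsMetrics output and double-check the denominator in the metric's documentation.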
