Channel: Recent Discussions — GATK-Forum

% of mapped reads

What tool in Picard will tell me the number of reads that mapped to my reference sequences? I want to know the percentage of mapped reads, i.e. the number of mapped reads divided by the total number of reads.
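For illustration, this is roughly the kind of command I have in mind (a sketch assuming Picard's CollectAlignmentSummaryMetrics; file names are placeholders):

java -jar picard.jar CollectAlignmentSummaryMetrics \
    R=reference.fasta \
    I=input.bam \
    O=alignment_summary_metrics.txt

As far as I understand, the metrics file should include columns such as TOTAL_READS, PF_READS_ALIGNED, and PCT_PF_READS_ALIGNED, which would give that percentage directly.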


[GATK 4.1.0.0] Funcotator issues.

Hi, I have some questions about using Funcotator in GATK 4.1.0.0. My input VCF uses b37 (GRCh37) as its reference.

1) What are chr1_a_bed and chr1_b_bed in the resource bundle, funcotator_dataSources.v1.6.20190124s.tar.gz?
2) Could you guide me on how to localize the gnomAD resources? They are configured to use Google Cloud, which unfortunately is not allowed in my computing environment.
3) A lot of fields in the Funcotator output are empty, so I feel I am missing something. Out of over 13,000 variants, I don't get a single variant annotated by Cosmic, SwissProt, GO, TCGAscape, DrugBank, CCLE, CGC, ClinVar, Familial Cancer Genes, HGNC, etc. Could you help me annotate as many variants as possible?

Thank you!

Variants estimated by GATK4 Mutect2 are dramatically fewer than those estimated by VarDict or others.

Dear GATK team.

Hello. I'm a GATK4 user.

I compared the number of raw variants called by GATK4 Mutect2 with the number of raw variants called by VarDict, and found that the outputs of the two callers are very different.

I have two questions.

From this document (https://software.broadinstitute.org/gatk/documentation/article?id=11136) and @Sheila's comment, I understood that gnomAD and the PON do not affect the number of variants, and that these two resources (gnomAD and PON) are only used for annotation. (Q1: Is that correct?)

Q2: If so, can I assume that the difference in the number of variants between Mutect2 and other callers depends only on differences in the callers' calling methods?

Thanks.

-Stella

Gather task proceeded without awaiting input from scatter output

Dear all,

I'm trying to incorporate CNN filtering into my germline variant discovery workflow, but I have noticed some abnormal behaviour from Cromwell. My scatter is followed by a gather task (MergeVCF_HC4) whose array of input files is supplied by the output of the previous task (CNNScoreVariants).

However, Cromwell goes straight to MergeVCF_HC4 without running the scatter, and hence produces an error because the input of MergeVCF_HC4 is not found.

My code is as follows:

#CNN filtering
call SplitIntervals {
input:
intervals = calling_interval_list,
ref_fasta = ref_fasta_extracted,
ref_dict = ref_dict,
ref_fai = ref_fasta_index
}

scatter (calling_interval in SplitIntervals.interval_files) {
call RunHC4 {
input:
input_bam = GatherSortedBamFiles.output_bam,
input_bam_index = GatherSortedBamFiles.output_bam_index,
reference_fasta = ref_fasta_extracted,
reference_dict = ref_dict,
reference_fasta_index = ref_fasta_index,
output_prefix = batch_base_file_name,
interval_list = calling_interval
}

call CNNScoreVariants {
input:
input_vcf = RunHC4.raw_vcf,
input_vcf_index = RunHC4.raw_vcf_index,
bam_file = RunHC4.bamout,
bam_file_index = RunHC4.bamout_index,
reference_fasta = ref_fasta_extracted,
reference_dict = ref_dict,
reference_fasta_index = ref_fasta_index,
output_prefix = batch_base_file_name,
interval_list = calling_interval
}
}

call MergeVCF_HC4 {
input:
input_vcfs = CNNScoreVariants.cnn_annotated_vcf,
input_vcfs_indexes = CNNScoreVariants.cnn_annotated_vcf_index,
output_prefix = batch_base_file_name
}

call FilterVariantTranches {
input:
input_vcf = MergeVCF_HC4.merged_vcf,
input_vcf_index = MergeVCF_HC4.merged_vcf_index,
hapmap = hapmap,
hapmap_idx = hapmap_idx,
mills = mills,
mills_idx = mills_idx,
output_prefix = batch_base_file_name
}

task CNNScoreVariants {
File input_vcf
File input_vcf_index
File reference_fasta
File reference_dict
File reference_fasta_index
String output_prefix
File bam_file
File bam_file_index
File? architecture_json
File? architecture_hd5
File interval_list

command {

gatk --java-options -Xmx60G \
CNNScoreVariants \
${"-I " + bam_file} \
-R ${reference_fasta} \
-V ${input_vcf} \
-O ${output_prefix}_cnn_annotated.vcf.gz \
-L ${interval_list} \
--tensor-type read_tensor \
--inference-batch-size 8 \
--transfer-batch-size 32

}
output {
Array[File] log = glob("gatkStreamingProcessJournal*")
File cnn_annotated_vcf = "${output_prefix}_cnn_annotated.vcf.gz"
File cnn_annotated_vcf_index = "${output_prefix}_cnn_annotated.vcf.gz.tbi"
}
}

task MergeVCF_HC4 {
Array[File] input_vcfs
Array[File] input_vcfs_indexes
String output_prefix
String output_vcf = "${output_prefix}_cnn_scored.vcf.gz"

command {
gatk --java-options -Xmx60 MergeVcfs \
-I ${sep=' -I ' input_vcfs} -O "${output_vcf}"
}

output {
File merged_vcf = "${output_vcf}"
File merged_vcf_index = "${output_vcf}.tbi"
}
}

My error is as follows:

[2019-02-27 12:20:53,67] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.SplitIntervals:NA:1]: job id: 13360
[2019-02-27 12:20:53,67] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.CreateSequenceGroupingTSV:NA:1]: job id: 13366
[2019-02-27 12:20:53,67] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.CollectQualityYieldMetrics:0:1]: job id: 13512
[2019-02-27 12:20:53,68] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.SamToFastqAndBwaMemAndMba:0:1]: job id: 13570
[2019-02-27 12:20:53,68] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.ScatterIntervalList:NA:1]: job id: 13362
[2019-02-27 12:20:53,68] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.CollectQualityYieldMetrics:0:1]: Status change from - to WaitingForReturnCode
[2019-02-27 12:20:53,69] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.SamToFastqAndBwaMemAndMba:0:1]: Status change from - to WaitingForReturnCode
[2019-02-27 12:20:53,69] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.CreateSequenceGroupingTSV:NA:1]: Status change from - to Done
[2019-02-27 12:20:53,69] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.ScatterIntervalList:NA:1]: Status change from - to Done
[2019-02-27 12:20:53,69] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.SplitIntervals:NA:1]: Status change from - to WaitingForReturnCode
[2019-02-27 12:20:55,12] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.CollectQualityYieldMetrics:0:1]: Status change from WaitingForReturnCode to Done
[2019-02-27 12:20:55,33] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.SplitIntervals:NA:1]: Status change from WaitingForReturnCode to Done
[2019-02-27 12:21:00,43] [info] WorkflowExecutionActor-180317e4-cbfd-4e94-97ab-0aeb4bbb4943 [180317e4]: Starting RBCWorkflow.MergeVCF_HC4
[2019-02-27 12:21:00,72] [info] Assigned new job execution tokens to the following groups: 180317e4: 1
[2019-02-27 12:21:00,74] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.MergeVCF_HC4:NA:1]: gatk --java-options -Xmx60 MergeVcfs \
-I -O "RBC1_BATCH1_cnn_scored.vcf.gz"
[2019-02-27 12:21:00,75] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.MergeVCF_HC4:NA:1]: executing: /bin/bash /mnt/operation/RedCellNGS/cromwell-executions/RBCWorkflow/180317e4-cbfd-4e94-97ab-0aeb4bbb4943/call-MergeVCF_HC4/execution/script
[2019-02-27 12:21:03,66] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.MergeVCF_HC4:NA:1]: job id: 13780
[2019-02-27 12:21:03,67] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.MergeVCF_HC4:NA:1]: Status change from - to Done
[2019-02-27 12:21:06,10] [info] BackgroundConfigAsyncJobExecutionActor [180317e4RBCWorkflow.SamToFastqAndBwaMemAndMba:0:1]: Status change from WaitingForReturnCode to Done
[2019-02-27 12:21:06,59] [error] WorkflowManagerActor Workflow 180317e4-cbfd-4e94-97ab-0aeb4bbb4943 failed (during ExecutingWorkflowState): Job RBCWorkflow.MergeVCF_HC4:NA:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /mnt/operation/RedCellNGS/cromwell-executions/RBCWorkflow/180317e4-cbfd-4e94-97ab-0aeb4bbb4943/call-MergeVCF_HC4/execution/stderr.
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/mnt/operation/RedCellNGS/cromwell-executions/RBCWorkflow/180317e4-cbfd-4e94-97ab-0aeb4bbb4943/call-MergeVCF_HC4/tmp.e07d0773
Using GATK jar /home/nelson/miniconda3/envs/bioinfo/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx60 -jar /home/nelson/miniconda3/envs/bioinfo/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar MergeVcfs -I -O RBC1_BATCH1_cnn_scored.vcf.gz

[2019-02-27 12:21:06,59] [info] WorkflowManagerActor WorkflowActor-180317e4-cbfd-4e94-97ab-0aeb4bbb4943 is in a terminal state: WorkflowFailedState
[2019-02-27 12:21:12,03] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2019-02-27 12:21:13,66] [info] Workflow polling stopped
[2019-02-27 12:21:13,68] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2019-02-27 12:21:13,69] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2019-02-27 12:21:13,69] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2019-02-27 12:21:13,69] [info] JobExecutionTokenDispenser stopped
[2019-02-27 12:21:13,70] [info] Aborting all running workflows.
[2019-02-27 12:21:13,70] [info] WorkflowStoreActor stopped
[2019-02-27 12:21:13,70] [info] WorkflowLogCopyRouter stopped
[2019-02-27 12:21:13,70] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2019-02-27 12:21:13,71] [info] WorkflowManagerActor All workflows finished
[2019-02-27 12:21:13,71] [info] WorkflowManagerActor stopped
[2019-02-27 12:21:13,94] [info] Connection pools shut down
[2019-02-27 12:21:13,95] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2019-02-27 12:21:13,95] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2019-02-27 12:21:13,95] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2019-02-27 12:21:13,95] [info] SubWorkflowStoreActor stopped
[2019-02-27 12:21:13,95] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2019-02-27 12:21:13,95] [info] JobStoreActor stopped
[2019-02-27 12:21:13,95] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2019-02-27 12:21:13,95] [info] KvWriteActor Shutting down: 0 queued messages to process
[2019-02-27 12:21:13,95] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2019-02-27 12:21:13,95] [info] CallCacheWriteActor stopped
[2019-02-27 12:21:13,95] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2019-02-27 12:21:13,95] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2019-02-27 12:21:13,96] [info] ServiceRegistryActor stopped
[2019-02-27 12:21:13,96] [info] IoProxy stopped
[2019-02-27 12:21:13,96] [info] DockerHashActor stopped
[2019-02-27 12:21:13,99] [info] Database closed
[2019-02-27 12:21:13,99] [info] Stream materializer shut down
[2019-02-27 12:21:13,99] [info] WDL HTTP import resolver closed
[2019-02-27 12:21:14,00] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2019-02-27 12:21:14,00] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2019-02-27 12:21:14,00] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
Workflow 180317e4-cbfd-4e94-97ab-0aeb4bbb4943 transitioned to state Failed

split multiallelic variants before VQSR and CNNScoreVariants, gatk team opinion

Hi,

If I remember correctly, I saw a user in this forum suggest splitting multiallelic variants before VQSR. That seems logical to me (I have never done it before), but I would like to know the opinion of the GATK team, considering that the split multiallelic variants (together with the other variants), at least in my case, would be used as input to both the VQSR and CNN steps.
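For reference, the splitting step I have in mind would be something like this (a sketch assuming GATK's LeftAlignAndTrimVariants with --split-multi-allelics, or equivalently bcftools norm -m; file names are placeholders):

gatk LeftAlignAndTrimVariants \
    -R reference.fasta \
    -V input.vcf.gz \
    --split-multi-allelics \
    -O split.vcf.gz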

Many thanks

Funcotator log4j Error

I am running Funcotator on a full VCF, and it produces an empty VCF. I think it is due to an error in the log4j system: "No appenders could be found for logger". I have seen this issue reported by other users for other tools, but I haven't found a solution. Any idea what is going wrong?

Input:

gatk Funcotator \
-V input.vcf \
-O output.vcf \
-R ucsc.hg19.fasta \
--output-file-format VCF \
--ref-version hg19 \
--data-sources-path ~/Documents/Funcotator_DataSource

Summary:

12:30:56.906 INFO  Funcotator - ------------------------------------------------------------
12:30:56.906 INFO  Funcotator - The Genome Analysis Toolkit (GATK) v4.0.4.0-0.0.2
12:30:56.906 INFO  Funcotator - For support and documentation go to https://software.broadinstitute.org/gatk/
12:30:56.906 INFO  Funcotator - Executing as das106@Snellings-Bioinformatics on Linux v4.15.0-20-generic amd64
12:30:56.906 INFO  Funcotator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_121-b15
12:30:56.906 INFO  Funcotator - Start Date/Time: May 29, 2018 12:30:56 PM EDT
12:30:56.906 INFO  Funcotator - ------------------------------------------------------------
12:30:56.906 INFO  Funcotator - ------------------------------------------------------------
12:30:56.906 INFO  Funcotator - HTSJDK Version: 2.14.3
12:30:56.907 INFO  Funcotator - Picard Version: 2.18.2
12:30:56.907 INFO  Funcotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:30:56.907 INFO  Funcotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:30:56.907 INFO  Funcotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:30:56.907 INFO  Funcotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:30:56.907 INFO  Funcotator - Deflater: IntelDeflater
12:30:56.907 INFO  Funcotator - Inflater: IntelInflater
12:30:56.907 INFO  Funcotator - GCS max retries/reopens: 20
12:30:56.907 INFO  Funcotator - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
12:30:56.907 WARN  Funcotator - 

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

   Warning: Funcotator is a BETA tool and is not yet ready for use in production

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


12:30:56.907 INFO  Funcotator - Initializing engine
12:30:57.099 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/das106/Desktop/Data/SeqData/CCM_01_04/Variants/CCM_01_04_Mutect2_dbsnp.vcf
12:30:57.113 INFO  Funcotator - Done initializing engine
log4j:WARN No appenders could be found for logger (org.broadinstitute.hellbender.tools.funcotator.dataSources.DataSourceUtils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
12:30:57.122 INFO  FeatureManager - Using codec GencodeGtfCodec to read file file:///home/das106/Documents/Funcotator_DataSource/gencode/hg19/gencode.v19.chr_patch_hapl_scaff.annotation.REORDERED.gtf
12:30:57.125 INFO  FeatureManager - Using codec XsvLocatableTableCodec to read file file:///home/das106/Documents/Funcotator_DataSource/clinvar/hg19/clinvar_hgmd.tsv
12:30:57.170 INFO  FeatureManager - Using codec XsvLocatableTableCodec to read file file:///home/das106/Documents/Funcotator_DataSource/oreganno/hg19/oreganno.tsv
12:30:57.182 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/das106/Documents/Funcotator_DataSource/dbsnp/hg19/hg19_All_20170710.vcf.gz
WARNING 2018-05-29 12:30:59 AsciiLineReader Creating an indexable source for an AsciiFeatureCodec using a stream that is neither a PositionalBufferedStream nor a BlockCompressedInputStream
WARNING 2018-05-29 12:30:59 AsciiLineReader Creating an indexable source for an AsciiFeatureCodec using a stream that is neither a PositionalBufferedStream nor a BlockCompressedInputStream
12:30:59.409 INFO  Funcotator - Shutting down engine
[May 29, 2018 12:30:59 PM EDT] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=2258632704
java.lang.IllegalArgumentException: URI is not hierarchical
    at java.io.File.<init>(File.java:418)
    at org.broadinstitute.hellbender.tools.funcotator.FuncotatorUtils.initializeB37SequenceDict(FuncotatorUtils.java:1926)
    at org.broadinstitute.hellbender.tools.funcotator.FuncotatorUtils.isSequenceDictionaryUsingB37Reference(FuncotatorUtils.java:1693)
    at org.broadinstitute.hellbender.tools.funcotator.Funcotator.onTraversalStart(Funcotator.java:374)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:890)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

--dontUseSoftClippedBases option seems to affect non-soft-clipped bases

Hello,

I am checking the differences between calls made with and without using soft-clipped bases.
Some of the calls missed when '--dontUseSoftClippedBases=true' (compared with the calls when 'false') lie in the middle of the reads, not in the soft-clipped bases.
Why were these calls missed?
Many thanks!!

FilterMutectCalls error: "there is no such column: sample"

Hi,

I've been following the best practice for tumor somatic mutation calling.

Everything runs like a charm until FilterMutectCalls, which keeps throwing a Java error:
"java.lang.IllegalArgumentException: there is no such column: sample"

This is WES (on PDX) with a paired normal sample, using GATK 4.1.
ValidateVariants went perfectly well.

Any idea? Am I missing something?

Thank you in advance.

Kind regards.

Here is the FilterMutectCalls cmd and log:

  java -jar gatk-package-4.1.0.0-local.jar FilterMutectCalls -V Sample_PDAC_JIA_HS_003T_DNA.vcf --contamination-table Sample_PDAC_JIA_HS_003T_DNA_calculatecontamination.table -O Sample_PDAC_JIA_HS_003T_DNA_oncefiltered.vcf
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /datacit/03_TOOLS/SeqTools/GATK/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar FilterMutectCalls -V Sample_PDAC_JIA_HS_003T_DNA.vcf --contamination-table Sample_PDAC_JIA_HS_003T_DNA_calculatecontamination.table -O Sample_PDAC_JIA_HS_003T_DNA_oncefiltered.vcf
21:20:09.207 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/datacit/03_TOOLS/SeqTools/GATK/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
21:20:19.585 INFO  FilterMutectCalls - ------------------------------------------------------------
21:20:19.586 INFO  FilterMutectCalls - The Genome Analysis Toolkit (GATK) v4.1.0.0
21:20:19.586 INFO  FilterMutectCalls - For support and documentation go to https://software.broadinstitute.org/gatk/
21:20:19.587 INFO  FilterMutectCalls - Executing as cit@SVL001 on Linux v3.10.0-693.17.1.el7.x86_64 amd64
21:20:19.587 INFO  FilterMutectCalls - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-b12
21:20:19.588 INFO  FilterMutectCalls - Start Date/Time: 27 février 2019 21:20:09 CET
21:20:19.589 INFO  FilterMutectCalls - ------------------------------------------------------------
21:20:19.589 INFO  FilterMutectCalls - ------------------------------------------------------------
21:20:19.590 INFO  FilterMutectCalls - HTSJDK Version: 2.18.2
21:20:19.591 INFO  FilterMutectCalls - Picard Version: 2.18.25
21:20:19.591 INFO  FilterMutectCalls - HTSJDK Defaults.COMPRESSION_LEVEL : 2
21:20:19.591 INFO  FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
21:20:19.592 INFO  FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
21:20:19.592 INFO  FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
21:20:19.593 INFO  FilterMutectCalls - Deflater: IntelDeflater
21:20:19.593 INFO  FilterMutectCalls - Inflater: IntelInflater
21:20:19.593 INFO  FilterMutectCalls - GCS max retries/reopens: 20
21:20:19.594 INFO  FilterMutectCalls - Requester pays: disabled
21:20:19.594 INFO  FilterMutectCalls - Initializing engine
21:20:19.941 INFO  FeatureManager - Using codec VCFCodec to read file file:///datacompute/pacaomics/04_Processed/Variant_calling/Mutect2/Sample_PDAC_JIA_HS_003T_DNA/Sample_PDAC_JIA_HS_003T_DNA.vcf
21:20:20.028 INFO  FilterMutectCalls - Done initializing engine
21:20:20.131 INFO  FilterMutectCalls - Shutting down engine
[27 février 2019 21:20:20 CET] org.broadinstitute.hellbender.tools.walkers.mutect.FilterMutectCalls done. Elapsed time: 0.18 minutes.
Runtime.totalMemory()=2175795200
java.lang.IllegalArgumentException: there is no such column: sample
    at org.broadinstitute.hellbender.utils.tsv.DataLine.columnIndex(DataLine.java:431)
    at org.broadinstitute.hellbender.utils.tsv.DataLine.get(DataLine.java:400)
    at org.broadinstitute.hellbender.utils.tsv.DataLine.get(DataLine.java:529)
[...]

Here's the Mutect2 cmd:

 java gatk-package-4.1.0.0-local.jar Mutect2 -R ensembl91.GRC38_hgmmu_chrename.fa -I Sample_PDAC_JIA_HS_003T_DNA.nochrFilt.reord.sorted.dedup.recal.bam -I Sample_PDAC_JIA_HS_003N_DNA.nochrFilt.reord.sorted.dedup.recal.bam -tumor Sample_PDAC_JIA_HS_003T_DNA -normal Sample_PDAC_JIA_HS_003N_DNA -pon mutect2_pon.vcf --germline-resource somatic-hg38_af-only-gnomad.hg38_withoutContig_hschr.vcf --af-of-alleles-not-in-resource 0.0000025 --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter -O Sample_PDAC_JIA_HS_003T_DNA.vcf -bamout Sample_PDAC_JIA_HS_003T_DNA_Sample_PDAC_JIA_HS_003N_DNA.bam --tmp-dir /datacompute/pacaomics/tmpdir
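For reference, the contamination table above was produced following the best-practice steps, roughly along these lines (a sketch from memory; the common-variants resource name is a placeholder):

java -jar gatk-package-4.1.0.0-local.jar GetPileupSummaries -I Sample_PDAC_JIA_HS_003T_DNA.nochrFilt.reord.sorted.dedup.recal.bam -V common_biallelic_snps.vcf.gz -L common_biallelic_snps.vcf.gz -O Sample_PDAC_JIA_HS_003T_DNA_pileups.table
java -jar gatk-package-4.1.0.0-local.jar CalculateContamination -I Sample_PDAC_JIA_HS_003T_DNA_pileups.table -O Sample_PDAC_JIA_HS_003T_DNA_calculatecontamination.table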


How to find HaplotypeScore?

I was trying to compute the HaplotypeScore annotation and got a warning message: "Annotation will not be calculated, must be called from UnifiedGenotyper".

Could you please tell me the command to calculate HaplotypeScore using UnifiedGenotyper?
I have seen the -A parameter (for annotation), but I do not know what values can be passed to it.
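For reference, this is the kind of invocation I am asking about (a sketch assuming GATK3's UnifiedGenotyper with -A HaplotypeScore; file names are placeholders):

java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper \
    -R reference.fasta \
    -I input.bam \
    -A HaplotypeScore \
    -o output.vcf

Is something along these lines correct?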

Could I run ASEReadCounter on homozygous SNPs?

The documentation of ASEReadCounter states that this tool is designed for heterozygous SNPs. However, could I still use it to calculate ref and alt allele read depth on hom-SNPs? My purpose is to check consistency between RNA-Seq and DNA-Seq samples from the same individuals, in order to identify potential contamination in RNA samples.

SelectVariants V4 TribbleException Contig chr1 does not have a length field

I indexed my VCF file with GATK V4.0.6.0 IndexFeatureFile, then ran GATK V4.0.6.0 SelectVariants on it, and I got an exception:

htsjdk.tribble.TribbleException: Contig chr1 does not have a length field.

When I run the same VCF using GATK V3 SelectVariants, it works.

As far as I know, ##contig entries in the VCF header are not required to have a length field.

Combine gvcf files generated with GATK and Isaac

Hi. I have a batch of g.vcf files generated with IsaacVariantCaller. I also have another batch generated with GATK HaplotypeCaller. I want to combine these two batches to run GenotypeGVCFs. Is there any way to combine them? I do not have the BAM files for the samples called with Isaac, so I can't start from that step. Thank you in advance.

Error with FastaAlternateReferenceMaker

Hello,

I am having some trouble running the FastaAlternateReferenceMaker tool to convert my VCF sequences to fasta using a reference genome. I started with a multi-sequence VCF made from whole-genome paired-end Illumina data. I then subset the larger VCF file to isolate a single gene region and further subset it to only include organisms from one population. I was able to troubleshoot several issues, but there seems to be something I am missing. I am no longer getting a clear error message as I was before; the message is now mostly incomprehensible except for one line, which says:
"htsjdk.tribble.TribbleException: Contig CAE1 does not have a length field.
at htsjdk.variant.vcf.VCFContigHeaderLine.getSAMSequenceRecord(VCFContigHeaderLine.java:80)
at htsjdk.variant.vcf.VCFHeader.getSequenceDictionary(VCFHeader.java:206)

It then goes on to list many more lines. I tried Picard FixVcfHeader, but that did not seem to work either:
java -jar picard.jar FixVcfHeader -I VBOA1003.vcf.gz -O VBOA1003_Fixed.vcf.g
I could use some advice.

Best,

Christian

[GATK 4.0.0.0] joint calling for Mutect2?

Hello,

I am interested in inferring clonal evolution using somatic variants called by Mutect2. One way to infer it is by tracking the VAF (variant allele fraction) of somatic variants across multiple time points and clustering.
One challenge in using Mutect2 calls is the difficulty of computing VAFs, especially for indels, because some variants are called using local assembly in only a subset of time points, and allele counting at time points where the variant is not called is tricky. Thus, I usually limit variants to SNPs, which are easier to count. But some cohorts don't carry many somatic variants, and I believe joint calling would help. Does that make sense?
Would you consider implementing joint calling for Mutect2, as in HaplotypeCaller?


(Image credit: https://github.com/chrisamiller/fishplot)

CNNScoreVariants Hanging in 4.1.0

I am trying to run CNNScoreVariants in GATK 4.1.0 but the tool seems to hang on the 'INFO NativeLibraryLoader - Loading libgkl_utils.so from jar' step for both the 1D and 2D models.

My issue seems similar to this post, but the hang-up occurs at a different location:
(gatkforums.broadinstitute.org/gatk/discussion/12384/cnnscorevariants-hanging-in-4-0-5-2-and-4-0-6-0)

I have tried the accepted answer in the above post without success. Any help would be appreciated.

how to do BQSR on WES or other targeted sequencing?

I have this question because of the following statement from this article:

In addition, there are some processing steps, such as BQSR, that should be restricted to the capture targets in order to eliminate off-target sequencing data, which is uninformative and is a source of noise.

My understanding is that this means the bias in base quality scores differs between on-target and off-target bases, i.e. the recalibration models should be different for on- and off-target data. If that's correct, it seems to me there are two proper ways of doing BQSR on WES, neither of which is in the Broad official WES pipeline:
1. Keep only the on-target data, i.e. apply -L targets.bed -ip 100 at both steps of BQSR (see the sketch below).
2. Keep both on- and off-target data:
2.1. apply -L targets.bed at both steps of BQSR
2.2. apply -XL targets.bed at both steps of BQSR
2.3. merge the two BAM files from 2.1 and 2.2
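
To make option 1 concrete, here is a sketch of what I mean by applying -L targets.bed -ip 100 at both steps (assuming GATK4's BaseRecalibrator and ApplyBQSR; file and resource names are placeholders):

gatk BaseRecalibrator \
    -R reference.fasta \
    -I input.bam \
    --known-sites dbsnp.vcf.gz \
    -L targets.bed -ip 100 \
    -O recal.table

gatk ApplyBQSR \
    -R reference.fasta \
    -I input.bam \
    --bqsr-recal-file recal.table \
    -L targets.bed -ip 100 \
    -O recalibrated.bam

Note that applying -L at the ApplyBQSR step restricts the output BAM to on-target reads, which I believe is why option 2 needs the merge in step 2.3.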

Just wondering if anyone can comment?

GATK4.1.0.0's VariantRecalibrator resource format differs from GATK4.0.11.0?

Hi,
When I used GATK 4.1.0.0 to run VQSR, it reported the error below, but when I tried GATK 4.0.11.0 it ran well.
The input VCF file I used was produced from GVCFs with the older CombineGVCFs and GenotypeGVCFs tools. I do not use GenomicsDBImport, because it is really slow for me.
GATK 4.0.11.0:

~/gatk-4.0.11.0/gatk --java-options '-Xmx20G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true' VariantRecalibrator \
-R ~/database/hg19/ucsc.hg19.fasta \
-V ~/test.raw.vcf \
-resource hapmap,known=false,training=true,truth=true,prior=15.0:~/GATK_hg19/hapmap_3.3.hg19.sites.vcf \
-an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum \
-mode SNP \
-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
-O out/recalibrate_SNP.recal \
--tranches-file out/test.recalibrate_SNP.tranches \
--rscript-file out/test.recalibrate_SNP_plots.R

GATK 4.1.0.0:

~/gatk-4.1.0.0/gatk --java-options '-Xmx20G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true' VariantRecalibrator \
-R ~/database/hg19/ucsc.hg19.fasta \
-V ~/test.raw.vcf \
-resource hapmap,known=false,training=true,truth=true,prior=15.0:~/GATK_hg19/hapmap_3.3.hg19.sites.vcf \
-an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum \
-mode SNP \
-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
-O out/recalibrate_SNP.recal \
--tranches-file out/test.recalibrate_SNP.tranches \
--rscript-file out/test.recalibrate_SNP_plots.R

Error:

A USER ERROR has occurred: Couldn't read file file:/// ./out/code/hapmap,known=false,training=true,truth=true,prior=15.0:~/GATK_hg19/hapmap_3.3.hg19.sites.vcf. Error was: It doesn't exist.

But the resource file really does exist there. I don't know how to resolve this; or does GATK 4.1.0.0 no longer support the old CombineGVCFs/GenotypeGVCFs workflow?
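
For what it's worth, the GATK 4.1 documentation seems to write the resource argument with the tag and attributes attached to the option name, and the file path given separately after a space, roughly like this (a sketch, in case that is the relevant change):

-resource:hapmap,known=false,training=true,truth=true,prior=15.0 ~/GATK_hg19/hapmap_3.3.hg19.sites.vcf

Is this new form required in 4.1.0.0, or should the old "-resource name,...:file" form still work?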

FastaAlternateReferenceMaker 4.1.0.0 doesn't give expected output

Hi GATK team,
I ran this line:
./gatk FastaAlternateReferenceMaker -R Galaxy47-[ARS-UCD1.2_Btau5.0.1Y.fa.gz].fasta -O Chr20a.SRR4296972.fasta -V Chr20.SRR4296972.g.vcf.gz
But the tool doesn't change the output fasta based on the VCF file provided.
Here are the screenshots from IGV: the upper part is the BAM file the VCF came from, and the lower part is the output fasta of FastaAlternateReferenceMaker.
This variant, along with other variants, is listed in the VCF file, as seen in the screenshot.

I hope to hear a solution from you soon.
Thank you,
masagis

How are supplementary alignments handled by HaplotypeCaller?

Hello,

I am curious about how HaplotypeCaller deals with supplementary alignments. I know (based on the documentation and my own runs) that the NotSecondaryAlignmentReadFilter is applied, but I do not see a NotSupplementaryAlignmentReadFilter (is it obsolete? redundant? has it been renamed or merged with another filter, maybe?).

Thanks in advance!

Java errors with GATK when running on RNA-seq data in serial GC mode

Hi,

I am using GATK 3.8. I had done some analysis with DNA-seq data that went seamlessly, but now I am working on calling variants from RNA-seq data. I ran everything up to the split-reads step, then skipped indel realignment and ran BaseRecalibrator directly.

Here is my sbatch command

sbatch --partition=BioCompute --nodes=1 --ntasks=1 --cpus-per-task=1 --mem=60G --qos=normal --time=02-00:00:00 --output=Base_recalibrator-%j.out --mail-user=smk5g5@missouri.edu --mail-type=END,FAIL --wrap="java -Xmx50g -XX:+UseSerialGC -jar /cluster/software/gatk/gatk-3.8/GenomeAnalysisTK.jar -T BaseRecalibrator -R ../Cancer_exomes/genome.fa -I MDAMB436_RNA-seq_SRR1639744_RNAseqAligned_split.bam -knownSites ../Cancer_exomes/b37/dbsnp_138.b37.excluding_sites_after_129.vcf -knownSites ../Cancer_exomes/b37/hapmap_3.3.b37.vcf -knownSites ../Cancer_exomes/b37/dbsnp_138.b37.vcf -knownSites ../Cancer_exomes/b37/1000G_omni2.5.b37.vcf -knownSites ../Cancer_exomes/b37/1000G_phase1.snps.high_confidence.b37.vcf -knownSites ../Cancer_exomes/b37/CEUTrio.HiSeq.WGS.b37.bestPractices.b37.vcf -knownSites ../Cancer_exomes/b37/NA12878.knowledgebase.snapshot.20131119.b37.vcf -knownSites ../Cancer_exomes/b37/CEUTrio.HiSeq.WGS.b37.NA12878.vcf -knownSites ../Cancer_exomes/b37/1000G_phase3_v4_20130502.sites.vcf -knownSites ../Cancer_exomes/b37/NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.b37.sites.vcf -knownSites ../Cancer_exomes/b37/NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.b37.vcf -o MDAMB436_RNA-seq_SRR1639744_RNAseqAligned_recal.table"

As you can see, I am giving Java 50 GB of RAM via the -Xmx argument, but it still gives me the following error.

ERROR MESSAGE: An error occurred because you did not provide enough memory to run this program. You can use the -Xmx argument (before the -jar argument) to adjust the maximum heap size provided to Java. Note that this is a JVM argument, not a GATK argument.

I was wondering if anyone can help me sort this problem out.

Thanks
Saad
