Channel: Recent Discussions — GATK-Forum

Should I make a PoN even though I have matched normal samples?


I have tumor-normal matched samples for 30 individuals, but every document I have seen says that making a PoN is a very important step.

Since I have a matched normal for every tumor sample, should I still make a PoN using Mutect in tumor-only mode?

P.S. I currently use GATK3, and I want to compare GATK3 and GATK4 in terms of results.
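For reference, the GATK4 PoN workflow runs Mutect2 in tumor-only mode on each normal and then combines the results. A sketch under the 4.0.x argument names (file names are placeholders, and the commands are echoed rather than executed here):

```shell
ref="ref.fasta"                       # placeholder reference
normals="normal1 normal2"             # in this case: the 30 matched normals
for n in $normals; do
  # tumor-only Mutect2 call on each normal (command printed, not executed)
  echo gatk Mutect2 -R "$ref" -I "$n.bam" -tumor "$n" -O "$n.pon.vcf.gz"
done
# combine the per-normal calls into the panel
echo gatk CreateSomaticPanelOfNormals -vcfs normal1.pon.vcf.gz -vcfs normal2.pon.vcf.gz -O pon.vcf.gz
```

The resulting pon.vcf.gz is then passed to the tumor-normal Mutect2 run; whether it adds value on top of 30 matched normals is the question for the team.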


Getting a NullPointerException when running BaseRecalibrator


Hi all, I am running this command in 4.0.3.0:

java -jar gatk-package.jar BaseRecalibrator -R RefGenomes/Felis_catus_new8.fna -I DS05061.bam -O DS05061.table --known-sites RefGenomes/felis_catus.vcf

and am getting this error:
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
04:34:40.657 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/users/cawthamy/gatk-package.jar!/com/intel/gkl/native/libgkl_compression.so
04:34:46.444 INFO BaseRecalibrator - ------------------------------------------------------------
04:34:46.445 INFO BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.0.3.0
04:34:46.445 INFO BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
04:34:46.446 INFO BaseRecalibrator - Executing as cawthamy@hpcmem01 on Linux v3.10.0-693.11.6.el7.x86_64 amd64
04:34:46.446 INFO BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_151-b12
04:34:46.446 INFO BaseRecalibrator - Start Date/Time: May 30, 2018 4:34:40 AM EDT
04:34:46.446 INFO BaseRecalibrator - ------------------------------------------------------------
04:34:46.446 INFO BaseRecalibrator - ------------------------------------------------------------
04:34:46.447 INFO BaseRecalibrator - HTSJDK Version: 2.14.3
04:34:46.447 INFO BaseRecalibrator - Picard Version: 2.17.2
04:34:46.447 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
04:34:46.448 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
04:34:46.448 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
04:34:46.448 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
04:34:46.448 INFO BaseRecalibrator - Deflater: IntelDeflater
04:34:46.448 INFO BaseRecalibrator - Inflater: IntelInflater
04:34:46.448 INFO BaseRecalibrator - GCS max retries/reopens: 20
04:34:46.448 INFO BaseRecalibrator - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
04:34:46.448 INFO BaseRecalibrator - Initializing engine
04:34:53.427 INFO FeatureManager - Using codec VCFCodec to read file file:///home/users/cawthamy/RefGenomes/felis_catus.vcf
04:34:53.450 INFO BaseRecalibrator - Shutting down engine
[May 30, 2018 4:34:53 AM EDT] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.21 minutes.
Runtime.totalMemory()=2913992704
java.lang.NullPointerException
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.getContigNames(SequenceDictionaryUtils.java:463)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.getCommonContigsByName(SequenceDictionaryUtils.java:457)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.compareDictionaries(SequenceDictionaryUtils.java:234)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.validateDictionaries(SequenceDictionaryUtils.java:150)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.validateDictionaries(SequenceDictionaryUtils.java:98)
at org.broadinstitute.hellbender.engine.GATKTool.validateSequenceDictionaries(GATKTool.java:621)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:563)
at org.broadinstitute.hellbender.engine.ReadWalker.onStartup(ReadWalker.java:55)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:132)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

How can I fix this?
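The stack trace dies inside sequence-dictionary comparison, which usually means one of the inputs has no dictionary at all. Things worth checking (my assumption, not confirmed by the thread): that the reference has a companion .dict file and that the known-sites VCF is indexed and carries ##contig header lines. A sketch, with the commands echoed rather than executed:

```shell
ref="RefGenomes/Felis_catus_new8.fna"
sites="RefGenomes/felis_catus.vcf"
# commands printed rather than executed:
echo gatk CreateSequenceDictionary -R "$ref"   # writes Felis_catus_new8.dict next to the FASTA
echo gatk IndexFeatureFile -F "$sites"         # writes felis_catus.vcf.idx
```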

Cannot download reference files for running the PathSeq pipeline


Hi GATK team,

I am trying to download the pre-built reference files for running the PathSeq pipeline from the GATK Resource Bundle FTP server, under /bundle/beta/PathSeq/, but the transfer fails with permission-denied errors. Could you double-check the access permissions of the files under /bundle/beta/PathSeq/?

Thank you in advance

Sen ZHAO

CNNScoreVariants "no suitable codecs found" error


When first testing CNNScoreVariants, the following error occurs. I have tried this on several BAM/CRAM files, all with the same error. These CRAMs have run fine with other GATK tools. As a prerequisite, I did run pip install vqsr_cnn.

Any workarounds for this?

Using GATK jar /share/pkg/gatk/4.0.3.0/install/bin/gatk-package-4.0.3.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /share/pkg/gatk/4.0.3.0/install/bin/gatk-package-4.0.3.0-local.jar CNNScoreVariants -V /restricted/projectnb/casa/wgs.hg38/adni/cram/ADNI_016_s_4584.hg38.realign.bqsr.cram -R /restricted/projectnb/casa/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa -O cnn./restricted/projectnb/casa/wgs.hg38/adni/cram/ADNI_016_s_4584.hg38.realign.bqsr.cram
07:24:39.726 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/pkg/gatk/4.0.3.0/install/bin/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
07:24:39.869 INFO  CNNScoreVariants - ------------------------------------------------------------
07:24:39.869 INFO  CNNScoreVariants - The Genome Analysis Toolkit (GATK) v4.0.3.0
07:24:39.870 INFO  CNNScoreVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
07:24:39.870 INFO  CNNScoreVariants - Executing as farrell@scc-hadoop.bu.edu on Linux v2.6.32-696.28.1.el6.x86_64 amd64
07:24:39.870 INFO  CNNScoreVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_151-b12
07:24:39.870 INFO  CNNScoreVariants - Start Date/Time: May 30, 2018 7:24:39 AM EDT
07:24:39.870 INFO  CNNScoreVariants - ------------------------------------------------------------
07:24:39.870 INFO  CNNScoreVariants - ------------------------------------------------------------
07:24:39.871 INFO  CNNScoreVariants - HTSJDK Version: 2.14.3
07:24:39.871 INFO  CNNScoreVariants - Picard Version: 2.17.2
07:24:39.871 INFO  CNNScoreVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
07:24:39.871 INFO  CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
07:24:39.871 INFO  CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
07:24:39.871 INFO  CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
07:24:39.872 INFO  CNNScoreVariants - Deflater: IntelDeflater
07:24:39.872 INFO  CNNScoreVariants - Inflater: IntelInflater
07:24:39.872 INFO  CNNScoreVariants - GCS max retries/reopens: 20
07:24:39.872 INFO  CNNScoreVariants - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
07:24:39.872 WARN  CNNScoreVariants -

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

   Warning: CNNScoreVariants is an EXPERIMENTAL tool and should not be used for production

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


07:24:39.872 INFO  CNNScoreVariants - Initializing engine
07:24:40.566 INFO  CNNScoreVariants - Shutting down engine
[May 30, 2018 7:24:40 AM EDT] org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1829240832
***********************************************************************

A USER ERROR has occurred: Cannot read /restricted/projectnb/casa/wgs.hg38/adni/cram/ADNI_016_s_4584.hg38.realign.bqsr.cram because no suitable codecs found

***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException$NoSuitableCodecs: Cannot read /restricted/projectnb/casa/wgs.hg38/adni/cram/ADNI_016_s_4584.hg38.realign.bqsr.cram because no suitable codecs found
        at org.broadinstitute.hellbender.engine.FeatureManager.getCodecForFile(FeatureManager.java:436)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getCodecForFeatureInput(FeatureDataSource.java:327)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:307)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:255)
        at org.broadinstitute.hellbender.engine.VariantWalker.initializeDrivingVariants(VariantWalker.java:55)
        at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:47)
        at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:558)
        at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:43)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:132)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)
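One thing that stands out in the command itself: the CRAM path is passed to -V, which CNNScoreVariants treats as the driving variants file, so the engine looks for a VCF codec and finds none (hence "no suitable codecs found"). Assuming a VCF called from that sample exists (the VCF name below is hypothetical), the invocation would look more like this (echoed, not executed):

```shell
# hypothetical VCF name; the CRAM and reference paths are from the original command
vcf="ADNI_016_s_4584.vcf.gz"
cram="/restricted/projectnb/casa/wgs.hg38/adni/cram/ADNI_016_s_4584.hg38.realign.bqsr.cram"
ref="/restricted/projectnb/casa/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa"
echo gatk CNNScoreVariants -V "$vcf" -I "$cram" -R "$ref" -O annotated.vcf
```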

Calling Somatic Variants without matched normals using GATK.


Hi,
I am new to the world of bioinformatics. I currently have WES data for about 45 pediatric brain tumor samples (archived FFPE), and I am keen on identifying the mutational burden and mutational signatures in these samples. I don't necessarily want to discover a novel mutation and describe its biological relevance; rather, I want to use the pattern of mutational signatures to identify the causes of recurrence in these tumors. The problem is that, as with most archived FFPE samples, I don't have matched normal tissue. I am looking for the best approach to call somatic variants in these samples. Is using gnomAD for filtering my best option? Is it a good resource for pediatric tumors? If not, what other resources could I use?

Thank you for your advice.
Aditi

Algorithm question for VQSR


As far as I understand, VQSR selects a pool of SNPs present in both the test set and a known, annotated SNP database. These SNPs are considered true variants, and a Gaussian mixture model is fit on their annotations to classify the remaining SNPs.

These true SNPs are clustered using the Gaussian model. However, a Gaussian mixture model means we are also clustering "bad" SNPs. I imagine these bad SNPs have different kinds of poor quality along different annotation dimensions, so the mixture model will end up with multiple clusters (one true-SNP cluster and several bad-SNP clusters), right?

Then why can't we just use a single Gaussian to model the distribution of true SNPs, and treat any SNP far from that cluster as more likely to be false?
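For intuition, the two schemes can be contrasted in a few lines. VQSR fits a positive model to the training SNPs and a separate negative model to the worst-scoring variants, then scores each site by the log-likelihood ratio; the single-Gaussian idea scores by distance to the positive model alone. A toy one-dimensional sketch (all numbers invented, one annotation instead of the real multivariate case):

```python
import math

def gauss_loglik(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

# Toy 1-D "annotation" (think QD): positive model fit to training SNPs,
# negative model fit to the lowest-scoring variants.
POS = (30.0, 3.0)   # invented mean/sd for true SNPs
NEG = (5.0, 4.0)    # invented mean/sd for artifacts

def vqslod(x):
    """VQSR-style score: log-likelihood under positive minus negative model."""
    return gauss_loglik(x, *POS) - gauss_loglik(x, *NEG)

def pos_only(x):
    """The single-Gaussian alternative: likelihood under the positive model only."""
    return gauss_loglik(x, *POS)

# Both schemes rank a value near the true-SNP cluster above one near the
# artifact cluster, but the ratio also says *how much more* artifact-like a
# borderline site is, which is what makes tranche cutoffs meaningful.
```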

Picard LiftoverVcf error when attempting to lift over to build 38


Hi,

Apologies for the Biostars cross-posting, but I haven't received any replies there yet, so I was hoping for some help here. I've been trying to lift over VCF files (human) from build 37 to build 38 using Picard's LiftoverVcf. After some very helpful tips from Biostars users (the reference FASTA file has been appropriately indexed, and I'm running the latest version of Picard, 2.17.3), I'm now at the stage where I'm receiving the following error:

java -jar picard.jar LiftoverVcf \
    I=Input.vcf \
    O=Output.vcf \
    CHAIN=GRCh37_to_GRCh38.chain \
    REJECT=rejected_variants_chr22.vcf \
    R=hg38.fa

11:08:47.603 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/home/Work/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Jan 15 11:08:47 GMT 2018] LiftoverVcf INPUT=Input.vcf OUTPUT=Output.vcf CHAIN=GRCh37_to_GRCh38.chain REJECT=rejected_variants_chr22.vcf REFERENCE_SEQUENCE=hg38.fa WARN_ON_MISSING_CONTIG=false WRITE_ORIGINAL_POSITION=false LIFTOVER_MIN_MATCH=1.0 ALLOW_MISSING_FIELDS_IN_HEADER=false TAGS_TO_REVERSE=[AF] TAGS_TO_DROP=[MAX_AF] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Jan 15 11:08:47 GMT 2018] Executing as user@node030 on Linux 2.6.32-220.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0-internal-root_2016_01_29_08_23-b00; Deflater: Intel; Inflater: Intel; Picard version: 2.17.3-SNAPSHOT
INFO 2018-01-15 11:08:47 LiftoverVcf Loading up the target reference genome.
INFO 2018-01-15 11:09:05 LiftoverVcf Lifting variants over and sorting (not yet writing the output file.)
ERROR 2018-01-15 11:09:05 LiftoverVcf Encountered a contig, 22 that is not part of the target reference.
[Mon Jan 15 11:09:05 GMT 2018] picard.vcf.LiftoverVcf done. Elapsed time: 0.29 minutes.
Runtime.totalMemory()=3293052928

Looking up this error, I see some suggestions to add "chr" to the chromosome notation in the .vcf file, to match the reference and chain files. I have done this as follows:

awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' Input.vcf > Input_withchr.vcf

but am still receiving the same error as above. This seems to be the same error as posted here:
https://gatkforums.broadinstitute.org/gatk/discussion/7009/picard-liftovervcf-contig-not-part-of-the-target-reference

but I can't see a solution posted. Any suggestions?

Thanks!
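Two editorial notes. First, a guess from the wording "not part of the target reference": the chain's target side may use Ensembl-style names (22) while hg38.fa uses chr22, in which case renaming the input VCF cannot help and a chain whose target names match the reference is needed. Second, the awk one-liner above prefixes data lines but leaves the ##contig header lines untouched; a rename that also covers the header (and the common MT to chrM special case) might look like this sketch:

```python
def add_chr_prefix(vcf_lines):
    """Prefix 'chr' on data-line CHROM values and ##contig header IDs.

    Mirrors the awk one-liner, plus the header lines; MT -> chrM is a
    common extra special case when moving to hg19/hg38-style names.
    """
    out = []
    for line in vcf_lines:
        if line.startswith('##contig=<ID='):
            name = line[len('##contig=<ID='):].split(',')[0].split('>')[0]
            new = 'chrM' if name == 'MT' else 'chr' + name
            out.append(line.replace('ID=' + name, 'ID=' + new, 1))
        elif line.startswith('#'):
            out.append(line)
        else:
            chrom, rest = line.split('\t', 1)
            new = 'chrM' if chrom == 'MT' else 'chr' + chrom
            out.append(new + '\t' + rest)
    return out
```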

Filtering VCF file to remove ./.


Hello,
I am trying to understand the sample format in my merged VCF file of RNA-seq SNPs produced with the GATK best practices. I have several VCF files that I merged into one using CombineVariants. Before merging, the genotype of each sample is mostly 1/1 or 0/1. I understand what these genotypes mean, but after I merge the files I end up with lots of SNPs that have the genotype ./. in one sample while another sample has 1/1. From reading other people's work, it seems that ./. may indicate that the SNP did not have a high enough quality in that sample?

I want to select from my merged VCF only those variants that pass quality in all individual files. To clarify: if I had three merged VCF files and a variant had the genotypes 1/1, 1/0, and 0/0, I want to keep it. However, if a variant had the genotypes 1/1, 1/0, and ./., I don't want to keep it. Am I understanding what ./. means correctly? And is there an easy way to remove these variants from my merged file? Thank you very much for your help!
Leigh Ann
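Editorial note: ./. is a no-call, meaning no genotype was assigned for that sample at that site; when merging per-sample VCFs, a sample whose input file simply had no record there typically gets ./., so it does not distinguish "failed quality" from "no data". The filter described (keep a site only if every sample has a called genotype) can be sketched at the script level like this (SelectVariants may offer no-call limits as well, but the flag names vary by version, so treat that as an assumption to check):

```python
def drop_sites_with_nocalls(vcf_lines):
    """Keep header lines, plus only those records where every sample has a
    called genotype (no './.', '.|.', or '.')."""
    kept = []
    for line in vcf_lines:
        if line.startswith('#'):
            kept.append(line)
            continue
        cols = line.rstrip('\n').split('\t')
        # genotype is the first key of each sample column (FORMAT starts with GT)
        gts = [sample.split(':', 1)[0] for sample in cols[9:]]
        if all(gt not in ('./.', '.|.', '.') for gt in gts):
            kept.append(line)
    return kept
```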


Interpreting the GATK BaseRecalibration report


I am having some difficulty understanding the plots in the GATK BaseRecalibration report. Is there a guide or tutorial available that could help me make sense of them? Thank you.

CombineGVCFs with duplicate sample IDs?


I am running the joint calling workflow on a large batch of samples, and I have a handful that were sequenced twice, with two different capture kits. For these, the sample ID in the GVCFs is the same. I am looking for an option like -genotypeMergeOption UNIQUIFY for CombineGVCFs that will make the sample names unique. I see that if two GVCFs with the same ID are given to CombineGVCFs, the ID is present only once in the resulting combined GVCF header; and if the ID is present in two different combined GVCFs given to GenotypeGVCFs, it is only present once in the output. What is the recommended practice here? I would like to avoid rerunning my pipeline to make the names unique in the single-sample GVCFs.
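I don't know of a UNIQUIFY-style flag on CombineGVCFs. One workaround (my suggestion, not an official recommendation) is to rename the samples in the per-sample GVCF headers before combining, for example with bcftools reheader or a small script along these lines (for bgzipped GVCFs, the file must be rewritten and re-indexed afterwards):

```python
def uniquify_sample_names(vcf_lines, suffix):
    """Append a suffix (e.g. '.kit2') to every sample name in the #CHROM
    header line; data lines are positional, so nothing else changes."""
    out = []
    for line in vcf_lines:
        if line.startswith('#CHROM'):
            cols = line.rstrip('\n').split('\t')
            cols[9:] = [name + suffix for name in cols[9:]]
            out.append('\t'.join(cols))
        else:
            out.append(line)
    return out
```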

DiscoverVariantsFromContigAlignmentsSAMSpark error


Hello there!

I am using DiscoverVariantsFromContigAlignmentsSAMSpark to call SNPs on contigs from an assembly. The assembly was done with FALCON from PacBio reads. While running the command, I receive an interval error for the SAM file. To get to this point, I mapped the contigs to the reference genome with minimap2, converted the resulting CRAM file to SAM, added read group info, and sorted the SAM file. I also ran ValidateSamFile successfully. I am running the latest version of GATK. Here are the command and part of the error message:

Command:

gatk DiscoverVariantsFromContigAlignmentsSAMSpark -R $ref_genome -I $input_file -O ${input_file}_gatk_Vcalled.vcf

Error:

ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 162)
java.lang.IllegalArgumentException: Invalid interval. Contig:16 start:46400198 end:46400197
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:687)
at org.broadinstitute.hellbender.utils.SimpleInterval.validatePositions(SimpleInterval.java:61)
at org.broadinstitute.hellbender.utils.SimpleInterval.<init>(SimpleInterval.java:37)
at org.broadinstitute.hellbender.tools.spark.sv.discovery.alignment.ContigAlignmentsModifier.splitGappedAlignment(ContigAlignmentsModifier.java:310)
at org.broadinstitute.hellbender.tools.spark.sv.discovery.SvDiscoverFromLocalAssemblyContigAlignmentsSpark$SAMFormattedContigAlignmentParser.lambda

Any help on resolving the issue is appreciated!

Using VariantRecalibrator over a large dataset.


I'm running VariantRecalibrator on about 2 TiB of raw SNPs, and I'm hoping there's a way to build the model iteratively over smaller windows of input files. I'm fine with doing this sequentially/serially; it wouldn't need to be parallelized. My raw VCF files (thousands of them), which I supply as a series of --variant NNNNNNNN.vcf.gz arguments to VariantRecalibrator, are non-overlapping and are all joint-called over the same set of samples.

We're using GATK4.0.1.2 (will switch to 4.0.4.0), and we are following best practices. My SNP training set has been computed separately, as I don't have access to high-quality "truth" SNP sets for the plant genome I'm working with (sunflower).

Here is the gist of my VariantRecalibrator call (I have a similar one for INDELs):

JAVA_OPTIONS="-Xmx4g -DGATK_STACKTRACE_ON_USER_EXCEPTION=true"
resname=GOLD
resparams=known=false,training=true,truth=true,prior=10.0
TMPSNPS="${workdir}/gold.snps.vcf.gz"

snp_rscript="snp.recal.Rscript"
snp_recal="snp.recal.vcf.gz"
snp_tranches="snp.tranches"
indel_rscript="indel.recal.Rscript"
indel_recal="indel.recal.vcf.gz"
indel_tranches="indel.tranches"

HOME="${workdir}" /gatk/gatk VariantRecalibrator \
    --java-options "$JAVA_OPTIONS" \
    --TMP_DIR "${XTMP}/gatk" \
    --arguments_file "$argsfile" `:  that is simply a concatenation of all input files` \
    --resource ${resname},${resparams}:"${TMPSNPS}" \
    --mode SNP -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP \
    --truth-sensitivity-tranche 100.0 \
    --truth-sensitivity-tranche  99.0 \
    --truth-sensitivity-tranche  90.0 \
    --truth-sensitivity-tranche  70.0 \
    --truth-sensitivity-tranche  50.0 \
    --rscript-file  "$workdir/${snp_rscript}" \
    --tranches-file "$workdir/${snp_tranches}" \
    --output        "$workdir/${snp_recal}"

The arguments_file simply contains a series of --variant XXX.vcf.gz.

My input VCF files are stored on the NAS inside tarballs (.tar), and I can't keep them both in archive form and de-tarred, because together they won't fit on the compute node. I'm hoping there is a way to iterate the model recalibration in batches, where each batch picks up the model where the previous iteration left off.

I'm wondering what my options are to eventually iterate over my entire dataset.

  • Can I feed VariantRecalibrator pipe (fifo) files so I can stream vcf data into it? (the fifo would be backed by a program that would simply concatenate files one after the other on the stream, or just feed tar -xf batch001.tar 0000001.vcf.gz etc.).

  • Is there a plugin/codec that can read all the vcf.gz files contained in a tarball? Then I could probably leave all the tar files on the NAS/filer.

PS. In the docs, the VariantRecalibrator example mentions --recal-file FOO, but that option is not supported in 4.0.1.2. (https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.1.2/org_broadinstitute_hellbender_tools_walkers_vqsr_VariantRecalibrator.php )

Does ReadBackedPhasing rely on a VCF's GT field?


I have a VCF file that is missing the GT field. Can I just add 0/1 for each variant, and let GATK's ReadBackedPhasing take care of resolving the actual phased genotypes?
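My understanding (worth confirming with the team) is that ReadBackedPhasing phases existing heterozygous calls rather than re-deriving genotypes from reads, so blanket 0/1 placeholders may produce misleading output. Mechanically, though, injecting the field is simple; per the VCF spec GT must be the first FORMAT key, and a matching ##FORMAT=<ID=GT,...> header line is also required. A sketch:

```python
def add_placeholder_gt(vcf_lines, default='0/1'):
    """Prepend GT to FORMAT and a placeholder genotype to each sample column.
    Assumes the VCF already has FORMAT + sample columns (>= 10 columns);
    a ##FORMAT=<ID=GT,...> header line must be added separately."""
    out = []
    for line in vcf_lines:
        if line.startswith('#'):
            out.append(line)
            continue
        cols = line.rstrip('\n').split('\t')
        if len(cols) > 9 and cols[8].split(':')[0] != 'GT':
            cols[8] = 'GT:' + cols[8]                      # GT must come first
            cols[9:] = [default + ':' + s for s in cols[9:]]
        out.append('\t'.join(cols))
    return out
```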

VariantRecalibrator parameter question


For the VariantRecalibrator program, there is an option "--trust-all-polymorphic". The documentation says

"Trust that all the input training sets' unfiltered records contain only polymorphic sites to drastically speed up the computation."

What I'm trying to figure out is whether this means that the sites in the training dataset are polymorphic in the training set or in the test set. For example, I have a set of data I'm using as my training dataset (not human data). I've filtered it to a set of sites that I am confident in, and would like to use as my training set. Within this set, all those sites are polymorphic.
I have a test set of data, with different individuals, which I would like to filter. In this test set, some of the sites identified in the training set will be polymorphic, but some will not be. In this case, should I set --trust-all-polymorphic to TRUE?

Spark


In a nutshell, Spark is a piece of software that GATK4 uses to do multithreading, which is a form of parallelization that allows a computer (or cluster of computers) to finish executing a task sooner. You can read more about multithreading and parallelism in GATK here. The Spark software library is open-source and maintained by the Apache Software Foundation. It is very widely used in the computing industry and is one of the most promising technologies for accelerating execution of analysis pipelines.


Not all GATK tools use Spark

Tools that can use Spark generally have a note to that effect in their respective Tool Doc.

- Some GATK tools exist in distinct Spark-capable and non-Spark-capable versions

The "sparkified" versions have the suffix "Spark" at the end of their names. Many of these are still experimental; down the road we plan to consolidate them so that there will be only one version per tool.

- Some GATK tools only exist in a Spark-capable version

Those tools don't have the "Spark" suffix.


You don't need a Spark cluster to run Spark-enabled GATK tools!

If you're working on a "normal" machine (even just a laptop) with multiple CPU cores, the GATK engine can still use Spark to create a virtual standalone cluster in place, and set it to take advantage of however many cores are available on the machine -- or however many you choose to allocate. See the example parameters below and the local-Spark tutorial for more information on how to control this. And if your machine only has a single core, these tools can always be run in single-core mode -- it'll just take longer for them to finish.

To be clear, even the Spark-only tools can be run on regular machines, though in practice a few of them may be prohibitively slow (SV tools and PathSeq). See the Tool Docs for tool-specific recommendations.

If you do have access to a Spark cluster, the Spark-enabled tools are going to be extra happy but you may need to provide some additional parameters to use them effectively. See the cluster-Spark tutorial for more information.

Example command-line parameters

Here are some example arguments you would give to a Spark-enabled GATK tool:

  • --sparkMaster local[*] -> "Run on the local machine using all cores"
  • --sparkMaster local[2] -> "Run on the local machine using two cores"
  • --sparkMaster spark://23.195.26.187:7077 -> "Run on the cluster at 23.195.26.187, port 7077"
  • --sparkRunner GCS --cluster my_cluster -> "Run on my_cluster in Google Dataproc"

You don't need to install any additional software to use Spark in GATK

All the necessary software for using Spark, whether it's on a local machine or a Spark cluster, is bundled within the GATK itself. Just make sure to invoke GATK using the gatk wrapper script rather than calling the jar directly, because the wrapper will select the appropriate jar file (there are two!) and will set some parameters for you.
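A concrete local-run sketch, using MarkDuplicatesSpark as the example tool (an assumption; any Spark-enabled tool works the same way). With the gatk wrapper, tool arguments go before a lone "--" and Spark arguments after it; note the argument spelling has varied across GATK4 releases (--sparkMaster vs --spark-master). The command is echoed rather than executed here:

```shell
cores=4   # however many local cores you want Spark to use
echo gatk MarkDuplicatesSpark -I input.bam -O marked.bam -- --spark-master "local[$cores]"
```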


Has something changed since GATK 4.0.0.0 regarding spark integration?


I'm not sure whether something has changed between GATK 4.0.0.0 and the newer versions:

1) Spark is no longer included in the GATK package: it requires the "spark-submit" command to be in $PATH.

2) The argument names have changed: it is now "--spark-runner SPARK --spark-master $SPARK_URL".

3) The "spark-submit" command failed because the string (the "-D" options to java) passed as the "extraJavaOptions" argument was not quoted. I wrote a wrapper to add quotes around these "-D" options.

4) The "--spark-master" argument was passed through to the command (for example, SortSam) submitted to Spark, causing an error. I removed it in the wrapper.

I'm not sure whether I caused all these problems myself, or whether they are specific to my platform (CentOS 6.4 and JDK 1.8.0).


GATK Resource Bundle FTP broken?


The FTP site for downloading hg19 resources is broken as I write this message. Is there any other place that mirrors the FTP resources?

How to relax parameters in GenotypeGVCFs to get more variants?


Hi,

I am using GenotypeGVCFs with the following command:

java -Xmx2g -jar CancerAnalysisPackage-2015.1-3/GenomeAnalysisTK.jar -T GenotypeGVCFs -R Refgenome_ucsc/ucsc.hg19.fasta --variant chrn.g.vcf --out chrn.rawVariants.vcf -L chrn.bed --interval_padding 100 --disable_auto_index_creation_and_locking_when_reading_rods -nt 4

Number of records in chrn.g.vcf = 12248378
Number of variants in chrn.rawVariants.vcf = 85579

Is there any way to get more variants? I seem to be missing a lot of variants at this step. Please help me out, as I am in great need.
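Worth noting before relaxing anything: a GVCF contains a record or block for every covered position, so most of those 12,248,378 records are hom-ref blocks rather than missed variants; 85,579 called variants is not obviously a loss. That said, GATK3-era GenotypeGVCFs exposed confidence thresholds that can be lowered. A sketch, echoed rather than executed (exact flags depend on your version, so treat them as assumptions to check):

```shell
callconf=10   # lowered from that era's default of 30 (an assumption; check your version's docs)
echo java -Xmx2g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs \
  -R Refgenome_ucsc/ucsc.hg19.fasta --variant chrn.g.vcf --out chrn.rawVariants.vcf \
  -L chrn.bed --interval_padding 100 \
  -stand_call_conf "$callconf" -stand_emit_conf "$callconf"
```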

Can someone help me run any GATK4 pipeline?


GATK4 is a great variant calling software product and is dominating the field. Unfortunately, I am not able to run any of the GATK4 pipelines, probably because these pipelines all use Google Cloud Storage, which I am not familiar with. Now I am trying to run the $5 variant calling pipeline because I guess it's the easiest to run.

I downloaded the pipeline using:

git clone https://github.com/gatk-workflows/five-dollar-genome-analysis-pipeline.git

The command line used was:

java -jar cromwell-31.jar run germline_single_sample_workflow.wdl --inputs germline_single_sample_workflow.hg38.inputs.json

Both the .wdl file and the .json file are included in the GitHub package and unchanged when the above command line was run.

The (error) messages I got from the above command line execution are attached at the bottom of this post.

Can someone please tell me what I need to do to make this work?

Thanks much!

--------------------------------- (error) messages snippets -------------------------------------

[2018-05-31 10:08:28,06] [info] Running with database db.url = jdbc:hsqldb:mem:9f3b961e-97d8-4fc4-a30a-7e86c6f14bdc;shutdown=false;hsqldb.tx=mvcc
[2018-05-31 10:08:32,43] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2018-05-31 10:08:32,44] [info] [RenameWorkflowOptionsInMetadata] 100%
[2018-05-31 10:08:32,52] [info] Running with database db.url = jdbc:hsqldb:mem:11019e42-b5ee-4466-bbad-80dbf98f3c00;shutdown=false;hsqldb.tx=mvcc
[2018-05-31 10:08:32,81] [info] Slf4jLogger started
[2018-05-31 10:08:32,98] [info] Metadata summary refreshing every 2 seconds.
[2018-05-31 10:08:33,01] [info] KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.

...

to_bam_workflow.CheckFingerprint -> Local, to_bam_workflow.SamToFastqAndBwaMemAndMba -> Local, germline_single_sample_workflow.ScatterIntervalList -> Local, germline_single_sample_workflow.ValidateGVCF -> Local, germline_single_sample_workflow.CheckPreValidation -> Local, to_bam_workflow.CheckContamination -> Local, split_large_readgroup.SumSplitAlignedSizes -> Local, germline_single_sample_workflow.CollectGvcfCallingMetrics -> Local, to_bam_workflow.CollectUnsortedReadgroupBamQualityMetrics -> Local, to_bam_workflow.CollectQualityYieldMetrics -> Local, to_bam_workflow.SortSampleBam -> Local,

...

[2018-05-31 10:08:41,71] [warn] Local [6b73056d]: Key/s [memory, disks, preemptible] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-05-31 10:08:41,71] [warn] Local [6b73056d]: Key/s [preemptible, disks, cpu, memory] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-05-31 10:08:43,92] [info] WorkflowExecutionActor-6b73056d-8171-4712-a05a-b8dfcdeb36d6 [6b73056d]: Starting germline_single_sample_workflow.ScatterIntervalList
[2018-05-31 10:08:44,97] [info] fe6db5b3-d91a-40d4-a35b-cf3c937deaaa-SubWorkflowActor-SubWorkflow-to_bam_workflow:-1:1 [fe6db5b3]: Starting to_bam_workflow.GetBwaVersion
[2018-05-31 10:08:45,95] [warn] BackgroundConfigAsyncJobExecutionActor [fe6db5b3 to_bam_workflow.GetBwaVersion:NA:1]: Unrecognized runtime attribute keys: memory
[2018-05-31 10:08:45,95] [warn] BackgroundConfigAsyncJobExecutionActor [6b73056d germline_single_sample_workflow.ScatterIntervalList:NA:1]: Unrecognized runtime attribute keys: memory
[2018-05-31 10:08:45,98] [error] BackgroundConfigAsyncJobExecutionActor [6b73056d germline_single_sample_workflow.ScatterIntervalList:NA:1]: Error attempting to Execute
java.lang.Exception: Failed command instantiation
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:400)
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand$(StandardAsyncExecutionActor.scala:340)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.instantiatedCommand$lzycompute(ConfigAsyncJobExecutionActor.scala:191)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.instantiatedCommand(ConfigAsyncJobExecutionActor.scala:191)
at cromwell.backend.standard.StandardAsyncExecutionActor.commandScriptContents(StandardAsyncExecutionActor.scala:235)
at cromwell.backend.standard.StandardAsyncExecutionActor.commandScriptContents$(StandardAsyncExecutionActor.scala:234)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.commandScriptContents(ConfigAsyncJobExecutionActor.scala:191)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.writeScriptContents(SharedFileSystemAsyncJobExecutionActor.scala:140)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.writeScriptContents$(SharedFileSystemAsyncJobExecutionActor.scala:139)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.cromwell$backend$sfs$BackgroundAsyncJobExecutionActor$$super$writeScriptContents(ConfigAsyncJobExecutionActor.scala:191)
at cromwell.backend.sfs.BackgroundAsyncJobExecutionActor.writeScriptContents(BackgroundAsyncJobExecutionActor.scala:12)
at cromwell.backend.sfs.BackgroundAsyncJobExecutionActor.writeScriptContents$(BackgroundAsyncJobExecutionActor.scala:11)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.writeScriptContents(ConfigAsyncJobExecutionActor.scala:191)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute(SharedFileSystemAsyncJobExecutionActor.scala:123)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute$(SharedFileSystemAsyncJobExecutionActor.scala:121)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.execute(ConfigAsyncJobExecutionActor.scala:191)
at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$executeAsync$1(StandardAsyncExecutionActor.scala:451)
at scala.util.Try$.apply(Try.scala:209)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync(StandardAsyncExecutionActor.scala:451)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync$(StandardAsyncExecutionActor.scala:451)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.executeAsync(ConfigAsyncJobExecutionActor.scala:191)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover(StandardAsyncExecutionActor.scala:744)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover$(StandardAsyncExecutionActor.scala:736)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.executeOrRecover(ConfigAsyncJobExecutionActor.scala:191)
at cromwell.backend.async.AsyncBackendJobExecutionActor.$anonfun$robustExecuteOrRecover$1(AsyncBackendJobExecutionActor.scala:65)
at cromwell.core.retry.Retry$.withRetry(Retry.scala:37)
at cromwell.backend.async.AsyncBackendJobExecutionActor.withRetry(AsyncBackendJobExecutionActor.scala:61)
at cromwell.backend.async.AsyncBackendJobExecutionActor.cromwell$backend$async$AsyncBackendJobExecutionActor$$robustExecuteOrRecover(AsyncBackendJobExecutionActor.scala:65)
at cromwell.backend.async.AsyncBackendJobExecutionActor$$anonfun$receive$1.applyOrElse(AsyncBackendJobExecutionActor.scala:88)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:514)
at akka.actor.Actor.aroundReceive$(Actor.scala:512)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.aroundReceive(ConfigAsyncJobExecutionActor.scala:191)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:527)
at akka.actor.ActorCell.invoke(ActorCell.scala:496)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: common.exception.AggregatedMessageException: Error(s):
:
java.lang.IllegalArgumentException: gs://broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
gs://broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
at common.validation.Validation$ValidationTry$.toTry$extension1(Validation.scala:60)
at common.validation.Validation$ValidationTry$.toTry$extension0(Validation.scala:56)
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:398)
... 42 common frames omitted
[2018-05-31 10:08:45,99] [info] BackgroundConfigAsyncJobExecutionActor [fe6db5b3:to_bam_workflow.GetBwaVersion:NA:1]: # not setting set -o pipefail here because /bwa has a rc=1 and we dont want to allow rc=1 to succeed because

the sed may also fail with that error and that is something we actually want to fail on.

/usr/gitc/bwa 2>&1 | \
grep -e '^Version' | \
sed 's/Version: //'
[2018-05-31 10:08:46,02] [info] BackgroundConfigAsyncJobExecutionActor [fe6db5b3:to_bam_workflow.GetBwaVersion:NA:1]: executing: docker run \
--cidfile /Users/moushengxu/softspace/mudroom/gatk/five-dollar-genome-analysis-pipeline/cromwell-executions/germline_single_sample_workflow/6b73056d-8171-4712-a05a-b8dfcdeb36d6/call-to_bam_workflow/to_bam_workflow/fe6db5b3-d91a-40d4-a35b-cf3c937deaaa/call-GetBwaVersion/execution/docker_cid \
--rm -i \
\
--entrypoint /bin/bash \
-v /Users/moushengxu/softspace/mudroom/gatk/five-dollar-genome-analysis-pipeline/cromwell-executions/germline_single_sample_workflow/6b73056d-8171-4712-a05a-b8dfcdeb36d6/call-to_bam_workflow/to_bam_workflow/fe6db5b3-d91a-40d4-a35b-cf3c937deaaa/call-GetBwaVersion:/cromwell-executions/germline_single_sample_workflow/6b73056d-8171-4712-a05a-b8dfcdeb36d6/call-to_bam_workflow/to_bam_workflow/fe6db5b3-d91a-40d4-a35b-cf3c937deaaa/call-GetBwaVersion \
us.gcr.io/broad-gotc-prod/genomes-in-the-cloud@sha256:7bc64948a0a9f50ea55edb8b30c710943e44bd861c46a229feaf121d345e68ed /cromwell-executions/germline_single_sample_workflow/6b73056d-8171-4712-a05a-b8dfcdeb36d6/call-to_bam_workflow/to_bam_workflow/fe6db5b3-d91a-40d4-a35b-cf3c937deaaa/call-GetBwaVersion/execution/script
[2018-05-31 10:08:46,10] [info] fe6db5b3-d91a-40d4-a35b-cf3c937deaaa-SubWorkflowActor-SubWorkflow-to_bam_workflow:-1:1 [fe6db5b3]: Starting to_bam_workflow.CreateSequenceGroupingTSV
[2018-05-31 10:08:47,08] [warn] BackgroundConfigAsyncJobExecutionActor [fe6db5b3:to_bam_workflow.CreateSequenceGroupingTSV:NA:1]: Unrecognized runtime attribute keys: preemptible, memory
[2018-05-31 10:08:47,08] [error] BackgroundConfigAsyncJobExecutionActor [fe6db5b3:to_bam_workflow.CreateSequenceGroupingTSV:NA:1]: Error attempting to Execute
java.lang.Exception: Failed command instantiation
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:400)
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand$(StandardAsyncExecutionActor.sca

...

at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync(StandardAsyncExecutionActor.scala:451)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync$(StandardAsyncExecutionActor.scala:451)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.executeAsync(ConfigAsyncJobExecutionActor.scala:191)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover(StandardAsyncExecutionActor.scala:744)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover$(StandardAsyncExecutionActor.scala:736)

...

    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: common.exception.AggregatedMessageException: Error(s):
:
java.lang.IllegalArgumentException: gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
at common.validation.Validation$ValidationTry$.toTry$extension1(Validation.scala:60)
at common.validation.Validation$ValidationTry$.toTry$extension0(Validation.scala:56)

...

at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.executeOrRecover(ConfigAsyncJobExecutionActor.scala:191)
at cromwell.backend.async.AsyncBackendJobExecutionActor.$anonfun$robustExecuteOrRecover$1(AsyncBackendJobExecutionActor.scala:65)
at cromwell.core.retry.Retry$.withRetry(Retry.scala:37)
at cromwell.core.retry.Retry$$anonfun$withRetry$1.$anonfun$applyOrElse$3(Retry.scala:44)
at akka.pattern.FutureTimeoutSupport.liftedTree1$1(FutureTimeoutSupport.scala:25)
at akka.pattern.FutureTimeoutSupport.$anonfun$after$1(FutureTimeoutSupport.scala:25)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:140)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:43)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: common.exception.AggregatedMessageException: Error(s):
:
java.lang.IllegalArgumentException: gs://broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
gs://broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
at common.validation.Validation$ValidationTry$.toTry$extension1(Validation.scala:60)
at common.validation.Validation$ValidationTry$.toTry$extension0(Validation.scala:56)
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:398)
... 35 common frames omitted
[2018-05-31 10:08:48,04] [info] BackgroundConfigAsyncJobExecutionActor [fe6db5b3:to_bam_workflow.GetBwaVersion:NA:1]: job id: 67817
[2018-05-31 10:08:48,05] [info] BackgroundConfigAsyncJobExecutionActor [fe6db5b3:to_bam_workflow.GetBwaVersion:NA:1]: Status change from - to WaitingForReturnCodeFile
[2018-05-31 10:08:49,65] [error] BackgroundConfigAsyncJobExecutionActor [fe6db5b3:to_bam_workflow.CreateSequenceGroupingTSV:NA:1]: Error attempting to Execute
java.lang.Exception: Failed command instantiation
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:400)

...

cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.executeOrRecover(ConfigAsyncJobExecutionActor.scala:191)
at cromwell.backend.async.AsyncBackendJobExecutionActor.$anonfun$robustExecuteOrRecover$1(AsyncBackendJobExecutionActor.scala:65)
at cromwell.core.retry.Retry$.withRetry(Retry.scala:37)
at cromwell.core.retry.Retry$$anonfun$withRetry$1.$anonfun$applyOrElse$3(Retry.scala:44)
at akka.pattern.FutureTimeoutSupport.liftedTree1$1(FutureTimeoutSupport.scala:25)
at akka.pattern.FutureTimeoutSupport.$anonfun$after$1(FutureTimeoutSupport.scala:25)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:140)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:43)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: common.exception.AggregatedMessageException: Error(s):
:
java.lang.IllegalArgumentException: gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
at common.validation.Validation$ValidationTry$.toTry$extension1(Validation.scala:60)
at common.validation.Validation$ValidationTry$.toTry$extension0(Validation.scala:56)
...
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: common.exception.AggregatedMessageException: Error(s):
:
java.lang.IllegalArgumentException: gs://broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
gs://broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
at common.validation.Validation$ValidationTry$.toTry$extension1(Validation.scala:60)
at common.validation.Validation$ValidationTry$.toTry$extension0(Validation.scala:56)

...

at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: common.exception.AggregatedMessageException: Error(s):
:
java.lang.IllegalArgumentException: gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
at common.validation.Validation$ValidationTry$.toTry$extension1(Validation.scala:60)
at common.validation.Validation$ValidationTry$.toTry$extension0(Validation.scala:56)

...

java.lang.IllegalArgumentException: gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict exists on a filesystem not supported by this instance of Cromwell. Supported filesystems are: MacOSXFileSystem. Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
at common.validation.Validation$ValidationTry$.toTry$extension1(Validation.scala:60)
at common.validation.Validation$ValidationTry$.toTry$extension0(Validation.scala:56)
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:398)
... 35 common frames omitted
[2018-05-31 10:12:05,44] [info] Automatic shutdown of the async connection
[2018-05-31 10:12:05,44] [info] Gracefully shutdown sentry threads.
[2018-05-31 10:12:05,44] [info] Starting coordinated shutdown from JVM shutdown hook
[2018-05-31 10:12:05,44] [info] Shutdown finished.
[2018-05-31 10:12:05,44] [info] Workflow polling stopped
[2018-05-31 10:12:05,45] [info] Shutting down WorkflowStoreActor - Timeout = 5000 milliseconds
[2018-05-31 10:12:05,45] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5000 milliseconds
[2018-05-31 10:12:05,45] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5000 milliseconds
[2018-05-31 10:12:05,45] [info] Aborting all running workflows.
[2018-05-31 10:12:05,45] [info] JobExecutionTokenDispenser stopped
[2018-05-31 10:12:05,45] [info] WorkflowStoreActor stopped
[2018-05-31 10:12:05,45] [info] Message [cromwell.engine.workflow.workflowstore.WorkflowStoreActor$WorkDone$] from Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowStoreActor/WorkflowStoreEngineActor#-1393405839] to Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowStoreActor/WorkflowStoreEngineActor#-1393405839] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[2018-05-31 10:12:05,45] [info] WorkflowLogCopyRouter stopped
[2018-05-31 10:12:05,45] [info] Shutting down WorkflowManagerActor - Timeout = 3600000 milliseconds
[2018-05-31 10:12:05,45] [info] WorkflowManagerActor Aborting all workflows
[2018-05-31 10:12:05,45] [info] WorkflowExecutionActor-6b73056d-8171-4712-a05a-b8dfcdeb36d6 [6b73056d]: Aborting workflow
[2018-05-31 10:12:05,46] [info] fe6db5b3-d91a-40d4-a35b-cf3c937deaaa-SubWorkflowActor-SubWorkflow-to_bam_workflow:-1:1 [fe6db5b3]: Aborting workflow
[2018-05-31 10:12:05,46] [info] BackgroundConfigAsyncJobExecutionActor [fe6db5b3:to_bam_workflow.GetBwaVersion:NA:1]: executing: docker kill cat /Users/moushengxu/softspace/mudroom/gatk/five-dollar-genome-analysis-pipeline/cromwell-executions/germline_single_sample_workflow/6b73056d-8171-4712-a05a-b8dfcdeb36d6/call-to_bam_workflow/to_bam_workflow/fe6db5b3-d91a-40d4-a35b-cf3c937deaaa/call-GetBwaVersion/execution/docker_cid
[2018-05-31 10:12:05,53] [info] BackgroundConfigAsyncJobExecutionActor [fe6db5b3:to_bam_workflow.GetBwaVersion:NA:1]: BackgroundConfigAsyncJobExecutionActor [fe6db5b3:to_bam_workflow.GetBwaVersion:NA:1] Aborted StandardAsyncJob(67817)
[2018-05-31 10:12:46,24] [info] BackgroundConfigAsyncJobExecutionActor [fe6db5b3:to_bam_workflow.GetBwaVersion:NA:1]: Status change from WaitingForReturnCodeFile to Done
[2018-05-31 10:12:46,24] [info] Message [cromwell.engine.workflow.tokens.JobExecutionTokenDispenserActor$JobExecutionTokenReturn$] from Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowManagerActor/WorkflowActor-6b73056d-8171-4712-a05a-b8dfcdeb36d6/WorkflowExecutionActor-6b73056d-8171-4712-a05a-b8dfcdeb36d6/SubWorkflowExecutionActor-SubWorkflow-to_bam_workflow:-1:1/fe6db5b3-d91a-40d4-a35b-cf3c937deaaa-SubWorkflowActor-SubWorkflow-to_bam_workflow:-1:1/fe6db5b3-d91a-40d4-a35b-cf3c937deaaa-EngineJobExecutionActor-to_bam_workflow.GetBwaVersion:NA:1#-366400105] to Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/JobExecutionTokenDispenser#-561364293] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[2018-05-31 10:12:46,24] [info] fe6db5b3-d91a-40d4-a35b-cf3c937deaaa-SubWorkflowActor-SubWorkflow-to_bam_workflow:-1:1 [fe6db5b3]: WorkflowExecutionActor [fe6db5b3] aborted: to_bam_workflow.GetBwaVersion:NA:1
[2018-05-31 10:12:46,84] [info] WorkflowExecutionActor-6b73056d-8171-4712-a05a-b8dfcdeb36d6 [6b73056d]: WorkflowExecutionActor [6b73056d] aborted: SubWorkflow-to_bam_workflow:-1:1
[2018-05-31 10:12:47,72] [info] WorkflowManagerActor All workflows are aborted
[2018-05-31 10:12:47,72] [info] WorkflowManagerActor All workflows finished
[2018-05-31 10:12:47,72] [info] WorkflowManagerActor stopped
[2018-05-31 10:12:47,72] [info] Connection pools shut down
[2018-05-31 10:12:47,72] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800000 milliseconds
[2018-05-31 10:12:47,72] [info] Shutting down JobStoreActor - Timeout = 1800000 milliseconds
[2018-05-31 10:12:47,72] [info] Shutting down CallCacheWriteActor - Timeout = 1800000 milliseconds
[2018-05-31 10:12:47,72] [info] SubWorkflowStoreActor stopped
[2018-05-31 10:12:47,72] [info] Shutting down ServiceRegistryActor - Timeout = 1800000 milliseconds
[2018-05-31 10:12:47,72] [info] Shutting down DockerHashActor - Timeout = 1800000 milliseconds
[2018-05-31 10:12:47,72] [info] Shutting down IoProxy - Timeout = 1800000 milliseconds
[2018-05-31 10:12:47,72] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2018-05-31 10:12:47,72] [info] CallCacheWriteActor stopped
[2018-05-31 10:12:47,72] [info] JobStoreActor stopped
[2018-05-31 10:12:47,72] [info] KvWriteActor Shutting down: 0 queued messages to process
[2018-05-31 10:12:47,72] [info] DockerHashActor stopped
[2018-05-31 10:12:47,72] [info] WriteMetadataActor Shutting down: 37 queued messages to process
[2018-05-31 10:12:47,72] [info] IoProxy stopped
[2018-05-31 10:12:47,73] [info] WriteMetadataActor Shutting down: processing 0 queued messages
[2018-05-31 10:12:47,73] [info] ServiceRegistryActor stopped
[2018-05-31 10:12:47,74] [info] Database closed
[2018-05-31 10:12:47,74] [info] Stream materializer shut down
[2018-05-31 10:12:47,74] [info] Message [cromwell.core.actor.StreamActorHelper$StreamFailed] without sender to Actor[akka://cromwell-system/deadLetters] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
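Every failure in this log has the same root cause: the workflow's inputs point at gs:// paths (the hg38 reference files), but this Cromwell instance runs a local macOS backend whose only configured filesystem is the local one, so it cannot resolve them. The "Unrecognized runtime attribute keys: memory, preemptible" warnings are a separate, cosmetic issue: the local backend does not declare those attributes. Following the docs page the error links to, a minimal sketch of a Cromwell config (HOCON) addressing both might look like the following — the "application-default" auth name and the Local backend stanza are assumptions about a typical local setup, not taken from the log:

```hocon
# Sketch: let a local Cromwell backend read gs:// inputs.
# Assumes application-default gcloud credentials (run `gcloud auth application-default login` first).
google {
  application-name = "cromwell"
  auths = [
    { name = "application-default", scheme = "application_default" }
  ]
}

engine {
  filesystems {
    gcs { auth = "application-default" }
  }
}

backend {
  default = "Local"
  providers {
    Local {
      config {
        filesystems {
          local { localization: ["hard-link", "soft-link", "copy"] }
          gcs { auth = "application-default" }
        }
        # Declaring the runtime attributes the WDL uses silences the
        # "Unrecognized runtime attribute keys" warnings.
        runtime-attributes = """
          String? docker
          String? memory
          Int? preemptible
        """
      }
    }
  }
}
```

Alternatively, downloading the gs://broad-references inputs with gsutil and pointing the inputs JSON at local copies sidesteps GCS support in Cromwell entirely.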
