Channel: Recent Discussions — GATK-Forum

GATK Determine Contig Ploidy Error: ValueError: invalid literal for int() with base 10: '0 PLOIDY_P


Hi,

I am running the GATK4 germline CNV pipeline on 203 WES samples, and got the following error during the DetermineGermlineContigPloidy step:

15:29:38.431 DEBUG ScriptExecutor - Executing:
15:29:38.431 DEBUG ScriptExecutor - python
15:29:38.431 DEBUG ScriptExecutor - /tmp/cohort_determine_ploidy_and_depth.5886214504352644800.py
15:29:38.431 DEBUG ScriptExecutor - --sample_coverage_metadata=/tmp/samples-by-coverage-per-contig8026198779202597096.tsv
15:29:38.431 DEBUG ScriptExecutor - --output_calls_path=/mnt/data/smb_share/Mandal_project/PCa_WES/mcdowell#Bailey-Wilson_NF_AAPC/SampleBams/ploidy-calls
15:29:38.431 DEBUG ScriptExecutor - --mapping_error_rate=1.000000e-02
15:29:38.431 DEBUG ScriptExecutor - --psi_s_scale=1.000000e-04
15:29:38.431 DEBUG ScriptExecutor - --mean_bias_sd=1.000000e-02
15:29:38.431 DEBUG ScriptExecutor - --psi_j_scale=1.000000e-03
15:29:38.432 DEBUG ScriptExecutor - --learning_rate=5.000000e-02
15:29:38.432 DEBUG ScriptExecutor - --adamax_beta1=9.000000e-01
15:29:38.432 DEBUG ScriptExecutor - --adamax_beta2=9.990000e-01
15:29:38.432 DEBUG ScriptExecutor - --log_emission_samples_per_round=2000
15:29:38.432 DEBUG ScriptExecutor - --log_emission_sampling_rounds=100
15:29:38.432 DEBUG ScriptExecutor - --log_emission_sampling_median_rel_error=5.000000e-04
15:29:38.432 DEBUG ScriptExecutor - --max_advi_iter_first_epoch=1000
15:29:38.432 DEBUG ScriptExecutor - --max_advi_iter_subsequent_epochs=1000
15:29:38.432 DEBUG ScriptExecutor - --min_training_epochs=20
15:29:38.432 DEBUG ScriptExecutor - --max_training_epochs=100
15:29:38.432 DEBUG ScriptExecutor - --initial_temperature=2.000000e+00
15:29:38.432 DEBUG ScriptExecutor - --num_thermal_advi_iters=5000
15:29:38.432 DEBUG ScriptExecutor - --convergence_snr_averaging_window=5000
15:29:38.432 DEBUG ScriptExecutor - --convergence_snr_trigger_threshold=1.000000e-01
15:29:38.432 DEBUG ScriptExecutor - --convergence_snr_countdown_window=10
15:29:38.432 DEBUG ScriptExecutor - --max_calling_iters=1
15:29:38.432 DEBUG ScriptExecutor - --caller_update_convergence_threshold=1.000000e-03
15:29:38.432 DEBUG ScriptExecutor - --caller_internal_admixing_rate=7.500000e-01
15:29:38.432 DEBUG ScriptExecutor - --caller_external_admixing_rate=7.500000e-01
15:29:38.432 DEBUG ScriptExecutor - --disable_caller=false
15:29:38.432 DEBUG ScriptExecutor - --disable_sampler=false
15:29:38.432 DEBUG ScriptExecutor - --disable_annealing=false
15:29:38.432 DEBUG ScriptExecutor - --interval_list=/tmp/intervals8489974735940571592.tsv
15:29:38.432 DEBUG ScriptExecutor - --contig_ploidy_prior_table=/mnt/data/smb_share/Mandal_project/PCa_WES/mcdowell#Bailey-Wilson_NF_AAPC/SampleBams/contigPloidyPriorsTable4.tsv
15:29:38.432 DEBUG ScriptExecutor - --output_model_path=/mnt/data/smb_share/Mandal_project/PCa_WES/mcdowell#Bailey-Wilson_NF_AAPC/SampleBams/ploidy-model
Traceback (most recent call last):
  File "/tmp/cohort_determine_ploidy_and_depth.5886214504352644800.py", line 79, in <module>
    args.contig_ploidy_prior_table)
  File "/usr/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_ploidy.py", line 190, in get_contig_ploidy_prior_map_from_tsv_file
    ploidy_values = [int(column[len(io_consts.ploidy_prior_prefix):]) for column in columns[1:]]
  File "/usr/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_ploidy.py", line 190, in <listcomp>
    ploidy_values = [int(column[len(io_consts.ploidy_prior_prefix):]) for column in columns[1:]]
ValueError: invalid literal for int() with base 10: '0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3'
15:29:52.389 DEBUG ScriptExecutor - Result: 1
15:29:52.390 INFO DetermineGermlineContigPloidy - Shutting down engine
[June 26, 2019 3:29:52 PM CDT] org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy done. Elapsed time: 5.24 minutes.
Runtime.totalMemory()=6116343808
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException: python exited with 1
Command Line: python /tmp/cohort_determine_ploidy_and_depth.5886214504352644800.py --sample_coverage_metadata=/tmp/samples-by-coverage-per-contig8026198779202597096.tsv --output_calls_path=/mnt/data/smb_share/Mandal_project/PCa_WES/mcdowell#Bailey-Wilson_NF_AAPC/SampleBams/ploidy-calls --mapping_error_rate=1.000000e-02 --psi_s_scale=1.000000e-04 --mean_bias_sd=1.000000e-02 --psi_j_scale=1.000000e-03 --learning_rate=5.000000e-02 --adamax_beta1=9.000000e-01 --adamax_beta2=9.990000e-01 --log_emission_samples_per_round=2000 --log_emission_sampling_rounds=100 --log_emission_sampling_median_rel_error=5.000000e-04 --max_advi_iter_first_epoch=1000 --max_advi_iter_subsequent_epochs=1000 --min_training_epochs=20 --max_training_epochs=100 --initial_temperature=2.000000e+00 --num_thermal_advi_iters=5000 --convergence_snr_averaging_window=5000 --convergence_snr_trigger_threshold=1.000000e-01 --convergence_snr_countdown_window=10 --max_calling_iters=1 --caller_update_convergence_threshold=1.000000e-03 --caller_internal_admixing_rate=7.500000e-01 --caller_external_admixing_rate=7.500000e-01 --disable_caller=false --disable_sampler=false --disable_annealing=false --interval_list=/tmp/intervals8489974735940571592.tsv --contig_ploidy_prior_table=/mnt/data/smb_share/Mandal_project/PCa_WES/mcdowell#Bailey-Wilson_NF_AAPC/SampleBams/contigPloidyPriorsTable4.tsv --output_model_path=/mnt/data/smb_share/Mandal_project/PCa_WES/mcdowell#Bailey-Wilson_NF_AAPC/SampleBams/ploidy-model
at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:126)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:170)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:151)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:121)
at org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy.executeDeterminePloidyAndDepthPythonScript(DetermineGermlineContigPloidy.java:411)
at org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy.doWork(DetermineGermlineContigPloidy.java:288)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)

We are using the contig ploidy priors table below:

CONTIG_NAME     PLOIDY_PRIOR_0  PLOIDY_PRIOR_1  PLOIDY_PRIOR_2  PLOIDY_PRIOR_3
chr1      0.01    0.02    0.95    0.02
chr2      0.01    0.02    0.95    0.02
chr3      0.01    0.02    0.95    0.02
chr4      0.01    0.02    0.95    0.02
chr5      0.01    0.02    0.95    0.02
chr6      0.01    0.02    0.95    0.02
chr7      0.01    0.02    0.95    0.02
chr8      0.01    0.02    0.95    0.02
chr9      0.01    0.02    0.95    0.02
chr10    0.01    0.02    0.95    0.02
chr11    0.01    0.02    0.95    0.02
chr12    0.01    0.02    0.95    0.02
chr13    0.01    0.02    0.95    0.02
chr14    0.01    0.02    0.95    0.02
chr15    0.01    0.02    0.95    0.02
chr16    0.01    0.02    0.95    0.02
chr17    0.01    0.02    0.95    0.02
chr18    0.01    0.02    0.95    0.02
chr19    0.01    0.02    0.95    0.02
chr20    0.01    0.02    0.95    0.02
chr21    0.01    0.02    0.95    0.02
chr22    0.01    0.02    0.95    0.02
chrX      0.01    0.49    0.48    0.02
chrY      0.49    0.49    0.02    0
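For what it's worth, the ValueError shows that the parser got 'PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3' back as a single column, which is what happens when the header fields are separated by spaces instead of tabs (the priors table is parsed as tab-separated). A quick way to check, assuming GNU coreutils (tabs render as ^I, line ends as $):

head -2 contigPloidyPriorsTable4.tsv | cat -A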

Any help would be appreciated.

Thanks,
Tarun


Warnings in CreateSomaticPanelOfNormals after GenomicsDBImport


gatk 4.1.1.0, bash pipeline, linux server, WES data

Hi,

Just a quick question, given that these are only warnings:

after the GenomicsDBImport (GDBI) step, I run the CreateSomaticPanelOfNormals tool and get these warning messages:

07:22:15.863 INFO CreateSomaticPanelOfNormals - Initializing engine
07:22:16.381 INFO FeatureManager - Using codec VCFCodec to read file file:///shared/resources/gatk4hg38db/af-only-gnomad.hg38.vcf.gz
WARNING: No valid combination operation found for INFO field CONTQ - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field ECNT - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field GERMQ - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MBQ - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MFRL - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MMQ - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MPOS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field NALOD - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field NCount - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field NLOD - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field OCM - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field PON - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field POPAF - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field ROQ - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field RPA - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field RU - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field SEQQ - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field STR - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field STRANDQ - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field STRQ - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field TLOD - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field UNIQ_ALT_READ_COUNT - the field will NOT be part of INFO fields in the generated VCF records
07:22:19.336 INFO CreateSomaticPanelOfNormals - Done initializing engine
07:22:19.479 INFO ProgressMeter - Starting traversal
07:22:19.479 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
... same warnings ...
07:23:51.307 INFO ProgressMeter - chr1:1048793 1.5 1000 653.4
07:24:01.334 INFO ProgressMeter - chr1:26271182 1.7 16000 9425.2
07:24:11.697 INFO ProgressMeter - chr1:75234146 1.9 28000 14971.0
07:24:22.380 INFO ProgressMeter - chr1:154237014 2.0 41000 20016.1
07:24:32.390 INFO ProgressMeter - chr1:201209877 2.2 52000 23474.5
07:24:42.420 INFO ProgressMeter - chr1:248061119 2.4 63000 26444.5
... same warnings ...
07:25:30.031 INFO ProgressMeter - chr10:382506 3.2 64000 20152.1
07:25:40.548 INFO ProgressMeter - chr10:50050649 3.4 73000 21783.6
07:25:50.689 INFO ProgressMeter - chr10:103584776 3.5 84000 23862.5
... same warnings ...
....
....

Is this normal (I guess some data are missing, and that's why I get these warnings)? Are they related to a specific genome position/variant, or is it something more general?

Can I fix them somehow? Or are those fields not important for downstream analysis (tumor sample analysis with Mutect2 + PON)?

single sample pon

${GATK4} --java-options "${javaOpt}" Mutect2 \
-R ${hg38} \
-I "${bqsr}.bam" \
-O "${sample-PON}.vcf.gz" \
-L ${wes_intervals} \
-ip ${pad} \
--max-mnp-distance "0"

cohort pon

${GATK4} --java-options "${javaOpt}" GenomicsDBImport \
-R ${hg38} \
-V ... -V ... \
-L ${chromosome_list} \
-ip ${pad} \
--batch-size 1 \
--reader-threads 1 \
--genomicsdb-workspace-path "GDBI"

${GATK4} --java-options "${javaOpt}" CreateSomaticPanelOfNormals \
-R ${hg38} \
-V gendb://"GDBI" \
-O "${cohort}-PON.vcf.gz" \
--germline-resource ${AFGNOMAD}

Thanks

How to determine or calculate the read structure parameter in module IlluminaBasecallsToSam?

I was trying to convert basecall files into unmapped BAM files using the picard module IlluminaBasecallsToSam where I came across the parameter Read Structure. So I am wondering what value should I pass to this parameter. How to find out read structure for my basecall files?
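For context, a Picard read structure is a string of <cycles><operator> segments, where T = template, B = sample barcode, M = molecular barcode, and S = skip. The cycle counts come from the <Reads> section of the run folder's RunInfo.xml. A hedged sketch, assuming a hypothetical paired-end 2x151 run with dual 8-bp indexes:

# Inspect the run's read layout (run folder path is hypothetical):
grep '<Read ' /path/to/run_folder/RunInfo.xml
#   <Read Number="1" NumCycles="151" IsIndexedRead="N"/>
#   <Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
#   <Read Number="3" NumCycles="8" IsIndexedRead="Y"/>
#   <Read Number="4" NumCycles="151" IsIndexedRead="N"/>
# which would correspond to:
# READ_STRUCTURE=151T8B8B151T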

IllegalArgumentException: samples cannot be empty


I am trying to run HaplotypeCaller on some data that I know is messy and would fail some of the filters, so I have run it both with and without --disableToolDefaultReadFilters. Either way I don't get any output file, but I do get the message "samples cannot be empty". Does this mean that my data is still failing some built-in check, or am I doing something else wrong? I have checked the SQ lines, and when I run CountReads (with --disableToolDefaultReadFilters) it reports "Tool returned: 24634".

Here's my command:

$ java -jar ~/Downloads/gatk-4.beta.2/gatk-package-4.beta.2-local.jar HaplotypeCaller -R DQA_contig.fasta -ploidy 50 -I IRL-A.bam.sorted.bam -O IRL-A.vcf --disableToolDefaultReadFilters
13:40:49.007 WARN IntelGKLUtils - Error starting process to check for AVX support : grep -i avx /proc/cpuinfo
13:40:49.014 WARN IntelGKLUtils - Error starting process to check for AVX support : grep -i avx /proc/cpuinfo
[July 24, 2017 1:40:48 PM EDT] HaplotypeCaller --sample_ploidy 50 --output IRL-A.vcf --input IRL-A.bam.sorted.bam --reference DQA_contig.fasta --disableToolDefaultReadFilters true --group StandardAnnotation --group StandardHCAnnotation --GVCFGQBands 1 --GVCFGQBands 2 --GVCFGQBands 3 --GVCFGQBands 4 --GVCFGQBands 5 --GVCFGQBands 6 --GVCFGQBands 7 --GVCFGQBands 8 --GVCFGQBands 9 --GVCFGQBands 10 --GVCFGQBands 11 --GVCFGQBands 12 --GVCFGQBands 13 --GVCFGQBands 14 --GVCFGQBands 15 --GVCFGQBands 16 --GVCFGQBands 17 --GVCFGQBands 18 --GVCFGQBands 19 --GVCFGQBands 20 --GVCFGQBands 21 --GVCFGQBands 22 --GVCFGQBands 23 --GVCFGQBands 24 --GVCFGQBands 25 --GVCFGQBands 26 --GVCFGQBands 27 --GVCFGQBands 28 --GVCFGQBands 29 --GVCFGQBands 30 --GVCFGQBands 31 --GVCFGQBands 32 --GVCFGQBands 33 --GVCFGQBands 34 --GVCFGQBands 35 --GVCFGQBands 36 --GVCFGQBands 37 --GVCFGQBands 38 --GVCFGQBands 39 --GVCFGQBands 40 --GVCFGQBands 41 --GVCFGQBands 42 --GVCFGQBands 43 --GVCFGQBands 44 --GVCFGQBands 45 --GVCFGQBands 46 --GVCFGQBands 47 --GVCFGQBands 48 --GVCFGQBands 49 --GVCFGQBands 50 --GVCFGQBands 51 --GVCFGQBands 52 --GVCFGQBands 53 --GVCFGQBands 54 --GVCFGQBands 55 --GVCFGQBands 56 --GVCFGQBands 57 --GVCFGQBands 58 --GVCFGQBands 59 --GVCFGQBands 60 --GVCFGQBands 70 --GVCFGQBands 80 --GVCFGQBands 90 --GVCFGQBands 99 --indelSizeToEliminateInRefModel 10 --useAllelesTrigger false --dontTrimActiveRegions false --maxDiscARExtension 25 --maxGGAARExtension 300 --paddingAroundIndels 150 --paddingAroundSNPs 20 --kmerSize 10 --kmerSize 25 --dontIncreaseKmerSizesForCycles false --allowNonUniqueKmersInRef false --numPruningSamples 1 --recoverDanglingHeads false --doNotRecoverDanglingBranches false --minDanglingBranchLength 4 --consensus false --maxNumHaplotypesInPopulation 128 --errorCorrectKmers false --minPruning 2 --debugGraphTransformations false --kmerLengthForReadErrorCorrection 25 --minObservationsForKmerToBeSolid 20 --likelihoodCalculationEngine PairHMM --base_quality_score_threshold 18 --gcpHMM 10 --pair_hmm_implementation FASTEST_AVAILABLE --pcr_indel_model CONSERVATIVE --phredScaledGlobalReadMismappingRate 45 --nativePairHmmThreads 4 --useDoublePrecision false --debug false --useFilteredReadsForAnnotations false --emitRefConfidence NONE --bamWriterType CALLED_HAPLOTYPES --disableOptimizations false --justDetermineActiveRegions false --dontGenotype false --dontUseSoftClippedBases false --captureAssemblyFailureBAM false --errorCorrectReads false --doNotRunPhysicalPhasing false --min_base_quality_score 10 --useNewAFCalculator false --annotateNDA false --heterozygosity 0.001 --indel_heterozygosity 1.25E-4 --heterozygosity_stdev 0.01 --standard_min_confidence_threshold_for_calling 10.0 --max_alternate_alleles 6 --max_genotype_count 1024 --genotyping_mode DISCOVERY --contamination_fraction_to_filter 0.0 --output_mode EMIT_VARIANTS_ONLY --allSitePLs false --readShardSize 5000 --readShardPadding 100 --minAssemblyRegionSize 50 --maxAssemblyRegionSize 300 --assemblyRegionPadding 100 --maxReadsPerAlignmentStart 50 --activeProbabilityThreshold 0.002 --maxProbPropagationDistance 50 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 40 --cloudIndexPrefetchBuffer -1 
--disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --minimumMappingQuality 20
[July 24, 2017 1:40:48 PM EDT] Executing as heidi@heidi-HP-Pavilion-dv6-Notebook-PC on Linux 4.10.0-27-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_141-b15; Version: 4.beta.2
13:40:49.017 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 5
13:40:49.017 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:40:49.017 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false
13:40:49.017 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:40:49.017 INFO HaplotypeCaller - Deflater: JdkDeflater
13:40:49.017 INFO HaplotypeCaller - Inflater: JdkInflater
13:40:49.017 INFO HaplotypeCaller - Initializing engine
13:40:49.254 WARN IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
13:40:49.260 WARN IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
13:40:49.896 INFO HaplotypeCaller - Done initializing engine
13:40:49.902 INFO HaplotypeCallerEngine - Currently, physical phasing is only available for diploid samples.
13:40:50.226 WARN PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
13:40:50.503 WARN PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
13:40:50.925 INFO HaplotypeCaller - Shutting down engine
[July 24, 2017 1:40:50 PM EDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=218628096
java.lang.IllegalArgumentException: samples cannot be empty
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:681)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.ReferenceConfidenceModel.<init>(ReferenceConfidenceModel.java:103)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.initialize(HaplotypeCallerEngine.java:165)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.<init>(HaplotypeCallerEngine.java:146)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.onTraversalStart(HaplotypeCaller.java:200)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:836)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:115)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
at org.broadinstitute.hellbender.Main.main(Main.java:230)
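For reference, HaplotypeCaller takes its sample names from the SM tags of the @RG (read group) records in the BAM header, and "samples cannot be empty" is typically what you see when none are present. A quick check, assuming samtools and Picard are available (read-group values below are hypothetical):

samtools view -H IRL-A.bam.sorted.bam | grep '^@RG'
# If nothing prints, or the @RG lines lack SM: tags, add read groups first:
java -jar picard.jar AddOrReplaceReadGroups \
    I=IRL-A.bam.sorted.bam \
    O=IRL-A.rg.bam \
    RGID=IRL-A RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=IRL-A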

Converting ModelSegments outputs for ABSOLUTE

Hello,

I work on exome data to detect somatic variants. I identified the SNPs/Indels with Mutect2 and I followed the ModelSegments tutorial to detect CNAs. Now I want to convert the AF outputs I got with ModelSegments into AllelicCapSeg outputs to use them with ABSOLUTE.

For this purpose, I tried to use part of this pipeline on GitHub (unsupported/combine_tracks_postprocessing_cnv/combine_tracks.wdl) to convert the data with the function "convert_model_segments_to_alleliccapseg". I use the .af.param and .cr.seg files from ModelSegments as input.

However, I don't really understand the purpose of the following threshold parameters used in the script: Float? maf90_threshold and Int? min_hets_acs_results.

Is there a better way to do this conversion?

Thank you

Using JEXL to apply hard filters or select variants based on annotation values


1. JEXL in a nutshell

JEXL stands for Java EXpression Language. It's not a part of the GATK as such; it's a software library that can be used by Java-based programs like the GATK. It can be used for many things, but in the context of the GATK, it has one very specific use: making it possible to operate on subsets of variants from VCF files based on one or more annotations, using a single command. This is typically done with walkers such as VariantFiltration and SelectVariants.


2. Basic structure of JEXL expressions for use with the GATK

In this context, a JEXL expression is a string (in the computing sense, i.e. a series of characters) that tells the GATK which annotations to look at and what selection rules to apply.

JEXL expressions contain three basic components: keys and values, connected by operators. For example, in this simple JEXL expression which selects variants whose quality score is greater than 30:

"QUAL > 30.0"
  • QUAL is a key: the name of the annotation we want to look at
  • 30.0 is a value: the threshold that we want to use to evaluate variant quality against
  • > is an operator: it determines which "side" of the threshold we want to select

The complete expression must be framed by double quotes. Within this, keys are strings (typically written in uppercase or CamelCase), and values can be either strings, numbers or booleans (TRUE or FALSE) -- but if they are strings the values must be framed by single quotes, as in the following example:

"MY_STRING_KEY == 'foo'"

3. Evaluation on multiple annotations

You can build expressions that calculate a metric based on two separate annotations, for example if you want to select variants for which quality (QUAL) divided by depth of coverage (DP) is below a certain threshold value:

"QUAL / DP < 10.0"

You can also join multiple conditional statements with logical operators, for example if you want to select variants that have both sufficient quality (QUAL) and a certain depth of coverage (DP):

"QUAL > 30.0 && DP == 10"

where && is the logical "AND".

In the case where you want to select variants that have at least one of several conditions fulfilled, provide each expression separately:

"QD < 2.0" \
    "ReadPosRankSum < -20.0" \
    "FS > 200.0"

To be on the safe side, do not use compound expressions with the logical "OR" (||), as a missing annotation will negate the entire expression.
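As a sketch of what that looks like in practice with VariantFiltration, each expression gets its own filter name (the names here are arbitrary):

java -jar GenomeAnalysisTK.jar -T VariantFiltration \
        -R reference.fasta \
        -V variants.vcf \
        --filterExpression "QD < 2.0" --filterName "QD2" \
        --filterExpression "ReadPosRankSum < -20.0" --filterName "ReadPosRankSum-20" \
        --filterExpression "FS > 200.0" --filterName "FS200" \
        -o filtered.vcf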


4. Filtering on sample/genotype-level properties

You can also filter individual samples/genotypes in a VCF based on information from the FORMAT field. VariantFiltration will add the sample-level FT tag to the FORMAT field of filtered samples. Note however that this does not affect the record's FILTER tag. This is still a work in progress and isn't quite as flexible and powerful yet as we'd like it to be. For now, you can filter based on most fields as normal (e.g. GQ < 5.0), but the GT (genotype) field is an exception. We have put in convenience methods to enable filtering out heterozygous calls (isHet == 1), homozygous-reference calls (isHomRef == 1), and homozygous-variant calls (isHomVar == 1).
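For example, a minimal sketch that flags all heterozygous genotypes (filter name and output file are arbitrary):

java -jar GenomeAnalysisTK.jar -T VariantFiltration \
        -R reference.fasta \
        -V variants.vcf \
        --genotypeFilterExpression "isHet == 1" --genotypeFilterName "isHetFilter" \
        -o flagged_hets.vcf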


5. Important caveats

Sensitivity to case and type

You're probably used to case being important (whether letters are lowercase or UPPERCASE) but now you need to also pay attention to the type of value that is involved -- for example, numbers are differentiated between integers and floats (essentially, non-integers). These points are especially important to keep in mind:

  • Case
    Currently, VCF INFO field keys are case-sensitive. That means that if you have a QUAL field in uppercase in your VCF record, the system will not recognize it if you write it differently (Qual, qual or whatever) in your JEXL expression.

  • Type
    The types (i.e. string, integer, non-integer, floating point or boolean) used in your expression must be exactly the same as that of the value you are trying to evaluate. In other words, if you have a QUAL field with non-integer values (e.g. 45.3) and your filter expression is written as an integer (e.g. "QUAL < 50"), the system will throw a hissy fit (specifically, a Java exception, e.g. a NumberFormatException for numerical type mismatches).

Complex queries

We highly recommend that complex expressions involving multiple AND/OR operations be split up into separate expressions whenever possible to avoid confusion. If you are using complex expressions, make sure to test them on a panel of different sites with several combinations of yes/no criteria.


6. More complex JEXL magic

Note that this last part is fairly advanced and not for the faint of heart. To be frank, it's also explained rather more briefly than the topic deserves. But if there's enough demand for this level of usage (click the "view in forum" link and leave a comment) we'll consider producing a full-length tutorial.

Introducing the VariantContext object

When you use SelectVariants with JEXL, what happens under the hood is that the program accesses something called the VariantContext, which is a representation of the variant call with all its annotation information. The VariantContext is technically not part of GATK; it's part of the variant library included within the Picard tools source code, which GATK uses for convenience.

The reason we're telling you about this is that you can actually make more complex queries than what the GATK offers convenience functions for, provided you're willing to do a little digging into the VariantContext methods. This will allow you to leverage the full range of capabilities of the underlying objects from the command line.

In a nutshell, the VariantContext is available through the vc variable, and you just need to add method calls to that variable in your command line. The best way to find out what methods are available is to read the VariantContext documentation on the Picard tools source code repository (on SourceForge), but we list a few examples below to whet your appetite.

Using VariantContext directly

For example, suppose I want to use SelectVariants to select all of the sites where sample NA12878 is homozygous-reference. This can be accomplished by accessing the underlying VariantContext as follows:

java -jar GenomeAnalysisTK.jar -T SelectVariants \
        -R reference.fasta \
        -V variants.vcf \
        -select 'vc.getGenotype("NA12878").isHomRef()'

Groovy, right? Now here's a more sophisticated example of a JEXL expression that finds all novel variants in the total set that have an allele frequency > 0.25 but not 1, are not filtered, and are non-reference in sample 01-0263:

java -jar GenomeAnalysisTK.jar -T SelectVariants \
        -R reference.fasta \
        -V variants.vcf \
        -select '! vc.getGenotype("01-0263").isHomRef() && (vc.getID() == null || vc.getID().equals(".")) && AF > 0.25 && AF < 1.0 && vc.isNotFiltered() && vc.isSNP()' \
        -o 01-0263.high_freq_novels.vcf \
        -sn 01-0263

Using the VariantContext to evaluate boolean values

The classic way of evaluating a boolean goes like this:

java -Xmx4g -jar GenomeAnalysisTK.jar -T SelectVariants \
        -R reference.fasta \
        -V my.vcf \
        -select 'DB'

But you can also use the VariantContext object like this:

java -Xmx4g -jar GenomeAnalysisTK.jar -T SelectVariants \
        -R reference.fasta \
        -V my.vcf \
        -select 'vc.hasAttribute("DB")'

Using VariantContext to access annotations in multiallelic sites

The order of alleles in the VariantContext object is not guaranteed to be the same as in the VCF output, so accessing the AF by an index derived from a scrambled alleles array is dangerous. However! If we have the sample genotypes, there's a workaround:

java -jar GenomeAnalysisTK.jar -T SelectVariants  \
        -R reference.fasta  \
        -V multiallelics.vcf  \
        -select 'vc.hasGenotypes() && vc.getCalledChrCount(vc.getAltAlleleWithHighestAlleleCount())/(1.0*vc.getCalledChrCount()) > 0.1' -o multiHighAC.vcf

The odd 1.0 is there because otherwise we're dividing two integers, which will always yield 0. The vc.hasGenotypes() is extra error checking. This might be slow for large files, but we could use something like this if performance is a concern:

java -jar GenomeAnalysisTK.jar -T SelectVariants \
        -R reference.fasta \
        -V multiallelics.vcf \
         -select 'vc.isBiallelic() ? AF > 0.1 : vc.hasGenotypes() && vc.getCalledChrCount(vc.getAltAlleleWithHighestAlleleCount())/(1.0*vc.getCalledChrCount()) > 0.1' -o multiHighAC.vcf

Where hopefully the ternary expression shortcuts the extra vc calls for all the biallelics.

Using JEXL to evaluate arrays

Sometimes you might want to write a JEXL expression to evaluate e.g. the AD (allelic depth) field in the FORMAT column. However, the AD is technically not an integer; rather it is a list (array) of integers. One can evaluate the array data using the "." operator. Here's an example:

java -jar GenomeAnalysisTK.jar -T SelectVariants \
        -R reference.fasta \
        -V variants.vcf \
        -select 'vc.getGenotype("NA12878").getAD().0 > 10'

If you would like to select sites where the alternate allele frequency is greater than 50%, you can use the following expression:

java -jar GenomeAnalysisTK.jar -T SelectVariants \
        -R reference.fasta \
        -V variants.vcf \
        -select 'vc.getGenotype("NA12878").getAD().1 / (1.0 * vc.getGenotype("NA12878").getDP()) > 0.50'

GATK4: CollectAllelicCounts output & ModelSegments


Hi everyone,
I am trying to run the CNV discovery pipeline, and I have noticed that the header of sample.allelicCounts.tsv (produced by CollectAllelicCounts) causes problems when used to run ModelSegments.

Indeed, ModelSegments gives me this error:
"A USER ERROR has occurred: Bad input: Bad header in file. Not all mandatory columns are present. Missing: POSITION, REF_COUNT, REF_NUCLEOTIDE, ALT_NUCLEOTIDE, ALT_COUNT"

And I think it's because of the CollectAllelicCounts TSV header format:
"CONTIG POSITION REF_COUNT ALT_COUNT REF_NUCLEOTIDE ALT_NUCLEOTIDE"

Is there any specific option to modify the column order? Can I directly parse the file?

Regards,

Alessandra

GATK3.8 SNP near the INDEL region with something different

Hi,
I filtered SNPs using GATK 3.8. When I checked the SNPs manually, I found this site with AD 15,0 and GT 0/1. I'm not sure about this result. Is it wrong? Or could I change the GT manually?
Thanks in advance

What is the difference between --truth-sensitivity-tranche and --ts-filter-level ?

I'm using GATK v4.0.3.0.

I want to use the recommended ApplyVQSR --ts-filter-level values, as specified at the end of GATK's document #1259 (that document was written for GATK3, but I assume the same recommendations apply to GATK4).

Does that mean I need to specify the same values for VariantRecalibrator's --truth-sensitivity-tranche? For example:
VariantRecalibrator --truth-sensitivity-tranche 99.5 --mode SNP
VariantRecalibrator --truth-sensitivity-tranche 99.0 --mode INDEL
ApplyVQSR --ts-filter-level 99.5 --mode SNP
ApplyVQSR --ts-filter-level 99.0 --mode INDEL
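For reference, and only as a hedged sketch: --truth-sensitivity-tranche is repeatable, so VariantRecalibrator is usually given a ladder of tranche values that includes the level you intend to filter at, and ApplyVQSR then picks one of them (the values below are illustrative, not a recommendation):

VariantRecalibrator --truth-sensitivity-tranche 100.0 --truth-sensitivity-tranche 99.9 \
    --truth-sensitivity-tranche 99.5 --truth-sensitivity-tranche 99.0 --mode SNP
ApplyVQSR --ts-filter-level 99.5 --mode SNP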

From Python Magic to embedded IGV: A closer look at GATK tutorial notebooks


Earlier this week, I made a big deal about how we plan to develop all of our GATK tutorials as Jupyter Notebooks in Terra going forward. Today I'd like to offer you a concrete look at what we like about using notebooks for GATK tutorials.

I was planning to just walk you through a couple of notebooks in one of our workshop workspaces, but then decided to make a custom workspace and notebook to show you what I mean without the complexity of the full-length tutorials. It's part highlights, featuring a couple of my favorite tutorial scenarios from the workshops that are fairly simple yet quite effective, and part sneak preview of the newest version of the tutorials, which boasts cool new features and will be unveiled at the next workshop (Cambridge in July). Oh, and part explainer on what exactly Jupyter Notebooks are, anyway.

Overall you can consider this mini-tutorial a stepping stone to being able to use the workshop tutorial workspaces without needing to actually attend a workshop. The workspace docs and the notebook itself both have a lot of explanations about how things work and how to use them in your pursuit of a deeper understanding of GATK, so I don't feel the need to go on and on about it here (for once). But I will mention, in case you're on the fence about whether to spend 5 whole minutes checking out the workspace (add 15 to 20 minutes to actually work through the full notebook), that it involves running GATK commands, streaming files, and viewing data in IGV -- all without ever leaving the warm embrace of the notebook.


Actually, I lied: I will go on a bit, because there are two standout features that I want to call out explicitly. One is Python Magic, which allows us to run commands as if we were in the terminal, but from within the flow of the notebook itself. If you thought you could only run Python code in there, think again! You can run anything that you can install on the notebook runtime (which is just about anything). You can also use it to embed R code, which comes in handy in one of our filtering tutorials, because we love Python as a home base but make extensive use of the R library ggplot. (Or you can switch the entire notebook to an R kernel on the fly, but that leads to some nervousness about state, so I'd rather use the magic, personally.)

The other waffle-worthy feature is IGV integration: you can embed an interactive IGV window to view and explore your data directly from within the notebook. Until very recently we had to load files into desktop IGV, which involved a lot of copy-pasting of cloud storage file paths, and some context switching. With embedded IGV there's none of that. It's not as full-featured as the desktop version (and sometimes you may still prefer to use desktop IGV), but the notebook integration has practically all the functionality I ever use. And it's just so cool to have what amounts to embedded interactive figures right there with the rest of the commands and explanations. Seriously, I love the IGV integration so much, it's hard to put into words.

All this to say, I heartily recommend you check out this mini-tutorial workspace, as it will give you a very concrete set of examples of how we're building out our tutorials and empower you to work through our workshop workspaces on your own. And as always we'd love to get feedback from all of you about the current crop of tutorials and what you'd like us to prioritize next.

Go to http://app.terra.bio and you'll be asked to log in with a Google identity. If you don't have one already, you can create one, and choose to either create a new Gmail account for it or associate your new Google identity with your existing email address. See this article for step-by-step instructions on how to register if needed.

Once you've logged in, look for the big green banner at the top of the screen and click "Start trial" to take advantage of the free credits program. As a reminder, access to Terra is free, but Google charges you for compute and storage; the credits (a $300 value) will allow you to try out the resources I'm describing here for free.

To clone a workspace, open it, expand the workspace action menu (three-dot icon, top right) and select the "Clone" option. In the cloning dialog, select the billing project we created for you with your free credits. The resulting workspace clone belongs to you. Have fun!

How are splice sites annotated?


In Oncotator, the variant classification of SPLICE_SITE occurs when a variant is within two bases of a splice site on either the exon or intron side.

To determine whether a specific SPLICE_SITE is exon or intron, and to get the exact variant classification, use the secondary_vc annotation.

How can I prevent the file header from showing up in gigantic font?


Hi. My question is: when I post to the forum, some parts of my post become huge, e.g. file headers or error messages. I'm showing a truncated example of a VCF header below. How can I prevent this from happening and show the copy-pasted blocks in a normal font?

##fileformat=VCFv4.2

...

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12878

Throwing Error: Invalid or corrupt jarfile while using GATK wrapper version 4.1.2.0


Hello all,
I am using GATK wrapper version gatk-4.1.2.0, and when I try to run the HaplotypeCaller tool as follows:

./gatk HaplotypeCaller --help

it throws the following error message:

Error: Invalid or corrupt jarfile /cluster/work/software/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar

My java version is:
openjdk version "1.7.0_161"
OpenJDK Runtime Environment (Zulu 7.21.0.3-linux64) (build 1.7.0_161-b14)
OpenJDK 64-Bit Server VM (Zulu 7.21.0.3-linux64) (build 24.161-b14, mixed mode)
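For context, GATK 4.x requires a Java 1.8 runtime, and "Invalid or corrupt jarfile" is often what an older JVM prints when it cannot read a jar built for a newer Java. A quick check:

java -version    # GATK 4.x expects a 1.8.x (Java 8) runtime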

GATK 4 IlluminaBasecallsToSam Error


Dear GATK Team,

I am getting the following error:

Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-905"
INFO 2019-07-01 15:41:32 CbclReader Processing tile 2676
INFO 2019-07-01 15:41:32 CbclReader Processing tile 2677
INFO 2019-07-01 15:41:32 CbclReader Processing tile 2678
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-919"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-915"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-920"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-917"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-928"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-924"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-921"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-931"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-925"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-927"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-923"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-929"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-926"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-922"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-933"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-934"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-935"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-930"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-932"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-4-thread-936"
Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-3-thread-1"
[Mon Jul 01 15:41:32 IST 2019] picard.illumina.IlluminaBasecallsToSam done. Elapsed time: 0.91 minutes.
Runtime.totalMemory()=9960947712
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
picard.PicardException: Reading executor had exceptions. There were 0 tasks were still running or queued and have been cancelled.
at picard.illumina.NewIlluminaBasecallsConverter.doTileProcessing(NewIlluminaBasecallsConverter.java:170)
at picard.illumina.IlluminaBasecallsToSam.doWork(IlluminaBasecallsToSam.java:276)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
Caused by: picard.PicardException: Expected cbcl file for surface 1 cycle 319 but it was not found.
at picard.illumina.parser.readers.CbclReader.readSurfaceTile(CbclReader.java:121)
at picard.illumina.parser.readers.CbclReader.<init>(CbclReader.java:102)
at picard.illumina.parser.NewIlluminaDataProvider.<init>(NewIlluminaDataProvider.java:37)
at picard.illumina.parser.IlluminaDataProviderFactory.makeDataProvider(IlluminaDataProviderFactory.java:269)
at picard.illumina.NewIlluminaBasecallsConverter$TileProcessor.run(NewIlluminaBasecallsConverter.java:232)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Kindly help us fix this issue.

Krithika S

CreateReadCountPanelOfNormals Errors: java.lang.IllegalArgumentException: TableColumnCollection must


I ran into an error when I used CreateReadCountPanelOfNormals. I had generated the S*.count.hdf5 files with CollectReadCounts and everything seemed OK.

gatk --java-options "-Xmx5g" CreateReadCountPanelOfNormals --input gatk_commonCnv/S07.count.hdf5 --input gatk_commonCnv/S08.count.hdf5 --input gatk_commonCnv/S09.count.hdf5 --input gatk_commonCnv/S10.count.hdf5 --input gatk_commonCnv/S11.count.hdf5 --input gatk_commonCnv/S12.count.hdf5 --input gatk_commonCnv/S13.count.hdf5 --input gatk_commonCnv/S14.count.hdf5 --input gatk_commonCnv/S15.count.hdf5 --input gatk_commonCnv/S16.count.hdf5 --input gatk_commonCnv/S17.count.hdf5 --input gatk_commonCnv/S18.count.hdf5 --input gatk_commonCnv/S19.count.hdf5 --input gatk_commonCnv/S20.count.hdf5 --input gatk_commonCnv/S21.count.hdf5 --input gatk_commonCnv/S22.count.hdf5 --input gatk_commonCnv/S23.count.hdf5 --input gatk_commonCnv/S24.count.hdf5 --input gatk_commonCnv/S25.count.hdf5 --input gatk_commonCnv/S26.count.hdf5 --input gatk_commonCnv/S49.count.hdf5 --input gatk_commonCnv/S50.count.hdf5 --input gatk_commonCnv/S51.count.hdf5 --input gatk_commonCnv/S52.count.hdf5 --input gatk_commonCnv/S53.count.hdf5 --input gatk_commonCnv/S54.count.hdf5 --input gatk_commonCnv/S55.count.hdf5 --input gatk_commonCnv/S56.count.hdf5 --annotated-intervals call_region/preprocessed.filter.interval_list --output gatk_somaticCnv/normal.pon.hdf5
Using GATK jar /home/my/anaconda2/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar
Running:
/home/my/anaconda2/bin/java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx5g -jar /home/my/anaconda2/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar CreateReadCountPanelOfNormals --input gatk_commonCnv/S07.count.hdf5 --input gatk_commonCnv/S08.count.hdf5 --input gatk_commonCnv/S09.count.hdf5 --input gatk_commonCnv/S10.count.hdf5 --input gatk_commonCnv/S11.count.hdf5 --input gatk_commonCnv/S12.count.hdf5 --input gatk_commonCnv/S13.count.hdf5 --input gatk_commonCnv/S14.count.hdf5 --input gatk_commonCnv/S15.count.hdf5 --input gatk_commonCnv/S16.count.hdf5 --input gatk_commonCnv/S17.count.hdf5 --input gatk_commonCnv/S18.count.hdf5 --input gatk_commonCnv/S19.count.hdf5 --input gatk_commonCnv/S20.count.hdf5 --input gatk_commonCnv/S21.count.hdf5 --input gatk_commonCnv/S22.count.hdf5 --input gatk_commonCnv/S23.count.hdf5 --input gatk_commonCnv/S24.count.hdf5 --input gatk_commonCnv/S25.count.hdf5 --input gatk_commonCnv/S26.count.hdf5 --input gatk_commonCnv/S49.count.hdf5 --input gatk_commonCnv/S50.count.hdf5 --input gatk_commonCnv/S51.count.hdf5 --input gatk_commonCnv/S52.count.hdf5 --input gatk_commonCnv/S53.count.hdf5 --input gatk_commonCnv/S54.count.hdf5 --input gatk_commonCnv/S55.count.hdf5 --input gatk_commonCnv/S56.count.hdf5 --annotated-intervals call_region/preprocessed.filter.interval_list --output gatk_somaticCnv/normal.pon.hdf5
20:02:17.211 WARN SparkContextFactory - Environment variables HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY must be set or the GCS hadoop connector will not be configured properly
20:02:32.828 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/chendan/anaconda2/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 01, 2019 8:02:35 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
20:02:35.455 INFO CreateReadCountPanelOfNormals - ------------------------------------------------------------
20:02:35.455 INFO CreateReadCountPanelOfNormals - The Genome Analysis Toolkit (GATK) v4.1.2.0
20:02:35.456 INFO CreateReadCountPanelOfNormals - For support and documentation go to https://software.broadinstitute.org/gatk/
20:02:35.456 INFO CreateReadCountPanelOfNormals - Executing as chendan@login01.local on Linux v3.10.0-693.5.2.el7.x86_64 amd64
20:02:35.456 INFO CreateReadCountPanelOfNormals - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
20:02:35.457 INFO CreateReadCountPanelOfNormals - Start Date/Time: July 1, 2019 8:02:27 PM CST
20:02:35.457 INFO CreateReadCountPanelOfNormals - ------------------------------------------------------------
20:02:35.457 INFO CreateReadCountPanelOfNormals - ------------------------------------------------------------
20:02:35.458 INFO CreateReadCountPanelOfNormals - HTSJDK Version: 2.19.0
20:02:35.458 INFO CreateReadCountPanelOfNormals - Picard Version: 2.19.0
20:02:35.458 INFO CreateReadCountPanelOfNormals - HTSJDK Defaults.COMPRESSION_LEVEL : 2
20:02:35.458 INFO CreateReadCountPanelOfNormals - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
20:02:35.458 INFO CreateReadCountPanelOfNormals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
20:02:35.458 INFO CreateReadCountPanelOfNormals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
20:02:35.458 INFO CreateReadCountPanelOfNormals - Deflater: IntelDeflater
20:02:35.459 INFO CreateReadCountPanelOfNormals - Inflater: IntelInflater
20:02:35.459 INFO CreateReadCountPanelOfNormals - GCS max retries/reopens: 20
20:02:35.459 INFO CreateReadCountPanelOfNormals - Requester pays: disabled
20:02:35.459 INFO CreateReadCountPanelOfNormals - Initializing engine
20:02:35.459 INFO CreateReadCountPanelOfNormals - Done initializing engine
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/07/01 20:03:21 INFO SparkContext: Running Spark version 2.2.0
19/07/01 20:03:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/07/01 20:03:45 INFO SparkContext: Submitted application: CreateReadCountPanelOfNormals
19/07/01 20:03:52 INFO SecurityManager: Changing view acls to: chendan
19/07/01 20:03:52 INFO SecurityManager: Changing modify acls to: chendan
19/07/01 20:03:52 INFO SecurityManager: Changing view acls groups to:
19/07/01 20:03:52 INFO SecurityManager: Changing modify acls groups to:
19/07/01 20:03:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(chendan); groups with view permissions: Set(); users with modify permissions: Set(chendan); groups with modify permissions: Set()
19/07/01 20:04:14 INFO Utils: Successfully started service 'sparkDriver' on port 35819.
19/07/01 20:04:15 INFO SparkEnv: Registering MapOutputTracker
19/07/01 20:04:15 INFO SparkEnv: Registering BlockManagerMaster
19/07/01 20:04:15 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/07/01 20:04:15 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/07/01 20:04:15 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f5c9bd81-f8bf-4775-ad1e-6f2213f7e7f0
19/07/01 20:04:15 INFO MemoryStore: MemoryStore started with capacity 2.5 GB
19/07/01 20:04:19 INFO SparkEnv: Registering OutputCommitCoordinator
19/07/01 20:04:31 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/07/01 20:04:31 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.0.100:4040
19/07/01 20:04:33 INFO Executor: Starting executor ID driver on host localhost
19/07/01 20:04:33 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46344.
19/07/01 20:04:33 INFO NettyBlockTransferService: Server created on 10.0.0.100:46344
19/07/01 20:04:33 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/07/01 20:04:33 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.0.100, 46344, None)
19/07/01 20:04:33 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.0.100:46344 with 2.5 GB RAM, BlockManagerId(driver, 10.0.0.100, 46344, None)
19/07/01 20:04:33 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.0.100, 46344, None)
19/07/01 20:04:33 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.0.100, 46344, None)
20:04:39.309 INFO CreateReadCountPanelOfNormals - Spark verbosity set to INFO (see --spark-verbosity argument)
19/07/01 20:04:39 INFO HDF5Library: Trying to load HDF5 library from:
jar:file:/home/my/anaconda2/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar!/org/broadinstitute/hdf5/libjhdf5.2.11.0.so
19/07/01 20:04:44 INFO H5: HDF5 library:
19/07/01 20:04:44 INFO H5: successfully loaded.
20:04:46.606 INFO CreateReadCountPanelOfNormals - Retrieving intervals from first read-counts file (gatk_commonCnv/S07.count.hdf5)...
20:04:59.226 INFO CreateReadCountPanelOfNormals - Reading and validating annotated intervals...
19/07/01 20:05:05 INFO SparkUI: Stopped Spark web UI at http://10.0.0.100:4040
19/07/01 20:05:05 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/07/01 20:05:05 INFO MemoryStore: MemoryStore cleared
19/07/01 20:05:05 INFO BlockManager: BlockManager stopped
19/07/01 20:05:05 INFO BlockManagerMaster: BlockManagerMaster stopped
19/07/01 20:05:05 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/07/01 20:05:05 INFO SparkContext: Successfully stopped SparkContext
20:05:05.721 INFO CreateReadCountPanelOfNormals - Shutting down engine
[July 1, 2019 8:05:05 PM CST] org.broadinstitute.hellbender.tools.copynumber.CreateReadCountPanelOfNormals done. Elapsed time: 2.64 minutes.
Runtime.totalMemory()=623902720
java.lang.IllegalArgumentException: TableColumnCollection must contain standard columns: [CONTIG, START, END].
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
at org.broadinstitute.hellbender.tools.copynumber.formats.collections.AnnotatedIntervalCollection.getAnnotationKeys(AnnotatedIntervalCollection.java:111)
at org.broadinstitute.hellbender.tools.copynumber.formats.collections.AnnotatedIntervalCollection.<init>(AnnotatedIntervalCollection.java:79)
at org.broadinstitute.hellbender.tools.copynumber.arguments.CopyNumberArgumentValidationUtils.validateAnnotatedIntervals(CopyNumberArgumentValidationUtils.java:130)
at org.broadinstitute.hellbender.tools.copynumber.CreateReadCountPanelOfNormals.runPipeline(CreateReadCountPanelOfNormals.java:276)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
19/07/01 20:05:05 INFO ShutdownHookManager: Shutdown hook called
19/07/01 20:05:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-1b686777-b88f-4b62-bcd6-dd79490c9359
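For what it's worth, the stack trace fails while reading the --annotated-intervals file: that argument expects the TSV written by AnnotateIntervals (a SAM-style header plus CONTIG, START, END and annotation columns), not a Picard .interval_list. A hedged sketch of producing one, assuming the same reference used for the counts (path hypothetical):

gatk AnnotateIntervals \
    -R reference.fasta \
    -L call_region/preprocessed.filter.interval_list \
    --interval-merging-rule OVERLAPPING_ONLY \
    -O call_region/preprocessed.filter.annotated.tsv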


Questions about GenotypeGVCFs results

Hi,
I have a question about my GenotypeGVCFs results from GATK 3.8. I'm not sure why there is no genotype at this site, which contains 49 reference reads, e.g. 'GT:AD:DP ./.:49,0:49'. Should I change the result manually?

SAM/BAM optional tags


Hello,
Are the SAM/BAM optional tags (such as AS or XS) produced by the bwa mem tool used by the GATK tools (BQSR and HaplotypeCaller) in the subsequent steps of the pipeline?

GATK - Exception in thread "main" java.lang.OutOfMemoryError

I am currently trying to combine 40 individuals into one big file (using 16G of memory), and this is the command line that I am using:

echo "Starting run at: `date`"
gatk CombineGVCFs \
-R Gac-HiC_revised_genome_assembly.fa \
--variant variant_MUI004.vcf \
--variant variant_MUI006.vcf \
--variant variant_MUI009.vcf \
--variant variant_MUI010.vcf \
--variant variant_MUI014.vcf \
--variant variant_MUI017.vcf \
--variant variant_MUI024.vcf \
--variant variant_MUI025.vcf \
--variant variant_MUI027.vcf \
--variant variant_MUI028.vcf \
--variant variant_MUI029.vcf \
--variant variant_MUI030.vcf \
--variant variant_MUI032.vcf \
--variant variant_MUI035.vcf \
--variant variant_MUI036.vcf \
--variant variant_MUI037.vcf \
--variant variant_MUI038.vcf \
--variant variant_MUI039.vcf \
--variant variant_MUI040.vcf \
--variant variant_MUI041.vcf \
--variant variant_MUI044.vcf \
--variant variant_MUI045.vcf \
--variant variant_MUI047.vcf \
--variant variant_MUI051.vcf \
--variant variant_MUI052.vcf \
--variant variant_MUI057.vcf \
--variant variant_MUI063.vcf \
--variant variant_MUI066.vcf \
--variant variant_MUI067.vcf \
--variant variant_MUI068.vcf \
--variant variant_MUI071.vcf \
--variant variant_MUI072.vcf \
--variant variant_MUI073.vcf \
--variant variant_MUI074.vcf \
--variant variant_MUI076.vcf \
--variant variant_MUI077.vcf \
--variant variant_MUI079.vcf \
--variant variant_MUI080.vcf \
--variant variant_MUI081.vcf \
--variant variant_MUI083.vcf \
-O cohort.MUIsamples_threespine_alignment.vcf

echo "Program finished with exit code $? at: `date`"

The command runs fine for about 2 minutes and then it shuts down. When looking at the output file, I see that it finished with exit code 1, and I can see the following error:

[June 30, 2019 1:10:15 AM EDT] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 2.19 minutes.
Runtime.totalMemory()=17351741440
Exception in thread "main" java.lang.OutOfMemoryError
at java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161)
at java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuilder.append(StringBuilder.java:190)
at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:340)
at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:356)
at htsjdk.tribble.readers.SynchronousLineReader.readLine(SynchronousLineReader.java:51)
at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:24)
at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:11)
at htsjdk.samtools.util.AbstractIterator.next(AbstractIterator.java:57)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:373)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:354)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:315)
at org.broadinstitute.hellbender.engine.MultiVariantDataSource$1.next(MultiVariantDataSource.java:394)
at org.broadinstitute.hellbender.engine.MultiVariantDataSource$1.next(MultiVariantDataSource.java:379)
at htsjdk.samtools.util.PeekableIterator.advance(PeekableIterator.java:71)
at htsjdk.samtools.util.PeekableIterator.next(PeekableIterator.java:57)
at htsjdk.samtools.util.MergingIterator.next(MergingIterator.java:107)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
:

....

Using GATK jar /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/gatk/4.1.0.0/gatk-package-4.1.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/gatk/4.1.0.0/gatk-package-4.1.0.0-local.jar CombineGVCFs -R Gac-HiC_revised_genome_assembly.fa --variant variant_MUI004.vcf --variant variant_MUI006.vcf --variant variant_MUI009.vcf --variant variant_MUI010.vcf --variant variant_MUI014.vcf --variant variant_MUI017.vcf --variant variant_MUI024.vcf --variant variant_MUI025.vcf --variant variant_MUI027.vcf --variant variant_MUI028.vcf --variant variant_MUI029.vcf --variant variant_MUI030.vcf --variant variant_MUI032.vcf --variant variant_MUI035.vcf --variant variant_MUI036.vcf --variant variant_MUI037.vcf --variant variant_MUI038.vcf --variant variant_MUI039.vcf --variant variant_MUI040.vcf --variant variant_MUI041.vcf --variant variant_MUI044.vcf --variant variant_MUI045.vcf --variant variant_MUI047.vcf --variant variant_MUI051.vcf --variant variant_MUI052.vcf --variant variant_MUI057.vcf --variant variant_MUI063.vcf --variant variant_MUI066.vcf --variant variant_MUI067.vcf --variant variant_MUI068.vcf --variant variant_MUI071.vcf --variant variant_MUI072.vcf --variant variant_MUI073.vcf --variant variant_MUI074.vcf --variant variant_MUI076.vcf --variant variant_MUI077.vcf --variant variant_MUI079.vcf --variant variant_MUI080.vcf --variant variant_MUI081.vcf --variant variant_MUI083.vcf -O cohort.MUIsamples_threespine_alignment.vcf
Program finished with exit code 1 at: Sun Jun 30 01:10:16 EDT 2019




I thought that adding more memory to the job submission specs would solve the problem (this is the solution I found in a similar post on the GATK forum), but no matter how much memory I add, the result is virtually always the same: Out of Memory. I wonder if there is a way to ask GATK to create a temporary folder or something that would help the program not shut down due to lack of memory, but I am clueless as to what this could be.
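For what it's worth, the Running: line above shows the JVM was launched without an explicit -Xmx, so the 16G requested from the scheduler is not necessarily what Java actually uses. A sketch of passing the heap size and a scratch directory explicitly through the gatk wrapper (heap value and tmp path are illustrative; only two of the forty --variant arguments are shown):

mkdir -p gatk_tmp
gatk --java-options "-Xmx12g" CombineGVCFs \
    --tmp-dir gatk_tmp \
    -R Gac-HiC_revised_genome_assembly.fa \
    --variant variant_MUI004.vcf \
    --variant variant_MUI006.vcf \
    -O cohort.MUIsamples_threespine_alignment.vcf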


Any help would be greatly appreciated!

Thanks!

PoN causing missed somatic variant call with Mutect2

Hi, I've been running Mutect2 for somatic variant calling, and I decided to create a panel of normals (PoN) using "CreateSomaticPanelOfNormals". I followed the instructions for GATK 4.1.0.0, as that is the version I've been using.

When going through the differences in the somatic variants called by Mutect2 with and without the PoN, I noticed a case I was not expecting: with the PoN, Mutect2 did not call a previously called somatic variant, because it was located at the same position as one of the germline variants caught by the PoN (present in 4 of the ~20 samples used in its creation). I tried re-running Mutect2 with the parameter "--genotypePonSites true", hoping that in this case the variant might not be filtered, but, as expected from the parameter description, it was.

I'll leave a screenshot from IGV where the bottom track shows the missed somatic variant and the track above it shows the germline variant (present in 3 other samples that were also used to create the PoN).

I have checked the release notes for newer GATK versions, but I found none that seemed to address this case. Could I be missing something? Is the mentioned behaviour expected? If it is, is there a plan to change it in a future release?

Thanks

How to use matched normals in germline calling with GATK3


Hello.

I have 30 bladder tumor samples with matched normals (BAM files).

By the definition of a germline variant, I use only the normal BAMs to call germline variants with HaplotypeCaller. But there is no step to filter out somatic variants, and I guess a step to filter somatic variants out of the germline calls is necessary.

Steps I did:
1) HaplotypeCaller: variant calling in GVCF mode
2) GenotypeGVCFs: joint genotyping of the GVCFs
3) VariantRecalibrator and ApplyRecalibration: filtering variants
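For reference, a minimal GATK3-style sketch of those three steps (file names, resources, and annotations are hypothetical; SNP mode only, indels would need a second recalibration pass):

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I normal1.bam \
    --emitRefConfidence GVCF -o normal1.g.vcf
java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R ref.fasta \
    --variant normal1.g.vcf --variant normal2.g.vcf -o cohort.vcf
java -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R ref.fasta -input cohort.vcf \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap.vcf \
    -an QD -an MQ -an FS -mode SNP -recalFile cohort.snps.recal -tranchesFile cohort.snps.tranches
java -jar GenomeAnalysisTK.jar -T ApplyRecalibration -R ref.fasta -input cohort.vcf \
    -mode SNP --ts_filter_level 99.5 -recalFile cohort.snps.recal -tranchesFile cohort.snps.tranches \
    -o cohort.recalibrated.vcf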

Thanks in advance.
