Channel: Recent Discussions — GATK-Forum

GATK 3.7 HaplotypeCaller NullPointerException in removeAltAllelesIfTooManyGenotypes

HaplotypeCaller in GATK 3.7 (3.7-0-g56f2c1a) is throwing a NullPointerException in some cases. See below for log output from a failing run.

It looks to me like the call to .get() on the practicalAlleleCountForPloidy HashMap must be returning null for some reason (and the unboxing into an int is then causing the NullPointerException): https://github.com/broadgsa/gatk-protected/blob/master/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/HaplotypeCallerGenotypingEngine.java#L360

Given that the immediately preceding call is to practicalAlleleCountForPloidy.putIfAbsent(), either the key for the given ploidy must already be in the HashMap with value null or the calculation from GenotypeLikelihoodCalculators.computeMaxAcceptableAlleleCount(ploidy, maxGenotypeCountToEnumerate) is returning null.
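To make the suspected failure mode concrete, here is a minimal standalone sketch of a putIfAbsent()/get() pair whose result is unboxed into an int. It is not the GATK code: the class PloidyCacheSketch and the methods cachedCount and unboxingThrows are hypothetical names made up for illustration, and the sketch assumes a plain java.util.HashMap<Integer, Integer> accessed from a single thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from GATK.
class PloidyCacheSketch {

    // Stand-in for the pattern at the failing line:
    // putIfAbsent() followed by get(), with the result unboxed into an int.
    static int cachedCount(Map<Integer, Integer> cache, int ploidy, int computed) {
        cache.putIfAbsent(ploidy, computed);
        return cache.get(ploidy); // NullPointerException here if get() returns null
    }

    // Returns true if unboxing the value stored under key throws an NPE.
    static boolean unboxingThrows(Map<Integer, Integer> cache, int key) {
        try {
            int value = cache.get(key); // auto-unboxing of a possibly null Integer
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cache = new HashMap<>();

        // A missing key makes get() return null, so the unboxing fails.
        System.out.println(unboxingThrows(cache, 2)); // prints "true"

        // A key explicitly mapped to null is overwritten by putIfAbsent(),
        // so the following get() returns the computed value.
        cache.put(2, null);
        System.out.println(cachedCount(cache, 2, 7)); // prints "7"
    }
}
```

In this single-threaded sketch, Java 8's HashMap.putIfAbsent() treats a key mapped to null as absent and replaces the value, so a pre-existing null entry would not by itself explain the exception. HashMap is also not safe for concurrent modification, and the failing run uses -nct 4, so unsynchronized access from several threads may be worth checking as well; that is a guess, not something the log confirms.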

A quick scan of the code does not indicate any obvious problems here. I'll see if I can add some debug printing and re-run on the problematic data to clarify the situation.

['-T', 'HaplotypeCaller', '--no_cmdline_in_header', '-R', u'/keep/d527a0b11143ebf18be6c52ff6c09552+2163/hs37d5.fa', '-I', u'/keep/c5e28ac0e8014f6117792f83e031aea8+21780/20643_7.cram', '-L', u'/keep/85abb468fc85aece80e33396c48fb7d0+94/hs37d5.dict.159_of_200.interval_list', '-A', 'StrandAlleleCountsBySample', '-A', 'StrandBiasBySample', '-nct', '4', '--emitRefConfidence', 'GVCF', '--variant_index_type', 'LINEAR', '--variant_index_parameter', '128000', '-o', u'/tmp/crunch-job-task-work/humgen-04-02.8/out/20643_7.hs37d5.dict.159_of_200.interval_list.vcf.gz', '-l', 'INFO']
 INFO  13:31:41,104 HelpFormatter - --------------------------------------------------------------------------------
 INFO  13:31:41,110 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-g56f2c1a, Compiled 2017/01/03 11:50:40
 INFO  13:31:41,110 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
 INFO  13:31:41,110 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
 INFO  13:31:41,111 HelpFormatter - [Tue Jan 03 13:31:41 UTC 2017] Executing on Linux 3.13.0-85-generic amd64
 INFO  13:31:41,111 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14
 INFO  13:31:41,118 HelpFormatter - Program Args: -T HaplotypeCaller --no_cmdline_in_header -R /keep/d527a0b11143ebf18be6c52ff6c09552+2163/hs37d5.fa -I /keep/c5e28ac0e8014f6117792f83e031aea8+21780/20643_7.cram -L /keep/85abb468fc85aece80e33396c48fb7d0+94/hs37d5.dict.159_of_200.interval_list -A StrandAlleleCountsBySample -A StrandBiasBySample -nct 4 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o /tmp/crunch-job-task-work/humgen-04-02.8/out/20643_7.hs37d5.dict.159_of_200.interval_list.vcf.gz -l INFO
 INFO  13:31:41,125 HelpFormatter - Executing as crunch@f1857b5c4c58 on Linux 3.13.0-85-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14.
 INFO  13:31:41,126 HelpFormatter - Date/Time: 2017/01/03 13:31:41
 INFO  13:31:41,126 HelpFormatter - --------------------------------------------------------------------------------
 INFO  13:31:41,126 HelpFormatter - --------------------------------------------------------------------------------
 WARN  13:31:41,135 GATKVCFUtils - Naming your output file using the .g.vcf extension will automatically set the appropriate values  for --variant_index_type and --variant_index_parameter
 WARN  13:31:41,136 GATKVCFUtils - Creating Tabix index for /tmp/crunch-job-task-work/humgen-04-02.8/out/20643_7.hs37d5.dict.159_of_200.interval_list.vcf.gz, ignoring user-specified index type and parameter
 INFO  13:31:41,178 GenomeAnalysisEngine - Strictness is SILENT
 INFO  13:31:41,910 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
 INFO  13:31:41,920 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
 INFO  13:31:43,684 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 1.76
 INFO  13:31:44,363 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
 INFO  13:31:44,401 IntervalUtils - Processing 15618872 bp from intervals
 INFO  13:31:44,422 MicroScheduler - Running the GATK in parallel mode with 4 total threads, 4 CPU thread(s) for each of 1 data thread(s), of 40 processors available on this machine
 INFO  13:31:44,528 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
 INFO  13:31:45,093 GenomeAnalysisEngine - Done preparing for traversal
 INFO  13:31:45,093 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
 INFO  13:31:45,094 ProgressMeter -                 |      processed |    time |         per 1M |           |   total | remaining
 INFO  13:31:45,094 ProgressMeter -        Location | active regions | elapsed | active regions | completed | runtime |   runtime
 INFO  13:31:45,097 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
 INFO  13:31:45,097 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output
 WARN  13:31:45,278 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
 INFO  13:31:45,425 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
 INFO  13:31:45,427 PairHMM - Performance profiling for PairHMM is disabled because the program is being run with multiple threads (-nct>1) option
 Profiling is enabled only when running in single thread mode

 Using AVX accelerated implementation of PairHMM
 INFO  13:31:50,403 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
 INFO  13:31:50,403 VectorLoglessPairHMM - Using vectorized implementation of PairHMM
 ##### ERROR --
 ##### ERROR stack trace
 java.lang.NullPointerException
     at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.removeAltAllelesIfTooManyGenotypes(HaplotypeCallerGenotypingEngine.java:360)
     at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:267)
     at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:962)
     at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:250)
     at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
     at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
     at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)
 ##### ERROR ------------------------------------------------------------------------------------------
 ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-g56f2c1a):
 ##### ERROR
 ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
 ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
 ##### ERROR Visit our website and forum for extensive documentation and answers to
 ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
 ##### ERROR
 ##### ERROR MESSAGE: Code exception (see stack trace for error itself)
 ##### ERROR ------------------------------------------------------------------------------------------
