HaplotypeCaller in GATK 3.7 (3.7-0-g56f2c1a) is throwing a NullPointerException in some cases. See below for log output from a failing run.
It looks to me like the call to .get() on the practicalAlleleCountForPloidy HashMap must be returning null for some reason, and the unboxing into an int is then causing the NullPointerException: https://github.com/broadgsa/gatk-protected/blob/master/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/HaplotypeCallerGenotypingEngine.java#L360
Given that the immediately preceding call is to practicalAlleleCountForPloidy.putIfAbsent(), either the key for the given ploidy must already be in the HashMap with value null, or the calculation from GenotypeLikelihoodCalculators.computeMaxAcceptableAlleleCount(ploidy, maxGenotypeCountToEnumerate) is returning null.
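A minimal, hypothetical sketch of the pattern described above (not the GATK source): a plain Map<Integer, Integer> stands in for practicalAlleleCountForPloidy, and cachedAlleleCount and its computed argument are invented for illustration. It shows that auto-unboxing a null returned by get() into an int throws NullPointerException. It also shows that, per the java.util.Map contract, HashMap.putIfAbsent() treats a key mapped to null as absent and overwrites it, so in single-threaded use a pre-existing null value would be replaced rather than returned.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; names do not come from GATK. */
public class UnboxingNpeDemo {

    /**
     * Mirrors the suspected pattern: putIfAbsent(), then get() unboxed into an int.
     * Throws NullPointerException if get() yields null.
     */
    static int cachedAlleleCount(Map<Integer, Integer> cache, int ploidy, Integer computed) {
        cache.putIfAbsent(ploidy, computed);
        final int count = cache.get(ploidy); // auto-unboxing: NPE here if get() returns null
        return count;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cache = new HashMap<>();

        // Case 1: the computed value itself is null, so putIfAbsent stores null
        // (HashMap permits null values) and the unboxing in cachedAlleleCount throws.
        try {
            cachedAlleleCount(cache, 2, null);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NPE when the computed value is null");
        }

        // Case 2: the key is already mapped to null; putIfAbsent treats that as
        // absent and replaces it, so get() returns the new value.
        cache.put(3, null);
        System.out.println("ploidy 3 -> " + cachedAlleleCount(cache, 3, 7));
    }
}
```

Running main prints "NPE when the computed value is null" followed by "ploidy 3 -> 7".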
A quick scan of the code does not indicate any obvious problems here. I'll see if I can add some debug printing and re-run on the problematic data to clarify the situation.
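One way the debug printing might look, as a hypothetical sketch: checkedAlleleCount is not a GATK method, and the map and computed value are stand-ins for practicalAlleleCountForPloidy and the result of computeMaxAcceptableAlleleCount. It checks for null before unboxing and reports the ploidy, whether the key is present, the computed value and the current thread name (the failing run used -nct 4), instead of failing with a bare NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical debugging sketch; not the actual GATK code. */
public class AlleleCountDebug {

    /**
     * Same putIfAbsent()/get() pattern, but checks for null before unboxing and
     * throws an exception describing what was observed.
     */
    static int checkedAlleleCount(Map<Integer, Integer> cache, int ploidy, Integer computed) {
        cache.putIfAbsent(ploidy, computed);
        final Integer boxed = cache.get(ploidy);
        if (boxed == null) {
            throw new IllegalStateException(String.format(
                    "null allele count: ploidy=%d containsKey=%b computed=%s thread=%s",
                    ploidy, cache.containsKey(ploidy), computed,
                    Thread.currentThread().getName()));
        }
        return boxed; // safe to unbox: boxed is known to be non-null here
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cache = new HashMap<>();
        System.out.println(checkedAlleleCount(cache, 2, 5));
        try {
            checkedAlleleCount(cache, 4, null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```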
['-T', 'HaplotypeCaller', '--no_cmdline_in_header', '-R', u'/keep/d527a0b11143ebf18be6c52ff6c09552+2163/hs37d5.fa', '-I', u'/keep/c5e28ac0e8014f6117792f83e031aea8+21780/20643_7.cram', '-L', u'/keep/85abb468fc85aece80e33396c48fb7d0+94/hs37d5.dict.159_of_200.interval_list', '-A', 'StrandAlleleCountsBySample', '-A', 'StrandBiasBySample', '-nct', '4', '--emitRefConfidence', 'GVCF', '--variant_index_type', 'LINEAR', '--variant_index_parameter', '128000', '-o', u'/tmp/crunch-job-task-work/humgen-04-02.8/out/20643_7.hs37d5.dict.159_of_200.interval_list.vcf.gz', '-l', 'INFO']
INFO 13:31:41,104 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:31:41,110 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-g56f2c1a, Compiled 2017/01/03 11:50:40
INFO 13:31:41,110 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 13:31:41,110 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 13:31:41,111 HelpFormatter - [Tue Jan 03 13:31:41 UTC 2017] Executing on Linux 3.13.0-85-generic amd64
INFO 13:31:41,111 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14
INFO 13:31:41,118 HelpFormatter - Program Args: -T HaplotypeCaller --no_cmdline_in_header -R /keep/d527a0b11143ebf18be6c52ff6c09552+2163/hs37d5.fa -I /keep/c5e28ac0e8014f6117792f83e031aea8+21780/20643_7.cram -L /keep/85abb468fc85aece80e33396c48fb7d0+94/hs37d5.dict.159_of_200.interval_list -A StrandAlleleCountsBySample -A StrandBiasBySample -nct 4 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o /tmp/crunch-job-task-work/humgen-04-02.8/out/20643_7.hs37d5.dict.159_of_200.interval_list.vcf.gz -l INFO
INFO 13:31:41,125 HelpFormatter - Executing as crunch@f1857b5c4c58 on Linux 3.13.0-85-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14.
INFO 13:31:41,126 HelpFormatter - Date/Time: 2017/01/03 13:31:41
INFO 13:31:41,126 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:31:41,126 HelpFormatter - --------------------------------------------------------------------------------
WARN 13:31:41,135 GATKVCFUtils - Naming your output file using the .g.vcf extension will automatically set the appropriate values for --variant_index_type and --variant_index_parameter
WARN 13:31:41,136 GATKVCFUtils - Creating Tabix index for /tmp/crunch-job-task-work/humgen-04-02.8/out/20643_7.hs37d5.dict.159_of_200.interval_list.vcf.gz, ignoring user-specified index type and parameter
INFO 13:31:41,178 GenomeAnalysisEngine - Strictness is SILENT
INFO 13:31:41,910 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 13:31:41,920 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 13:31:43,684 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 1.76
INFO 13:31:44,363 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 13:31:44,401 IntervalUtils - Processing 15618872 bp from intervals
INFO 13:31:44,422 MicroScheduler - Running the GATK in parallel mode with 4 total threads, 4 CPU thread(s) for each of 1 data thread(s), of 40 processors available on this machine
INFO 13:31:44,528 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 13:31:45,093 GenomeAnalysisEngine - Done preparing for traversal
INFO 13:31:45,093 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 13:31:45,094 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 13:31:45,094 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 13:31:45,097 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
INFO 13:31:45,097 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output
WARN 13:31:45,278 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
INFO 13:31:45,425 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 13:31:45,427 PairHMM - Performance profiling for PairHMM is disabled because the program is being run with multiple threads (-nct>1) option
Profiling is enabled only when running in single thread mode
Using AVX accelerated implementation of PairHMM
INFO 13:31:50,403 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
INFO 13:31:50,403 VectorLoglessPairHMM - Using vectorized implementation of PairHMM
##### ERROR --
##### ERROR stack trace
java.lang.NullPointerException
	at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.removeAltAllelesIfTooManyGenotypes(HaplotypeCallerGenotypingEngine.java:360)
	at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:267)
	at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:962)
	at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:250)
	at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
	at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
	at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-g56f2c1a):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------