Howdy. I'm trying out the 7-12 nightly to get the fix for the HashMap iterator issue in http://gatkforums.broadinstitute.org/gatk/discussion/comment/30982#Comment_30982
When running HaplotypeCaller, however, a new issue cropped up that I haven't found on this forum yet (apologies if it has already been addressed).
Below is the stack trace:
ERROR --
ERROR stack trace
java.lang.IllegalArgumentException: Alleles for a VariantContext must contain at least one reference allele: [CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTCCCTT, ]
at htsjdk.variant.variantcontext.VariantContext.makeAlleles(VariantContext.java:1509)
at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:392)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:494)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:488)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:306)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:964)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:251)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version nightly-2016-07-12-gaa9ac69):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Alleles for a VariantContext must contain at least one reference allele: [CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTCCCTT, ]
ERROR ------------------------------------------------------------------------------------------
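Per the stack trace, the exception is raised in htsjdk's `VariantContext.makeAlleles` because none of the alleles handed to `VariantContextBuilder.make` is flagged as the reference. The sketch below models only that invariant, using a hypothetical `SimpleAllele` class and shortened stand-in sequences; it is an illustration under those assumptions, not htsjdk's implementation.

```java
import java.util.Arrays;
import java.util.List;

public class RefAlleleCheck {
    // Hypothetical stand-in for an allele: bases plus a reference flag.
    // This is NOT htsjdk's Allele class.
    static final class SimpleAllele {
        final String bases;
        final boolean isRef;

        SimpleAllele(String bases, boolean isRef) {
            this.bases = bases;
            this.isRef = isRef;
        }

        @Override
        public String toString() {
            return bases + (isRef ? " (ref)" : "");
        }
    }

    // Models the invariant named in the error message: the allele list
    // must contain at least one allele flagged as reference.
    static void requireReference(List<SimpleAllele> alleles) {
        boolean hasRef = alleles.stream().anyMatch(a -> a.isRef);
        if (!hasRef) {
            throw new IllegalArgumentException(
                "Alleles for a VariantContext must contain at least one reference allele: " + alleles);
        }
    }

    // Returns "accepted" or "rejected" for a candidate allele list.
    static String check(List<SimpleAllele> alleles) {
        try {
            requireReference(alleles);
            return "accepted";
        } catch (IllegalArgumentException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        // Shortened, hypothetical sequences: alternates only, as in the error.
        List<SimpleAllele> altsOnly = Arrays.asList(
            new SimpleAllele("CCCTCCCTCC", false),
            new SimpleAllele("CCCCTCCCCAG", false));
        // The same alternates plus a reference allele.
        List<SimpleAllele> withRef = Arrays.asList(
            new SimpleAllele("C", true),
            new SimpleAllele("CCCTCCCTCC", false),
            new SimpleAllele("CCCCTCCCCAG", false));
        System.out.println(check(altsOnly)); // rejected
        System.out.println(check(withRef));  // accepted
    }
}
```

The alternates-only list is rejected, and the same list is accepted once a reference allele is present.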
This was the command line (the data was processed with this nightly following the GATK Best Practices, except that I also ran the local realignment step, which has been deprecated). The aligner was bwa mem -M (v0.7.8), using the GRCh37 FASTA with decoy sequences from the GATK 2.8 resource bundle.
/isilon/sequencing/Kurt/Programs/Java/jdk1.8.0_73/bin/java -jar \
/isilon/sequencing/CIDRSeqSuiteSoftware/gatk/GATK_3/GenomeAnalysisTK-nightly-2016-07-12-gaa9ac69/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R /isilon/sequencing/GATK_resource_bundle/bwa_mem_0.7.5a_ref/human_g1k_v37_decoy.fasta \
--input_file /isilon/sequencing/Seq_Proj/CGC_CONTROL_DATA_SET_3_6/BAM/NA12891_NA12892_90-10.bam \
-L /isilon/sequencing/data/Work/BED/Production_BED_files/ALLBED_BED_File_Agilent_ClinicalExome_S06588914_ALLBed_merged_021015_noCHR.bed \
--emitRefConfidence BP_RESOLUTION \
--variant_index_type LINEAR \
--variant_index_parameter 128000 \
--max_alternate_alleles 3 \
--annotation AS_BaseQualityRankSumTest \
--annotation AS_FisherStrand \
--annotation AS_InbreedingCoeff \
--annotation AS_MappingQualityRankSumTest \
--annotation AS_RMSMappingQuality \
--annotation AS_ReadPosRankSumTest \
--annotation AS_StrandOddsRatio \
--annotation FractionInformativeReads \
--annotation StrandBiasBySample \
--annotation StrandAlleleCountsBySample \
--annotation GCContent \
--annotation AlleleBalanceBySample \
--annotation AlleleBalance \
--annotation LikelihoodRankSumTest \
-pairHMM VECTOR_LOGLESS_CACHING \
-o /isilon/sequencing/Seq_Proj/CGC_CONTROL_DATA_SET_3_6/GVCF/NA12891_NA12892_90-10.g.vcf.gz
The input sample is a 90/10 mix of NA12891 and NA12892 (exome), but this has also happened for a "regular" sample. That is out of roughly 60 exomes. The BED file covers roughly 90 Mb.
This may be related: the log line directly preceding the error stack trace involved a symbolic allele for a deletion.
WARN 15:28:37,770 HaplotypeCallerGenotypingEngine - location 1:15714831: too many alternative alleles found (43) larger than the maximum requested with -maxAltAlleles (3), the following will be dropped: CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCCCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCTCCCTCCCCCCCTCCCTCCCTT, CCCCTCCCCAGCCCCTGCCCCACCTCCCTCCCTCCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCAACCTCCCTCCCTCCCCCCCTCCCTCCCTCCCTT, C*
, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCCCCCTCCCCCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCCCTCCCCAGTCCCTGTCCCACCTCCCTCCCTCCCTCCCTCCCCCCCTCCCTCCCTT, CCCTCCCTCCTTCCCTTCC... and 32 more.
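The warning above reports 43 alternate alleles at 1:15714831 being pruned down to the `--max_alternate_alleles 3` limit. Whatever GATK does internally, a pruned allele list still has to keep a reference allele to pass the check in the stack trace. A minimal sketch of such a pruning step follows; the `ScoredAllele` class and its `score` field are hypothetical ranking inputs, not GATK's actual selection criterion or code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class AltAllelePruning {
    // Hypothetical allele with a reference flag and a ranking score.
    // Neither the class nor the score is GATK's; they only drive the sketch.
    static final class ScoredAllele {
        final String bases;
        final boolean isRef;
        final double score;

        ScoredAllele(String bases, boolean isRef, double score) {
            this.bases = bases;
            this.isRef = isRef;
            this.score = score;
        }
    }

    // Keeps every reference allele plus at most maxAlt alternates,
    // choosing the alternates with the highest score.
    static List<ScoredAllele> prune(List<ScoredAllele> alleles, int maxAlt) {
        List<ScoredAllele> kept = new ArrayList<>();
        List<ScoredAllele> alts = new ArrayList<>();
        for (ScoredAllele a : alleles) {
            if (a.isRef) {
                kept.add(a); // the reference is never a pruning candidate
            } else {
                alts.add(a);
            }
        }
        alts.sort(Comparator.comparingDouble((ScoredAllele a) -> a.score).reversed());
        kept.addAll(alts.subList(0, Math.min(maxAlt, alts.size())));
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredAllele> alleles = Arrays.asList(
            new ScoredAllele("C", true, 0.0),
            new ScoredAllele("CCCT", false, 0.9),
            new ScoredAllele("CCCTCC", false, 0.4),
            new ScoredAllele("CCCCTCC", false, 0.7),
            new ScoredAllele("CCT", false, 0.1),
            new ScoredAllele("CCCCT", false, 0.5));
        // Cap of 3 alternates, matching --max_alternate_alleles 3 in the command above.
        for (ScoredAllele a : prune(alleles, 3)) {
            System.out.println(a.bases + (a.isRef ? " (ref)" : ""));
        }
    }
}
```

Separating the reference from the candidates before sorting means the cap applies only to alternates, so the reference cannot be ranked out; here the output is `C (ref)` followed by `CCCT`, `CCCCTCC` and `CCCCT`.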
Another sample showed the same pattern (log below). It crashed at a different location, but again directly after this warning line. I didn't see any other warnings involving a symbolic deletion allele for either sample earlier in the process logs.
WARN 20:42:19,954 HaplotypeCallerGenotypingEngine - location 16:11537347: too many alternative alleles found (43) larger than the maximum requested with -maxAltAlleles (3), the following will be dropped: A*
, AAAAAGGGGGAGAGAGAG, AAAGGGGGAGAG, G, AAAAAAGAGGGAG, AAAGAGGGAG, AAAAAGAGGGAGAG, AAAAAAGGGGGAGAGAGAG, AAAAGGGAGAGAG, AAAAAGGGAGAGAGAGAG, AAGGGGGAGAG, AAAAAGGGAGAG, AAAAAAGGGAGAGAGAGAG, AAGGGAGAG, AAGAGGGAGAGAGAG, AAAGAGGGAGAGAGAG, AGAGGGAGAGAGAG, AAAAGAGGGAGAGAGAG, AAAAGAGGGAGAG, AAAAAGAGGGAGAGAGAG, AGGGGGAGAG, AAAAAGAGGGAG, AGGGAGAGAGAG, AAAAAAGAGGGAGAGAGAG, AAGAGGGAG, AAAAGGGAGAG, AAAGAGGGAGAG, AGGGGGAGAGAGAG, AAAAAAGGGGGAGAG, AAGGGGGAGAGAGAG, AGGGAGAG, AGGGAG, AAAAGAGGGAG, AAAGGGGGAGAGAGAG and 6 more.
The next log entry was the same error as for the first sample.
Oddly enough, one process returned a non-zero exit status even though it appears to have completed successfully. It did display a warning that there were a lot of warning messages (2360) and then redisplayed the first 10. I need to look at that one again to make sure the non-zero exit status didn't come from somewhere else in the script. Apologies in advance if I screwed something up. I should be sleeping, but wanted to get this out before I get swamped tomorrow.
Best Regards,
Kurt Hetrick