Hi,
I'm trying to combine a set of gVCFs generated by bcbio-nextgen, using GATK CombineGVCFs (version 3.8-0).
However, the run fails with the following error:
INFO 09:32:20,508 GenomeAnalysisEngine - Preparing for traversal
INFO 09:32:20,519 GenomeAnalysisEngine - Done preparing for traversal
INFO 09:32:20,520 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 09:32:20,520 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 09:32:20,521 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
WARN 09:32:21,591 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
WARN 09:32:21,592 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
##### ERROR --
##### ERROR stack trace
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Double
at java.lang.Double.compareTo(Double.java:49)
at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:320)
at java.util.ComparableTimSort.sort(ComparableTimSort.java:188)
at java.util.Arrays.sort(Arrays.java:1312)
at java.util.Arrays.sort(Arrays.java:1506)
at java.util.ArrayList.sort(ArrayList.java:1454)
at java.util.Collections.sort(Collections.java:141)
at org.broadinstitute.gatk.utils.MathUtils.median(MathUtils.java:1010)
at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.combineAnnotationValues(ReferenceConfidenceVariantContextMerger.java:84)
at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:206)
at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.endPreviousStates(CombineGVCFs.java:366)
at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.reduce(CombineGVCFs.java:254)
at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.reduce(CombineGVCFs.java:116)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociReduce.apply(TraverseLociNano.java:291)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociReduce.apply(TraverseLociNano.java:280)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:279)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: java.lang.Integer cannot be cast to java.lang.Double
##### ERROR ------------------------------------------------------------------------------------------
I'm aware that gVCFs processed with bcftools have produced this same stack trace in the past. However, I have already completed several merges successfully, and only some of them fail.
Any idea on how I could fix this?
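In case it helps narrow things down, here is a minimal sketch of a check I could run on the failing inputs. It rests on an assumption I have not confirmed: that the failure comes from an INFO field declared as Type=Float whose value is written without a decimal point in some records (for example `MQRankSum=0`), so that values from different files end up as a mix of Integer and Double when the median is computed. The function name `find_integer_like_floats` and the example records are made up for illustration.

```python
import re

# Matches header declarations such as ##INFO=<ID=MQ,Number=1,Type=Float,...>
INFO_DECL = re.compile(r'^##INFO=<ID=([^,]+),Number=[^,]+,Type=([A-Za-z]+)')
# A value with no decimal point and no exponent, e.g. "0" or "-3"
INTEGER_LIKE = re.compile(r'^-?\d+$')


def find_integer_like_floats(lines):
    """Yield (chrom, pos, key, value) for Float INFO values written without a decimal point."""
    float_keys = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Collect the INFO IDs that the header declares as Float.
            m = INFO_DECL.match(line)
            if m and m.group(2) == "Float":
                float_keys.add(m.group(1))
            continue
        if line.startswith("#") or not line:
            continue  # column header line or blank line
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, info = fields[0], fields[1], fields[7]
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            if not sep or key not in float_keys:
                continue  # flag, or a field not declared as Float
            for v in value.split(","):
                if INTEGER_LIKE.match(v):
                    yield chrom, pos, key, v


# Hypothetical records, only to show the shape of the output.
example = [
    '##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="...">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="...">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1',
    'chr1\t100\t.\tA\tG,<NON_REF>\t50\t.\tDP=10;MQRankSum=0\tGT\t0/1',
    'chr1\t200\t.\tC\tT,<NON_REF>\t50\t.\tDP=12;MQRankSum=-0.52\tGT\t0/1',
]
print(list(find_integer_like_floats(example)))
# [('chr1', '100', 'MQRankSum', '0')]
```

On a real file the same function could be fed `gzip.open(path, "rt")`, since bgzipped VCFs are readable with Python's gzip module. If the failing merges have such hits near the position where the run stops and the working merges have none, that would support the assumption.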
Thanks a lot
M
Sample VCF header (contig lines omitted for brevity):
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.8-0-ge9d806836,Date="Wed Oct 04 07:09:47 CEST 2017",Epoch=1507093787231,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[/home/projects/bcbio_annotation/exomes/HSQExomes/hg38/HSQ_RUN_055/samples_HSQ055-merged/work/align/D1308739/D1308739-sort.bam] showFullBamList=fal
se read_buffer_size=null read_filter=[BadCigar, NotPrimaryAlignment] disable_read_filter=[] intervals=[/home/projects/bcbio_annotation/exomes/HSQExomes/hg38/HSQ_RUN_055/samples_HSQ055-merged/work/gatk-haplotype/chr1/D1308739-chr1_0_16125340-regions.bed] excludeIntervals=null interval_set_rule=INTERSECTION interval_merging=ALL interval_padding
=0 reference_sequence=/home/galaxy/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=LENIENT_VCF_PROCESSING use_jdk_deflater=false use_jdk_inflater=false disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=LINEAR variant_index_parameter=128000 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name=dbsnp source=/home/galaxy/bcbio/genomes/Hsapiens/hg38/variation/dbsnp-150.vcf.gz) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[FisherStrand, MappingQualityRankSumTest, MappingQualityZero, QualByDepth, ReadPosRankSumTest, 
RMSMappingQuality, BaseQualityRankSumTest, GCContent, HaplotypeScore, HomopolymerRun, DepthPerAlleleBySample, Coverage, ClippingRankSumTest, DepthPerSampleHC, StrandBiasBySample] excludeAnnotation=[ChromosomeCounts, FisherStrand, StrandOddsRatio, QualByDepth] group=[StandardAnnotation, StandardHCAnnotation] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=GVCF bamOutput=null bamWriterType=CALLED_HAPLOTYPES emitDroppedReads=false disableOptimizations=false annotateNDA=false useNewAFCalculator=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 heterozygosity_stdev=0.01 standard_min_confidence_threshold_for_calling=-0.0 standard_min_confidence_threshold_for_emitting=30.0 max_alternate_alleles=6 max_genotype_count=1024 max_num_PL_values=100 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=true gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING phredScaledGlobalReadMismappingRate=45 noFpga=false nativePairHmmThreads=1 useDoublePrecision=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 GVCFGQBands=[10, 20, 30, 40, 60, 80] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=false keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false errorCorrectReads=false 
pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxReadsInMemoryPerSample=30000 maxTotalReadsInMemory=10000000 maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
##GVCFBlock0-10=minGQ=0(inclusive),maxGQ=10(exclusive)
##GVCFBlock10-20=minGQ=10(inclusive),maxGQ=20(exclusive)
##GVCFBlock20-30=minGQ=20(inclusive),maxGQ=30(exclusive)
##GVCFBlock30-40=minGQ=30(inclusive),maxGQ=40(exclusive)
##GVCFBlock40-60=minGQ=40(inclusive),maxGQ=60(exclusive)
##GVCFBlock60-80=minGQ=60(inclusive),maxGQ=80(exclusive)
##GVCFBlock80-100=minGQ=80(inclusive),maxGQ=100(exclusive)
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=GC,Number=1,Type=Float,Description="GC content around the variant (see docs for window size details)">
##INFO=<ID=HRun,Number=1,Type=Integer,Description="Largest Contiguous Homopolymer Run of Variant Allele In Either Direction">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##contig=<ID=chr1,length=248956422>
...
##contig=<ID=HLA-DRB1*16:02:01,length=11005>
##reference=file:///home/galaxy/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa
##bcftools_concatVersion=1.5+htslib-1.5
##bcftools_concatCommand=concat --allow-overlaps -O z --file-list /home/projects/bcbio_annotation/exomes/HSQExomes/hg38/HSQ_RUN_055/samples_HSQ055-merged/work/gatk-haplotype/D1308739-files.list -o /tmp/bcbio/tmpWDKfKz/D1308739.vcf.gz; Date=Wed Oct 4 22:37:37 2017
##bcftools_viewVersion=1.5+htslib-1.5
##bcftools_viewCommand=view -h /home/projects/bcbio_annotation/exomes/HSQExomes/hg38/HSQ_RUN_055/samples_HSQ055-merged/final/D1308739/D1308739-gatk-haplotype.vcf.gz; Date=Wed Oct 18 09:19:18 2017
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT D1308739
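The header above declares several INFO fields as Type=Float (MQRankSum and RAW_MQ among them). One workaround I am considering is sketched below. It assumes, without my having confirmed it, that the ClassCastException is triggered when such a field is written without a decimal point in some records, and that rewriting those values with a trailing `.0` would avoid the Integer/Double mix. The helper name `pad_float_info` and the example record are hypothetical.

```python
import re

# A value with no decimal point and no exponent, e.g. "0" or "36000"
INTEGER_LIKE = re.compile(r'^-?\d+$')


def pad_float_info(record, float_keys):
    """Return the VCF line with integer-looking values of the given INFO keys rewritten as floats."""
    line = record.rstrip("\n")
    fields = line.split("\t")
    if line.startswith("#") or len(fields) < 8:
        return line  # header line or incomplete record: leave untouched
    entries = []
    for entry in fields[7].split(";"):
        key, sep, value = entry.partition("=")
        if sep and key in float_keys:
            # Append ".0" only to values that look like plain integers.
            value = ",".join(
                v + ".0" if INTEGER_LIKE.match(v) else v
                for v in value.split(",")
            )
        entries.append(key + sep + value)
    fields[7] = ";".join(entries)
    return "\t".join(fields)


# Keys taken from the Float declarations in the header; the record is made up.
float_keys = {"MQRankSum", "RAW_MQ"}
record = "chr1\t100\t.\tA\tG,<NON_REF>\t50\t.\tDB;DP=10;MQRankSum=0;RAW_MQ=36000\tGT\t0/1"
print(pad_float_info(record, float_keys).split("\t")[7])
# DB;DP=10;MQRankSum=0.0;RAW_MQ=36000.0
```

The rewritten file would still need to be bgzip-compressed and re-indexed before being passed back to GATK, and Integer fields such as DP are deliberately left alone.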