I am running CombineGVCFs on single-sample GVCF files produced by HaplotypeCaller and get this error:
"java.lang.IllegalStateException: Key END found in VariantContext field INFO at NC_002971.4:6584 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default."
(Full stack trace at the end.)
This happened with GATK 4.0.0.0, and updating to 4.0.2.1 gave the same result.
It appears that CombineGVCFs fails on records with more than one base in the REF field, i.e. deletions.
I have four VCF files, and they all fail at the first deletion.
If I manually edit a file to remove that deletion, it fails at the next one.
ValidateVariants does not report any problem with the file.
There is no END key anywhere in the file.
An extract of the VCF and the full trace are below.
Thanks
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 2686_DSTL_8
NC_002971.4 1896 . C T 2193 . AC=1;AF=1.00;AN=1;DP=61
NC_002971.4 2019 . T C 2226 . AC=1;AF=1.00;AN=1;DP=60
NC_002971.4 3912 . A ACAGAG 3059.97 . AC=1;AF=1.00;AN=1;DP=65
NC_002971.4 3915 . G GC 2885.97 . AC=1;AF=1.00;AN=1;DP=64
NC_002971.4 3917 . A AC 2930.97 . AC=1;AF=1.00;AN=1;DP=64
NC_002971.4 3920 . T C 2984.97 . AC=1;AF=1.00;AN=1;DP=65
NC_002971.4 3940 . A C 2984 . AC=1;AF=1.00;AN=1;DP=68
NC_002971.4 5423 . C T 2342 . AC=1;AF=1.00;AN=1;DP=65
NC_002971.4 6584 . AC A 5057.97 . AC=1;AF=1.00;AN=1;DP=12
NC_002971.4 6660 . GT G 3692.97 . AC=1;AF=1.00;AN=1;DP=11
NC_002971.4 7087 . A G 319 . AC=1;AF=1.00;AN=1;DP=8;
NC_002971.4 7712 . G C 307 . AC=1;AF=1.00;AN=1;DP=8;
NC_002971.4 7974 . C T 1115 . AC=1;AF=1.00;AN=1;DP=28
NC_002971.4 8056 . G GC 1126.97 . AC=1;AF=1.00;AN=1;DP=26
NC_002971.4 8066 . A G 1500 . AC=1;AF=1.00;AN=1;DP=33
NC_002971.4 8072 . GGGAAAACA G 1580.97 . AC=1;AF=1.00;AN
NC_002971.4 8082 . T G 1590 . AC=1;AF=1.00;AN=1;DP=36
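To double-check that the failure lines up with the first deletion, here is a minimal Python sketch that lists every record whose REF has more than one base. It is a hypothetical helper, not part of GATK. It assumes the first five columns are CHROM, POS, ID, REF, ALT, split on any whitespace so that the space-separated extract above also works. It flags any multi-base REF, which in this extract means deletions. Applied to the extract, the first hit is NC_002971.4:6584 (AC to A), the same locus named in the exception. The insertions at 3912, 3915 and 3917 have a single-base REF and are not flagged.

```python
import sys
from typing import Iterable, List, Tuple


def multi_base_ref_records(lines: Iterable[str]) -> List[Tuple[str, int, str, str]]:
    """Return (CHROM, POS, REF, ALT) for each data record whose REF is longer than one base.

    Hypothetical helper: assumes CHROM, POS, ID, REF, ALT are the first five
    whitespace-separated columns (true for tab-separated VCF and for the pasted extract).
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        if len(fields) < 5:
            continue  # too few columns to be a record
        chrom, pos, _id, ref, alt = fields[:5]
        if len(ref) > 1:  # multi-base REF, e.g. the deletion AC -> A
            hits.append((chrom, int(pos), ref, alt))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python first_deletion.py 2686_DSTL_8.genotypes.vcf
    with open(sys.argv[1]) as handle:
        for record in multi_base_ref_records(handle):
            print(*record, sep="\t")
```

Only the first few lines are needed to see which record CombineGVCFs reaches first. Comparing that position with the locus in the exception for each of the four files is a quick consistency check.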
$ gatk-launch CombineGVCFs --java-options '-Djava.io.tmpdir=/database' --reference=/bioinformatics/references.2018/GCF_000007765.2/GCF_000007765.2_ASM776v2_genomic.fna --output=34_haplotype_caller/combined_samples.vcf --variant 34_haplotype_caller/2686_DSTL_8.genotypes.vcf
Using GATK jar /users/pao207/miniconda2/envs/sequencing/share/gatk4-4.0.2.1-0/gatk-package-4.0.2.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Djava.io.tmpdir=/database -jar /users/pao207/miniconda2/envs/sequencing/share/gatk4-4.0.2.1-0/gatk-package-4.0.2.1-local.jar CombineGVCFs --reference=/bioinformatics/references.2018/GCF_000007765.2/GCF_000007765.2_ASM776v2_genomic.fna --output=34_haplotype_caller/combined_samples.vcf --variant 34_haplotype_caller/2686_DSTL_8.genotypes.vcf
16:42:35.207 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/users/pao207/miniconda2/envs/sequencing/share/gatk4-4.0.2.1-0/gatk-package-4.0.2.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:42:35.394 INFO CombineGVCFs - ------------------------------------------------------------
16:42:35.394 INFO CombineGVCFs - The Genome Analysis Toolkit (GATK) v4.0.2.1
16:42:35.395 INFO CombineGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
16:42:35.395 INFO CombineGVCFs - Executing as pao207@zeus-portal02 on Linux v2.6.32-358.2.1.el6.x86_64 amd64
16:42:35.395 INFO CombineGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_121-b15
16:42:35.395 INFO CombineGVCFs - Start Date/Time: March 27, 2018 4:42:35 PM BST
16:42:35.395 INFO CombineGVCFs - ------------------------------------------------------------
16:42:35.395 INFO CombineGVCFs - ------------------------------------------------------------
16:42:35.395 INFO CombineGVCFs - HTSJDK Version: 2.14.3
16:42:35.395 INFO CombineGVCFs - Picard Version: 2.17.2
16:42:35.396 INFO CombineGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 1
16:42:35.396 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:42:35.396 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:42:35.396 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:42:35.396 INFO CombineGVCFs - Deflater: IntelDeflater
16:42:35.396 INFO CombineGVCFs - Inflater: IntelInflater
16:42:35.396 INFO CombineGVCFs - GCS max retries/reopens: 20
16:42:35.396 INFO CombineGVCFs - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
16:42:35.396 INFO CombineGVCFs - Initializing engine
16:42:35.950 INFO FeatureManager - Using codec VCFCodec to read file file:///bioinformatics/sequencing/Projects/26/2686/34_haplotype_caller/2686_DSTL_8.genotypes.vcf
16:42:35.975 INFO CombineGVCFs - Done initializing engine
16:42:36.639 INFO ProgressMeter - Starting traversal
16:42:36.640 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
16:42:36.684 INFO CombineGVCFs - Shutting down engine
[March 27, 2018 4:42:36 PM BST] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=1775239168
java.lang.IllegalStateException: Key END found in VariantContext field INFO at NC_002971.4:6584 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default.
at htsjdk.variant.vcf.VCFEncoder.fieldIsMissingFromHeaderError(VCFEncoder.java:173)
at htsjdk.variant.vcf.VCFEncoder.encode(VCFEncoder.java:112)
at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:224)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.endPreviousStates(CombineGVCFs.java:345)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.createIntermediateVariants(CombineGVCFs.java:189)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.apply(CombineGVCFs.java:134)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.apply(MultiVariantWalkerGroupedOnStart.java:73)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:110)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:108)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.traverse(MultiVariantWalkerGroupedOnStart.java:118)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:159)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:202)
at org.broadinstitute.hellbender.Main.main(Main.java:288)
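The exception complains that END is used in INFO but is not declared in the VCF header. The minimal Python sketch below compares the INFO keys declared in `##INFO=<ID=...>` header lines with the keys actually used in record INFO columns. It is a hypothetical helper, not a GATK or htsjdk API. It only inspects the input file, so it can confirm that the input neither declares nor uses END. It cannot show what CombineGVCFs writes to its own output header. As an assumption, records are split on whitespace, and empty INFO entries (such as the trailing ";" on the 7087 and 7712 lines) are ignored.

```python
import re
from typing import Iterable, Set, Tuple

# Matches the ID of a header line such as ##INFO=<ID=DP,Number=1,...>
INFO_HEADER = re.compile(r"^##INFO=<ID=([^,>]+)")


def info_keys(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (INFO keys declared in the header, INFO keys used in data records)."""
    declared: Set[str] = set()
    used: Set[str] = set()
    for line in lines:
        line = line.strip()
        match = INFO_HEADER.match(line)
        if match:
            declared.add(match.group(1))
            continue
        if not line or line.startswith("#"):
            continue  # other header lines and blank lines
        fields = line.split()
        if len(fields) < 8 or fields[7] == ".":
            continue  # no INFO column, or INFO is missing
        for entry in fields[7].split(";"):
            if entry:  # skip empty entries left by a trailing ';'
                used.add(entry.split("=", 1)[0])
    return declared, used
```

Hypothetical usage: `declared, used = info_keys(open("2686_DSTL_8.genotypes.vcf"))`, then check `"END" in declared`, `"END" in used`, and `used - declared` for any other undeclared keys.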