Running the multithreaded command below causes an error, although the same command runs fine single-threaded.
$ java -jar $GATK -T VariantFiltration -R human_g1k_v37.fasta -o chrom01_subset_biallelic_filtered.vcf --variant chrom02_subset_biallelic.vcf.gz --filterExpression "AF > 0.02" --filterName "MAFfilter" --num_threads 2
Is this a bug, or am I doing something wrong?
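For comparison, here is a sketch of the single-threaded run, assuming it is the same call with --num_threads dropped (GATK 3 then falls back to its default of one data thread). The guard around the java call is only an illustration aid, so the snippet prints the command and runs it only when $GATK points at an existing jar.

```shell
#!/bin/sh
# Same VariantFiltration call as above, minus --num_threads.
# GATK is expected to hold the path to the GATK 3.x jar, as in the original command.
set -- -T VariantFiltration \
    -R human_g1k_v37.fasta \
    -o chrom01_subset_biallelic_filtered.vcf \
    --variant chrom02_subset_biallelic.vcf.gz \
    --filterExpression "AF > 0.02" \
    --filterName "MAFfilter"

# Show the command first (note: the echo drops the quoting around the expression).
echo "java -jar \$GATK $*"

# Run it only when $GATK points at an existing jar file.
if [ -f "${GATK:-}" ]; then
    java -jar "$GATK" "$@"
fi
```

If this variant completes and the --num_threads 2 variant fails on the same input, the difference is isolated to the threading option.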
If you want to see the VCF data that causes the problem, I can supply it, but it's not exactly open source data so it requires some discretion.
The reference FASTA can be downloaded here, and the index and dictionary files are generated via:
$ samtools faidx human_g1k_v37.fasta
$ java -jar $PICARD CreateSequenceDictionary R=human_g1k_v37.fasta O=human_g1k_v37.dict
The full output, including the error, is pasted below:
INFO 14:06:35,630 HelpFormatter - ----------------------------------------------------------------------------------
INFO 14:06:35,632 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 14:06:35,632 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 14:06:35,633 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 14:06:35,633 HelpFormatter - [Mon May 01 14:06:35 CEST 2017] Executing on Linux 2.6.32-642.1.1.el6.x86_64 amd64
INFO 14:06:35,633 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26
INFO 14:06:35,636 HelpFormatter - Program Args: -T VariantFiltration -R human_g1k_v37.fasta -o chrom01_subset_biallelic_filtered.vcf --variant chrom02_subset_biallelic.vcf.gz --filterExpression AF > 0.02 --filterName MAFfilter --num_threads 2
INFO 14:06:35,639 HelpFormatter - Executing as olavur@fe1.genomedk.net on Linux 2.6.32-642.1.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26.
INFO 14:06:35,640 HelpFormatter - Date/Time: 2017/05/01 14:06:35
INFO 14:06:35,640 HelpFormatter - ----------------------------------------------------------------------------------
INFO 14:06:35,640 HelpFormatter - ----------------------------------------------------------------------------------
INFO 14:06:35,691 GenomeAnalysisEngine - Strictness is SILENT
INFO 14:06:35,781 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
WARN 14:06:35,839 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation
INFO 14:06:35,847 MicroScheduler - Running the GATK in parallel mode with 2 total threads, 1 CPU thread(s) for each of 2 data thread(s), of 16 processors available on this machine
INFO 14:06:35,946 GenomeAnalysisEngine - Preparing for traversal
INFO 14:06:35,951 GenomeAnalysisEngine - Done preparing for traversal
INFO 14:06:35,952 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 14:06:35,952 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 14:06:35,952 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
ERROR --
ERROR stack trace
java.lang.NullPointerException
at java.util.LinkedList.node(LinkedList.java:577)
at java.util.LinkedList.get(LinkedList.java:477)
at org.broadinstitute.gatk.tools.walkers.filters.FiltrationContextWindow.getContext(FiltrationContextWindow.java:66)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.filter(VariantFiltration.java:367)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.map(VariantFiltration.java:318)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.map(VariantFiltration.java:99)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.ShardTraverser.call(ShardTraverser.java:98)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------