I have some gVCF files and need to call variants from them. HaplotypeCaller runs successfully, and so does joint genotyping with GenotypeGVCFs (command and log below), but VariantRecalibrator fails with an error.
java -jar /storage/s1saini/GenomeAnalysisTK.jar -T GenotypeGVCFs -V SSC00003.g.vcf.gz -V SSC00004.g.vcf.gz -V SSC00005.g.vcf.gz -V SSC00006.g.vcf.gz -V SSC01958.g.vcf.gz -V SSC01964.g.vcf.gz -V SSC01965.g.vcf.gz -V SSC01966.g.vcf.gz -V SSC02852.g.vcf.gz -V SSC02854.g.vcf.gz -V SSC02857.g.vcf.gz -V SSC02858.g.vcf.gz -V SSC03070.g.vcf.gz -V SSC03078.g.vcf.gz -V SSC03092.g.vcf.gz -V SSC03093.g.vcf.gz -o jointcalls.vcf -R ref/human_g1k_b37_20.fasta -L 20 -nt 4
INFO 10:24:23,052 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:24:23,055 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 10:24:23,055 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 10:24:23,055 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 10:24:23,055 HelpFormatter - [Tue Apr 04 10:24:23 PDT 2017] Executing on Linux 3.10.0-514.2.2.el7.x86_64 amd64
INFO 10:24:23,055 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_111-b15
INFO 10:24:23,059 HelpFormatter - Program Args: -T GenotypeGVCFs -V SSC00003.g.vcf.gz -V SSC00004.g.vcf.gz -V SSC00005.g.vcf.gz -V SSC00006.g.vcf.gz -V SSC01958.g.vcf.gz -V SSC01964.g.vcf.gz -V SSC01965.g.vcf.gz -V SSC01966.g.vcf.gz -V SSC02852.g.vcf.gz -V SSC02854.g.vcf.gz -V SSC02857.g.vcf.gz -V SSC02858.g.vcf.gz -V SSC03070.g.vcf.gz -V SSC03078.g.vcf.gz -V SSC03092.g.vcf.gz -V SSC03093.g.vcf.gz -o jointcalls.vcf -R ref/human_g1k_b37_20.fasta -L 20 -nt 4
INFO 10:24:23,063 HelpFormatter - Executing as s1saini@snorlax on Linux 3.10.0-514.2.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-b15.
INFO 10:24:23,064 HelpFormatter - Date/Time: 2017/04/04 10:24:23
INFO 10:24:23,064 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:24:23,064 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:24:23,114 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:24:23,303 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 10:24:25,741 IntervalUtils - Processing 63025520 bp from intervals
WARN 10:24:25,741 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,742 IndexDictionaryUtils - Track variant2 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,742 IndexDictionaryUtils - Track variant3 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,742 IndexDictionaryUtils - Track variant4 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,742 IndexDictionaryUtils - Track variant5 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,743 IndexDictionaryUtils - Track variant6 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,743 IndexDictionaryUtils - Track variant7 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,743 IndexDictionaryUtils - Track variant8 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,744 IndexDictionaryUtils - Track variant9 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,744 IndexDictionaryUtils - Track variant10 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,744 IndexDictionaryUtils - Track variant11 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,744 IndexDictionaryUtils - Track variant12 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,744 IndexDictionaryUtils - Track variant13 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,745 IndexDictionaryUtils - Track variant14 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,745 IndexDictionaryUtils - Track variant15 doesn't have a sequence dictionary built in, skipping dictionary validation
WARN 10:24:25,745 IndexDictionaryUtils - Track variant16 doesn't have a sequence dictionary built in, skipping dictionary validation
INFO 10:24:25,753 MicroScheduler - Running the GATK in parallel mode with 4 total threads, 1 CPU thread(s) for each of 4 data thread(s), of 28 processors available on this machine
INFO 10:24:25,809 GenomeAnalysisEngine - Preparing for traversal
INFO 10:24:25,810 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:24:25,811 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:24:25,811 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:24:25,811 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
WARN 10:24:26,003 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
WARN 10:24:26,004 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
INFO 10:24:26,005 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
WARN 10:24:28,595 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs
WARN 10:24:31,245 ExactAFCalculator - This tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at 20: 83250 has 10 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument. Unless the DEBUG logging level is used, this warning message is output just once per run and further warnings are suppressed.
INFO 10:24:56,003 ProgressMeter - 20:3126601 0.0 30.0 s 49.9 w 5.0% 10.1 m 9.6 m
INFO 10:25:26,005 ProgressMeter - 20:3535701 0.0 60.0 s 99.5 w 5.6% 17.8 m 16.8 m
INFO 10:25:56,006 ProgressMeter - 20:6041401 3000000.0 90.0 s 30.0 s 9.6% 15.6 m 14.1 m
INFO 10:26:26,008 ProgressMeter - 20:7496301 4000000.0 120.0 s 30.0 s 11.9% 16.8 m 14.8 m
INFO 10:26:56,010 ProgressMeter - 20:11018501 8000000.0 2.5 m 18.0 s 17.5% 14.3 m 11.8 m
INFO 10:27:26,011 ProgressMeter - 20:11547201 8000000.0 3.0 m 22.0 s 18.3% 16.4 m 13.4 m
INFO 10:27:56,012 ProgressMeter - 20:15076001 1.2E7 3.5 m 17.0 s 23.9% 14.6 m 11.1 m
INFO 10:28:26,013 ProgressMeter - 20:15629601 1.2E7 4.0 m 20.0 s 24.8% 16.1 m 12.1 m
INFO 10:28:56,014 ProgressMeter - 20:19188001 1.6E7 4.5 m 16.0 s 30.4% 14.8 m 10.3 m
INFO 10:29:26,015 ProgressMeter - 20:19745601 1.6E7 5.0 m 18.0 s 31.3% 16.0 m 11.0 m
INFO 10:29:56,017 ProgressMeter - 20:23238001 2.0E7 5.5 m 16.0 s 36.9% 14.9 m 9.4 m
INFO 10:30:26,018 ProgressMeter - 20:23764301 2.0E7 6.0 m 18.0 s 37.7% 15.9 m 9.9 m
INFO 10:30:56,019 ProgressMeter - 20:29293301 2.6E7 6.5 m 15.0 s 46.5% 14.0 m 7.5 m
INFO 10:31:26,020 ProgressMeter - 20:31020501 2.8E7 7.0 m 15.0 s 49.2% 14.2 m 7.2 m
INFO 10:31:56,021 ProgressMeter - 20:33371001 3.0E7 7.5 m 15.0 s 52.9% 14.2 m 6.7 m
INFO 10:32:26,022 ProgressMeter - 20:34325401 3.2E7 8.0 m 15.0 s 54.5% 14.7 m 6.7 m
INFO 10:32:56,024 ProgressMeter - 20:37383101 3.4E7 8.5 m 15.0 s 59.3% 14.3 m 5.8 m
INFO 10:33:26,025 ProgressMeter - 20:39016401 3.6E7 9.0 m 15.0 s 61.9% 14.5 m 5.5 m
INFO 10:33:56,026 ProgressMeter - 20:41453001 3.8E7 9.5 m 15.0 s 65.8% 14.4 m 4.9 m
INFO 10:34:26,027 ProgressMeter - 20:45001701 4.2E7 10.0 m 14.0 s 71.4% 14.0 m 4.0 m
INFO 10:34:56,029 ProgressMeter - 20:46006401 4.3E7 10.5 m 14.0 s 73.0% 14.4 m 3.9 m
INFO 10:35:26,030 ProgressMeter - 20:49063101 4.6E7 11.0 m 14.0 s 77.8% 14.1 m 3.1 m
INFO 10:35:56,031 ProgressMeter - 20:50020001 4.7E7 11.5 m 14.0 s 79.4% 14.5 m 3.0 m
INFO 10:36:26,032 ProgressMeter - 20:53090001 5.0E7 12.0 m 14.0 s 84.2% 14.2 m 2.2 m
INFO 10:36:56,033 ProgressMeter - 20:54019201 5.1E7 12.5 m 14.0 s 85.7% 14.6 m 2.1 m
INFO 10:37:26,034 ProgressMeter - 20:57112001 5.4E7 13.0 m 14.0 s 90.6% 14.3 m 80.0 s
INFO 10:37:56,036 ProgressMeter - 20:58083301 5.5E7 13.5 m 14.0 s 92.2% 14.6 m 68.0 s
INFO 10:38:26,037 ProgressMeter - 20:61190101 5.8E7 14.0 m 14.0 s 97.1% 14.4 m 25.0 s
INFO 10:38:56,038 ProgressMeter - 20:62134501 5.9E7 14.5 m 14.0 s 98.6% 14.7 m 12.0 s
INFO 10:39:26,039 ProgressMeter - 20:63025501 6.202552E7 15.0 m 14.0 s 100.0% 15.0 m 0.0 s
INFO 10:39:49,084 ProgressMeter - done 6.302552E7 15.4 m 14.0 s 100.0% 15.4 m 0.0 s
INFO 10:39:49,084 ProgressMeter - Total runtime 923.27 secs, 15.39 min, 0.26 hours
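The GenotypeGVCFs command above repeats `-V` once per sample. As a sketch only, the same command can be assembled in bash from a list of sample IDs. The jar path, reference, interval and flags are copied from the post; the `DRY_RUN` variable is a hypothetical addition so that the assembled command is printed rather than executed by default.

```shell
#!/usr/bin/env bash
# Sketch: build the repeated "-V <sample>.g.vcf.gz" arguments for GenotypeGVCFs
# from a list of sample IDs. Paths and flags are copied from the post;
# DRY_RUN is a hypothetical switch (default 1 = print the command only).
set -euo pipefail

GATK_JAR=/storage/s1saini/GenomeAnalysisTK.jar
REF=ref/human_g1k_b37_20.fasta
SAMPLES=(SSC00003 SSC00004 SSC00005 SSC00006 SSC01958 SSC01964 SSC01965 SSC01966
         SSC02852 SSC02854 SSC02857 SSC02858 SSC03070 SSC03078 SSC03092 SSC03093)

# One "-V <sample>.g.vcf.gz" pair per sample.
VARIANT_ARGS=()
for s in "${SAMPLES[@]}"; do
  VARIANT_ARGS+=(-V "${s}.g.vcf.gz")
done

CMD=(java -jar "$GATK_JAR" -T GenotypeGVCFs "${VARIANT_ARGS[@]}"
     -o jointcalls.vcf -R "$REF" -L 20 -nt 4)

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  # Print the assembled command, one shell-quoted token at a time.
  printf '%q ' "${CMD[@]}"; echo
else
  "${CMD[@]}"
fi
```

Running it with `DRY_RUN=0` in the environment executes GATK; with the 16 IDs listed, the array expands to the same 32 `-V` tokens as the hand-typed command.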
java -jar /storage/s1saini/GenomeAnalysisTK.jar -T VariantRecalibrator -R ref/human_g1k_b37_20.fasta -input jointcalls.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf.gz -an DP -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile recalibrate_SNP.recal -tranchesFile recalibrate_SNP.tranches -rscriptFile recalibrate_SNP_plots.R
INFO 10:40:30,682 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:40:30,684 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 10:40:30,685 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 10:40:30,685 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 10:40:30,685 HelpFormatter - [Tue Apr 04 10:40:30 PDT 2017] Executing on Linux 3.10.0-514.2.2.el7.x86_64 amd64
INFO 10:40:30,685 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_111-b15
INFO 10:40:30,689 HelpFormatter - Program Args: -T VariantRecalibrator -R ref/human_g1k_b37_20.fasta -input jointcalls.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf.gz -an DP -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile recalibrate_SNP.recal -tranchesFile recalibrate_SNP.tranches -rscriptFile recalibrate_SNP_plots.R
INFO 10:40:30,693 HelpFormatter - Executing as s1saini@snorlax on Linux 3.10.0-514.2.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-b15.
INFO 10:40:30,694 HelpFormatter - Date/Time: 2017/04/04 10:40:30
INFO 10:40:30,694 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:40:30,694 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:40:30,718 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:40:30,808 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
WARN 10:40:31,044 IndexDictionaryUtils - Track hapmap doesn't have a sequence dictionary built in, skipping dictionary validation
INFO 10:40:31,165 GenomeAnalysisEngine - Preparing for traversal
INFO 10:40:31,166 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:40:31,167 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:40:31,167 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:40:31,167 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 10:40:31,172 TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
INFO 10:40:35,327 VariantDataManager - DP: mean = 535.96 standard deviation = 65.37
INFO 10:40:35,483 VariantDataManager - Annotations are now ordered by their information content: [DP]
INFO 10:40:35,498 VariantDataManager - Training with 61633 variants after standard deviation thresholding.
INFO 10:40:35,502 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO 10:40:36,992 VariantRecalibratorEngine - Finished iteration 0.
INFO 10:40:37,902 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.08556
INFO 10:40:40,563 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.04317
INFO 10:40:42,340 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.02471
INFO 10:40:43,232 VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.01472
INFO 10:40:43,805 VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.01129
INFO 10:40:44,384 VariantRecalibratorEngine - Finished iteration 30. Current change in mixture coefficients = 0.01005
INFO 10:40:44,965 VariantRecalibratorEngine - Finished iteration 35. Current change in mixture coefficients = 0.00837
INFO 10:40:45,538 VariantRecalibratorEngine - Finished iteration 40. Current change in mixture coefficients = 0.00690
INFO 10:40:46,119 VariantRecalibratorEngine - Finished iteration 45. Current change in mixture coefficients = 0.00585
INFO 10:40:46,703 VariantRecalibratorEngine - Finished iteration 50. Current change in mixture coefficients = 0.00541
INFO 10:40:47,286 VariantRecalibratorEngine - Finished iteration 55. Current change in mixture coefficients = 0.00555
INFO 10:40:47,866 VariantRecalibratorEngine - Finished iteration 60. Current change in mixture coefficients = 0.00570
INFO 10:40:48,460 VariantRecalibratorEngine - Finished iteration 65. Current change in mixture coefficients = 0.00588
INFO 10:40:49,048 VariantRecalibratorEngine - Finished iteration 70. Current change in mixture coefficients = 0.00611
INFO 10:40:49,640 VariantRecalibratorEngine - Finished iteration 75. Current change in mixture coefficients = 0.00634
INFO 10:40:50,456 VariantRecalibratorEngine - Finished iteration 80. Current change in mixture coefficients = 0.00651
INFO 10:40:51,053 VariantRecalibratorEngine - Finished iteration 85. Current change in mixture coefficients = 0.00651
INFO 10:40:51,651 VariantRecalibratorEngine - Finished iteration 90. Current change in mixture coefficients = 0.00626
INFO 10:40:52,249 VariantRecalibratorEngine - Finished iteration 95. Current change in mixture coefficients = 0.00575
INFO 10:40:52,841 VariantRecalibratorEngine - Finished iteration 100. Current change in mixture coefficients = 0.00508
INFO 10:40:53,434 VariantRecalibratorEngine - Finished iteration 105. Current change in mixture coefficients = 0.00436
INFO 10:40:54,050 VariantRecalibratorEngine - Finished iteration 110. Current change in mixture coefficients = 0.00368
INFO 10:40:54,668 VariantRecalibratorEngine - Finished iteration 115. Current change in mixture coefficients = 0.00308
INFO 10:40:55,282 VariantRecalibratorEngine - Finished iteration 120. Current change in mixture coefficients = 0.00257
INFO 10:40:55,905 VariantRecalibratorEngine - Finished iteration 125. Current change in mixture coefficients = 0.00213
INFO 10:40:56,161 VariantRecalibratorEngine - Convergence after 127 iterations!
INFO 10:40:56,243 VariantRecalibratorEngine - Evaluating full set of 188987 variants...
INFO 10:40:56,546 VariantDataManager - Training with worst 856 scoring variants --> variants with LOD <= -5.0000.
INFO 10:40:56,546 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO 10:40:56,551 VariantRecalibratorEngine - Finished iteration 0.
INFO 10:40:56,554 VariantRecalibratorEngine - Convergence after 3 iterations!
INFO 10:40:56,563 VariantRecalibratorEngine - Evaluating full set of 188987 variants...
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.7-0-gcfedb67):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --minNumBadVariants 5000, for example).
##### ERROR ------------------------------------------------------------------------------------------
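The error message names one option to try, `--minNumBadVariants`, and the log shows the negative model was trained on only 856 variants ("Training with worst 856 scoring variants"). Below is a sketch of the same VariantRecalibrator command with that option added, using the message's own example value of 5000. Every other argument is copied unchanged, `DRY_RUN` is again a hypothetical switch, and whether this clears the NaN LOD on this data set has not been checked.

```shell
#!/usr/bin/env bash
# Sketch: the VariantRecalibrator command from the post, plus the
# --minNumBadVariants option suggested in the error message.
# 5000 is the example value quoted in that message, not a tuned setting.
# DRY_RUN is a hypothetical switch (default 1 = print the command only).
set -euo pipefail

GATK_JAR=/storage/s1saini/GenomeAnalysisTK.jar
REF=ref/human_g1k_b37_20.fasta
MIN_BAD=5000

CMD=(java -jar "$GATK_JAR" -T VariantRecalibrator -R "$REF"
     -input jointcalls.vcf
     -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf.gz
     -an DP -mode SNP
     -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0
     --minNumBadVariants "$MIN_BAD"
     -recalFile recalibrate_SNP.recal
     -tranchesFile recalibrate_SNP.tranches
     -rscriptFile recalibrate_SNP_plots.R)

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  # Print the assembled command, one shell-quoted token at a time.
  printf '%q ' "${CMD[@]}"; echo
else
  "${CMD[@]}"
fi
```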
I don't believe this is caused by a small dataset: I am working with 16 samples on chromosome 20.
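To put numbers behind the dataset-size point, here is a minimal sketch that counts the records in jointcalls.vcf, split into single-base SNP records and everything else (indels and multi-allelic sites). The `count_sites` function and the `VCF` variable are hypothetical names; for comparison, the VariantRecalibrator log above reports "Evaluating full set of 188987 variants".

```shell
#!/usr/bin/env bash
# Sketch: count VCF records, split into single-base REF/ALT SNPs and
# everything else. count_sites and VCF are hypothetical names.
set -euo pipefail

count_sites() {
  # Skip header lines (starting with '#'); column 4 is REF, column 5 is ALT.
  awk -F'\t' '!/^#/ {
      if ($4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/) { snp++ } else { other++ }
    }
    END { printf "snp=%d other=%d total=%d\n", snp + 0, other + 0, snp + other }' "$1"
}

VCF="${VCF:-jointcalls.vcf}"
if [[ -f "$VCF" ]]; then
  count_sites "$VCF"
else
  echo "no $VCF found" >&2
fi
```

Multi-allelic SNPs (ALT such as `T,G`) land in the "other" bucket here, so the SNP figure is a lower bound on what SNP-mode recalibration sees.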