Hi,
It seems that starting with GATK 3.6 (or at least sometime after GATK 3.5), when running GenotypeGVCFs and emitting all bases with --includeNonVariantSites, non-variant sites are now emitted with <NON_REF> as ALT instead of ".".
When running VariantRecalibrator using the INDEL model, it now treats these sites as symbolic instead of ignoring them. This increases the run time from minutes to several hours, depending on how many invariant sites you have.
Below is a VCF with all sites, generated with the June 26th nightly. I am using the June 26th nightly because it fixes one fatal error ( http://gatkforums.broadinstitute.org/gatk/discussion/comment/30982#Comment_30982 ) and predates another fatal error that was introduced later ( http://gatkforums.broadinstitute.org/gatk/discussion/comment/31535# ).
[kurt-cgc@c6220-5 VCF]$ zgrep -v "^#" CONTROLS_PLUS_CRE1.VQSR.ANNOTATED.vcf.gz | cut -f 1-8 | head
1 69091 . A <NON_REF> 16.73 LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=42.57;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1 69092 . T <NON_REF> 16.73 LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=41.58;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1 69093 . G <NON_REF> 16.73 LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=42.57;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1 69094 . G <NON_REF> 16.73 LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=42.57;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1 69095 . T <NON_REF> 16.73 LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=41.58;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1 69096 . G <NON_REF> 16.73 LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=41.58;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1 69097 . A <NON_REF> 16.73 LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=41.58;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1 69098 . C <NON_REF> 16.73 LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=42.57;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1 69099 . T <NON_REF> 17.41 LowQual AC=0;AF=0.00;AN=12;DP=9;FractionInformativeReads=1.00;GC=41.58;MLEAC=0;MLEAF=0.00;NCC=15;NDA=1;VariantType=SYMBOLIC
1 69100 . G <NON_REF> 17.41 LowQual AC=0;AF=0.00;AN=12;DP=9;FractionInformativeReads=1.00;GC=40.59;MLEAC=0;MLEAF=0.00;NCC=15;NDA=1;VariantType=SYMBOLIC
Below is a VCF with all sites, generated with GATK 3.5.
sunrhel4.cidr.jhmi.edu> zgrep -v "^#" /isilon/sequencing/Seq_Proj/CGC_160418_HMH5JBCXX_CGCDev6B_CGC_SCATTER/CGC_PedTest4/VCF/CONTROLS_PLUS_CGC_PedTest4.VQSR.ANNOTATED.vcf.gz | cut -f 1-8 | head
1 69091 . A . . . AN=8;DP=7;FractionInformativeReads=1.00;GC=42.57;HW=0.0;NCC=16;VariantType=NO_VARIATION
1 69092 . T . . . AN=8;DP=7;FractionInformativeReads=1.00;GC=41.58;HW=0.0;NCC=16;VariantType=NO_VARIATION
1 69093 . G . . . AN=8;DP=7;FractionInformativeReads=1.00;GC=42.57;HW=0.0;NCC=16;VariantType=NO_VARIATION
1 69094 . G . . . AN=8;DP=7;FractionInformativeReads=1.00;GC=42.57;HW=0.0;NCC=16;VariantType=NO_VARIATION
1 69095 . T . . . AN=8;DP=7;FractionInformativeReads=1.00;GC=41.58;HW=0.0;NCC=16;VariantType=NO_VARIATION
1 69096 . G . . . AN=8;DP=7;FractionInformativeReads=1.00;GC=41.58;HW=0.0;NCC=16;VariantType=NO_VARIATION
1 69097 . A . . . AN=8;DP=7;FractionInformativeReads=1.00;GC=41.58;HW=0.0;NCC=16;VariantType=NO_VARIATION
1 69098 . C . . . AN=8;DP=7;FractionInformativeReads=1.00;GC=42.57;HW=0.0;NCC=16;VariantType=NO_VARIATION
1 69099 . T . . . AN=10;DP=8;FractionInformativeReads=1.00;GC=41.58;HW=0.0;NCC=15;VariantType=NO_VARIATION
1 69100 . G . . . AN=10;DP=8;FractionInformativeReads=1.00;GC=40.59;HW=0.0;NCC=15;VariantType=NO_VARIATION
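To compare the two outputs beyond the first ten rows, the ALT column can be tallied over the whole file. This is only a minimal sketch, assuming standard tab-delimited VCF records on stdin; count_alt_values is a hypothetical helper name and not part of GATK.

```shell
# Hypothetical helper: tally the values in the ALT column (field 5) of
# VCF records read from stdin. Header lines starting with '#' are skipped.
count_alt_values() {
  awk -F'\t' '!/^#/ { n[$5]++ } END { for (a in n) printf "%s\t%d\n", a, n[a] }' \
    | LC_ALL=C sort
}

# Example usage with one of the files listed above:
# gzip -dc CONTROLS_PLUS_CRE1.VQSR.ANNOTATED.vcf.gz | count_alt_values
```

If the change in behavior is as described, the nightly output should report the invariant sites under <NON_REF>, while the GATK 3.5 output should report them under ".".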
Below is an example command line showing how I am running GenotypeGVCFs (for GATK 3.5, I would use a Java 1.7 version).
$JAVA_1_8/java -jar $GATK_DIR/GenomeAnalysisTK.jar \
-T GenotypeGVCFs \
-R $REF_GENOME \
--dbsnp $DBSNP \
--annotateNDA \
--includeNonVariantSites \
--disable_auto_index_creation_and_locking_when_reading_rods \
--standard_min_confidence_threshold_for_calling 30 \
--standard_min_confidence_threshold_for_emitting 0 \
--annotation AS_BaseQualityRankSumTest \
--annotation AS_FisherStrand \
--annotation AS_InbreedingCoeff \
--annotation AS_MappingQualityRankSumTest \
--annotation AS_RMSMappingQuality \
--annotation AS_ReadPosRankSumTest \
--annotation AS_StrandOddsRatio \
--annotation FractionInformativeReads \
--annotation StrandBiasBySample \
--annotation StrandAlleleCountsBySample \
--annotation LikelihoodRankSumTest \
-L $CHROMOSOME \
--variant $CONTROL_REPO/CGC_CONTROL_SET_3_6.vcf.gz \
--variant $CORE_PATH/$PROJECT/$FAMILY/$FAMILY".gvcf.list" \
-o $CORE_PATH/$PROJECT/TEMP/CONTROLS_PLUS_$FAMILY".RAW."$CHROMOSOME".vcf"
Also, the gVCF files are being created with -ERC BP_RESOLUTION.
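One possible stopgap for the VariantRecalibrator run time would be to drop the <NON_REF>-only records before recalibration. This is a sketch under assumptions, not a GATK-provided fix: drop_nonref_only_sites and the file names in the usage line are hypothetical, the input is assumed to be tab-delimited VCF text, and I have not verified how VariantRecalibrator behaves on the filtered file.

```shell
# Hypothetical workaround: pass header lines through unchanged and drop
# records whose ALT column (field 5) is exactly <NON_REF>.
# Records that list <NON_REF> alongside a real ALT allele are kept.
drop_nonref_only_sites() {
  awk -F'\t' '/^#/ || $5 != "<NON_REF>"'
}

# Example usage (hypothetical file names):
# gzip -dc all_sites.vcf.gz | drop_nonref_only_sites > variant_sites_only.vcf
```

The invariant sites would then be missing from the recalibrated output, so this only helps if they are not needed downstream of VariantRecalibrator or can be merged back in afterwards.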