Channel: Recent Discussions — GATK-Forum

GenotypeGVCFs --includeNonVariantSites emits reference as symbolic

Hi,

It seems that starting with GATK 3.6 (or at least sometime after GATK 3.5), when GenotypeGVCFs is run with --includeNonVariantSites to emit all bases, non-variant sites are emitted with <NON_REF> as ALT instead of ".". When VariantRecalibrator is then run with the INDEL model, it treats these records as symbolic variants instead of ignoring them. This increases the run time from minutes to several hours, depending on how many invariant sites are present.

Below is a VCF with all sites, generated with the June 26th nightly build. I am using that nightly because it fixes one fatal error (http://gatkforums.broadinstitute.org/gatk/discussion/comment/30982#Comment_30982) and predates the introduction of another fatal error (http://gatkforums.broadinstitute.org/gatk/discussion/comment/31535#).

[kurt-cgc@c6220-5 VCF]$ zgrep -v "^#" CONTROLS_PLUS_CRE1.VQSR.ANNOTATED.vcf.gz | cut -f 1-8 | head
1       69091   .       A       <NON_REF>       16.73   LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=42.57;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1       69092   .       T       <NON_REF>       16.73   LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=41.58;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1       69093   .       G       <NON_REF>       16.73   LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=42.57;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1       69094   .       G       <NON_REF>       16.73   LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=42.57;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1       69095   .       T       <NON_REF>       16.73   LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=41.58;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1       69096   .       G       <NON_REF>       16.73   LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=41.58;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1       69097   .       A       <NON_REF>       16.73   LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=41.58;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1       69098   .       C       <NON_REF>       16.73   LowQual AC=0;AF=0.00;AN=10;DP=8;FractionInformativeReads=1.00;GC=42.57;MLEAC=0;MLEAF=0.00;NCC=16;NDA=1;VariantType=SYMBOLIC
1       69099   .       T       <NON_REF>       17.41   LowQual AC=0;AF=0.00;AN=12;DP=9;FractionInformativeReads=1.00;GC=41.58;MLEAC=0;MLEAF=0.00;NCC=15;NDA=1;VariantType=SYMBOLIC
1       69100   .       G       <NON_REF>       17.41   LowQual AC=0;AF=0.00;AN=12;DP=9;FractionInformativeReads=1.00;GC=40.59;MLEAC=0;MLEAF=0.00;NCC=15;NDA=1;VariantType=SYMBOLIC

Below is a VCF with all sites, generated with GATK 3.5.

sunrhel4.cidr.jhmi.edu> zgrep -v "^#" /isilon/sequencing/Seq_Proj/CGC_160418_HMH5JBCXX_CGCDev6B_CGC_SCATTER/CGC_PedTest4/VCF/CONTROLS_PLUS_CGC_PedTest4.VQSR.ANNOTATED.vcf.gz | cut -f 1-8 | head
1       69091   .       A       .       .       .       AN=8;DP=7;FractionInformativeReads=1.00;GC=42.57;HW=0.0;NCC=16;VariantType=NO_VARIATION
1       69092   .       T       .       .       .       AN=8;DP=7;FractionInformativeReads=1.00;GC=41.58;HW=0.0;NCC=16;VariantType=NO_VARIATION
1       69093   .       G       .       .       .       AN=8;DP=7;FractionInformativeReads=1.00;GC=42.57;HW=0.0;NCC=16;VariantType=NO_VARIATION
1       69094   .       G       .       .       .       AN=8;DP=7;FractionInformativeReads=1.00;GC=42.57;HW=0.0;NCC=16;VariantType=NO_VARIATION
1       69095   .       T       .       .       .       AN=8;DP=7;FractionInformativeReads=1.00;GC=41.58;HW=0.0;NCC=16;VariantType=NO_VARIATION
1       69096   .       G       .       .       .       AN=8;DP=7;FractionInformativeReads=1.00;GC=41.58;HW=0.0;NCC=16;VariantType=NO_VARIATION
1       69097   .       A       .       .       .       AN=8;DP=7;FractionInformativeReads=1.00;GC=41.58;HW=0.0;NCC=16;VariantType=NO_VARIATION
1       69098   .       C       .       .       .       AN=8;DP=7;FractionInformativeReads=1.00;GC=42.57;HW=0.0;NCC=16;VariantType=NO_VARIATION
1       69099   .       T       .       .       .       AN=10;DP=8;FractionInformativeReads=1.00;GC=41.58;HW=0.0;NCC=15;VariantType=NO_VARIATION
1       69100   .       G       .       .       .       AN=10;DP=8;FractionInformativeReads=1.00;GC=40.59;HW=0.0;NCC=15;VariantType=NO_VARIATION
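
To put a number on the difference between the two listings above, the ALT column can be tallied directly. The following is only a rough sketch: `tally_alt` is a hypothetical helper name (not part of GATK), the input is assumed to be a standard tab-delimited VCF on stdin, and any record whose ALT is neither exactly `<NON_REF>` nor `.` (including multi-allelic records that merely contain `<NON_REF>`) is counted as "other".

```shell
#!/usr/bin/env bash
# Tally ALT-column categories in a VCF read from stdin.
# tally_alt is a hypothetical helper for illustration, not a GATK tool.
tally_alt() {
  awk -F'\t' '
    /^#/              { next }               # skip header lines
    $5 == "<NON_REF>" { symbolic++; next }   # ALT is only the symbolic allele
    $5 == "."         { no_alt++;  next }    # ALT is missing (GATK 3.5 style)
    { other++ }                              # everything else, e.g. real variants
    END {
      printf "symbolic\t%d\nno_alt\t%d\nother\t%d\n", symbolic, no_alt, other
    }'
}

# Usage (the file name is a placeholder):
#   gzip -dc all_sites.vcf.gz | tally_alt
```

A large "symbolic" count on the newer output, against a large "no_alt" count on the GATK 3.5 output, would be consistent with the behavior described in this post.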

Below is an example command line showing how I run GenotypeGVCFs (for GATK 3.5, I would use a Java 1.7 version).

$JAVA_1_8/java -jar $GATK_DIR/GenomeAnalysisTK.jar \
-T GenotypeGVCFs \
-R $REF_GENOME \
--dbsnp $DBSNP \
--annotateNDA \
--includeNonVariantSites \
--disable_auto_index_creation_and_locking_when_reading_rods \
--standard_min_confidence_threshold_for_calling 30 \
--standard_min_confidence_threshold_for_emitting 0 \
--annotation AS_BaseQualityRankSumTest \
--annotation AS_FisherStrand \
--annotation AS_InbreedingCoeff \
--annotation AS_MappingQualityRankSumTest \
--annotation AS_RMSMappingQuality \
--annotation AS_ReadPosRankSumTest \
--annotation AS_StrandOddsRatio \
--annotation FractionInformativeReads \
--annotation StrandBiasBySample \
--annotation StrandAlleleCountsBySample \
--annotation LikelihoodRankSumTest \
-L $CHROMOSOME \
--variant $CONTROL_REPO/CGC_CONTROL_SET_3_6.vcf.gz \
--variant $CORE_PATH/$PROJECT/$FAMILY/$FAMILY".gvcf.list" \
-o $CORE_PATH/$PROJECT/TEMP/CONTROLS_PLUS_$FAMILY".RAW."$CHROMOSOME".vcf"

Also, the gVCF files are being created with -ERC BP_RESOLUTION.

