Hello there,
I am trying to apply some sample-level filters on a VCF generated using GATK4.0.2.1. My issue is that all variant sites are not getting an FT flag added and I am wondering why. Additionally, "PASS" is being added the the FILTER column at the variant-level (I am not sure if this behavior is expected, but it seems weird)
Here is some information about the system:
17:43:04.589 DEBUG NativeLibraryLoader - Extracting libgkl_compression.so to /tmp/szs315/libgkl_compression8694733123384787175.so
17:43:04.681 INFO VariantFiltration - ------------------------------------------------------------
17:43:04.681 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.0.2.1
17:43:04.681 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
17:43:04.681 INFO VariantFiltration - Executing as szs315@quser12 on Linux v3.10.0-514.36.5.el7.x86_64 amd64
17:43:04.681 INFO VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_112-b16
17:43:04.682 INFO VariantFiltration - Start Date/Time: March 11, 2018 6:43:04 PM CDT
17:43:04.682 INFO VariantFiltration - ------------------------------------------------------------
17:43:04.682 INFO VariantFiltration - ------------------------------------------------------------
17:43:04.682 INFO VariantFiltration - HTSJDK Version: 2.14.3
17:43:04.682 INFO VariantFiltration - Picard Version: 2.17.2
17:43:04.684 INFO VariantFiltration - HTSJDK Defaults.BUFFER_SIZE : 131072
17:43:04.684 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 1
17:43:04.684 INFO VariantFiltration - HTSJDK Defaults.CREATE_INDEX : false
17:43:04.684 INFO VariantFiltration - HTSJDK Defaults.CREATE_MD5 : false
17:43:04.684 INFO VariantFiltration - HTSJDK Defaults.CUSTOM_READER_FACTORY :
17:43:04.684 INFO VariantFiltration - HTSJDK Defaults.DISABLE_SNAPPY_COMPRESSOR : false
17:43:04.685 INFO VariantFiltration - HTSJDK Defaults.EBI_REFERENCE_SERVICE_URL_MASK : https://www.ebi.ac.uk/ena/cram/md5/%s
17:43:04.685 INFO VariantFiltration - HTSJDK Defaults.NON_ZERO_BUFFER_SIZE : 131072
17:43:04.685 INFO VariantFiltration - HTSJDK Defaults.REFERENCE_FASTA : null
17:43:04.685 INFO VariantFiltration - HTSJDK Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
17:43:04.685 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:43:04.685 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:43:04.685 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:43:04.685 INFO VariantFiltration - HTSJDK Defaults.USE_CRAM_REF_DOWNLOAD : false
17:43:04.685 DEBUG ConfigFactory - Configuration file values:
17:43:04.688 DEBUG ConfigFactory - gcsMaxRetries = 20
17:43:04.688 DEBUG ConfigFactory - gatk_stacktrace_on_user_exception = false
17:43:04.688 DEBUG ConfigFactory - samjdk.use_async_io_read_samtools = false
17:43:04.688 DEBUG ConfigFactory - samjdk.compression_level = 1
17:43:04.688 DEBUG ConfigFactory - samjdk.use_async_io_write_samtools = true
17:43:04.688 DEBUG ConfigFactory - samjdk.use_async_io_write_tribble = false
17:43:04.688 DEBUG ConfigFactory - spark.kryoserializer.buffer.max = 512m
17:43:04.688 DEBUG ConfigFactory - spark.driver.maxResultSize = 0
17:43:04.688 DEBUG ConfigFactory - spark.driver.userClassPathFirst = true
17:43:04.688 DEBUG ConfigFactory - spark.io.compression.codec = lzf
17:43:04.688 DEBUG ConfigFactory - spark.yarn.executor.memoryOverhead = 600
17:43:04.689 DEBUG ConfigFactory - spark.driver.extraJavaOptions =
17:43:04.689 DEBUG ConfigFactory - spark.executor.extraJavaOptions =
17:43:04.689 DEBUG ConfigFactory - codec_packages = [htsjdk.variant, htsjdk.tribble, org.broadinstitute.hellbender.utils.codecs]
17:43:04.689 DEBUG ConfigFactory - cloudPrefetchBuffer = 40
17:43:04.689 DEBUG ConfigFactory - cloudIndexPrefetchBuffer = -1
17:43:04.689 DEBUG ConfigFactory - createOutputBamIndex = true
17:43:04.689 INFO VariantFiltration - Deflater: IntelDeflater
17:43:04.689 INFO VariantFiltration - Inflater: IntelInflater
17:43:04.689 INFO VariantFiltration - GCS max retries/reopens: 20
17:43:04.689 INFO VariantFiltration - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
17:43:04.689 INFO VariantFiltration - Initializing engine
Here is the command I used to apply the filters
gatk-launch VariantFiltration \
-variant wild_isolate.vcf.gz \
--genotype-filter-expression "DP < 2" \
--genotype-filter-name "depth" \
-O wi_dp_tet.vcf \
--verbosity DEBUG \
--seconds-between-progress-updates 0.1 \
--disable-tool-default-read-filters true \
--lenient true \
--disable-sequence-dictionary-validation true \
--disable-bam-index-caching true
I added the --verbosity flag and all other flags below --verbosity after I noticed some variants were not receiving the FT field. I thought there may be some default filters being applied that may results in variants being skipped (maybe these flags need to be applied at previous steps?). I ran this step with and without those flags, and with/without the -R flag.
I am running this on a test data set to make sure my pipeline is working properly... 45576 variants are not receiving the FT field and 127762 variants did receive the FT field. Also, not that I am not going through the VQSR procedure because I do not have a truth set.
As for the steps proceeding VariantFiltration, I ran HaplotypeCaller in DISCOVERY with ERC GVCF (in chromosome blocks), performed ValidateVariants, combined chromosome gVCFs for each each sample using CombineGVCFs, combined individual sample gVCFs with GenomicsDBImport, and then ran GenotypeGVCFs on individual chromosomes, and collapsed the chromosome VCFs using GatherVcfs.
Here are the last few entries of test VCF, highlighting the inconsistent FORMAT/FT field.
MtDNA 12998 . C A,T 2457.39 PASS AC=8,6;AF=0.571,0.429;AN=14;AS_QD=15.04,31.74;DP=74;ExcessHet=3.0103;FS=0.000;GQ_MEAN=31.14;GQ_STDDEV=28.46;MLEAC=8,6;MLEAF=0.571,0.429;MQ=59.59;NCC=1;QD=33.66;SOR=0.720 GT:AD:DP:GQ:PL 1/1:0,2,0:2:6:80,6,0,80,6,80 2/2:0,0,2:2:6:83,83,83,6,6,0 1/1:0,3,0:3:9:125,9,0,125,9,125 ./.:1,0,0:1:.:0,0,0,0,0,0 1/1:0,22,0:22:66:817,66,0,817,66,817 1/1:0,8,0:8:24:235,24,0,235,24,235 2/2:0,0,11:11:33:383,383,383,33,33,0 2/2:0,0,25:25:74:749,749,749,74,74,0
MtDNA 13029 . T C 74.63 PASS AC=2;AF=0.125;AN=16;AS_QD=32.99;DP=62;ExcessHet=0.1472;FS=0.000;GQ_MEAN=22.13;GQ_STDDEV=20.47;MLEAC=1;MLEAF=0.063;MQ=60.00;NCC=0;QD=26.41;SOR=0.693 GT:AD:DP:FT:GQ:PL 1/1:0,2:2:PASS:6:90,6,0 0/0:1,0:1:depth:3:0,3,34 0/0:5,0:5:PASS:15:0,15,195 0/0:1,0:1:depth:3:0,3,32 0/0:18,0:18:PASS:48:0,48,720 0/0:7,0:7:PASS:21:0,21,213 0/0:8,0:8:PASS:24:0,24,288 0/0:20,0:20:PASS:57:0,57,855
MtDNA 13069 . T C 2144.05 PASS AC=12;AF=1.00;AN=12;AS_QD=27.59;DP=51;ExcessHet=3.0103;FS=0.000;GQ_MEAN=25.50;GQ_STDDEV=13.52;MLEAC=14;MLEAF=1.00;MQ=60.00;NCC=2;QD=30.55;SOR=0.994 GT:AD:DP:GQ:PL 1/1:0,2:2:6:87,6,0 ./.:0,0:0:.:0,0,0 1/1:0,7:7:21:292,21,0 ./.:0,0:0:.:0,0,0 1/1:0,12:12:36:531,36,0 1/1:0,7:7:21:259,21,0 1/1:0,8:8:24:334,24,0 1/1:0,15:15:45:620,45,0
MtDNA 13208 . C T 788.24 PASS AC=6;AF=0.500;AN=12;AS_QD=25.73;DP=53;ExcessHet=0.1809;FS=0.000;GQ_MEAN=20.00;GQ_STDDEV=19.22;MLEAC=8;MLEAF=0.667;MQ=60.00;NCC=2;QD=28.92;SOR=1.127 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 0/0:2,0:2:6:0,6,65 1/1:0,4:4:12:157,12,0 ./.:0,0:0:.:0,0,0 1/1:0,8:8:24:341,24,0 1/1:0,8:8:24:303,24,0 0/0:13,0:13:0:0,0,353 0/0:18,0:18:54:0,54,472
MtDNA 13344 . G A 226.02 PASS AC=2;AF=0.200;AN=10;AS_QD=28.25;DP=17;ExcessHet=0.2482;FS=0.000;GQ_MEAN=9.60;GQ_STDDEV=8.85;MLEAC=3;MLEAF=0.300;MQ=60.00;NCC=3;QD=28.25;SOR=1.179 GT:AD:DP:FT:GQ:PL 0/0:1,0:1:depth:3:0,3,39 ./.:0,0:0:PASS:.:0,0,0 ./.:0,0:0:PASS:.:0,0,0 ./.:0,0:0:PASS:.:0,0,0 0/0:2,0:2:PASS:3:0,3,45 0/0:4,0:4:PASS:12:0,12,136 0/0:2,0:2:PASS:6:0,6,88 1/1:0,8:8:PASS:24:239,24,0
MtDNA 13700 . TA T 49.17 PASS AC=2;AF=0.250;AN=8;AS_QD=24.58;DP=24;ExcessHet=0.3218;FS=0.000;GQ_MEAN=17.25;GQ_STDDEV=7.89;MLEAC=2;MLEAF=0.250;MQ=48.99;NCC=4;QD=24.58;RPA=8,7;RU=A;SOR=2.303;STR GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:1,0:1:.:0,0,0 ./.:0,0:0:.:0,0,0 0/0:7,0:7:21:0,21,298 0/0:6,0:6:18:0,18,141 1/1:0,2:2:6:61,6,0 0/0:8,0:8:24:0,24,211
Any and all helps is appreciated! I'm hoping it is something simple!
Thanks