Channel: Recent Discussions — GATK-Forum

GATK4 - VariantFiltration --genotype-filter-expression


Hello there,
I am trying to apply some sample-level filters to a VCF generated using GATK 4.0.2.1. My issue is that not all variant sites are getting an FT field added, and I am wondering why. Additionally, "PASS" is being added to the FILTER column at the variant level (I am not sure whether this behavior is expected, but it seems odd).

Here is some information about the system:

17:43:04.589 DEBUG NativeLibraryLoader - Extracting libgkl_compression.so to /tmp/szs315/libgkl_compression8694733123384787175.so
17:43:04.681 INFO  VariantFiltration - ------------------------------------------------------------
17:43:04.681 INFO  VariantFiltration - The Genome Analysis Toolkit (GATK) v4.0.2.1
17:43:04.681 INFO  VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
17:43:04.681 INFO  VariantFiltration - Executing as szs315@quser12 on Linux v3.10.0-514.36.5.el7.x86_64 amd64
17:43:04.681 INFO  VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_112-b16
17:43:04.682 INFO  VariantFiltration - Start Date/Time: March 11, 2018 6:43:04 PM CDT
17:43:04.682 INFO  VariantFiltration - ------------------------------------------------------------
17:43:04.682 INFO  VariantFiltration - ------------------------------------------------------------
17:43:04.682 INFO  VariantFiltration - HTSJDK Version: 2.14.3
17:43:04.682 INFO  VariantFiltration - Picard Version: 2.17.2
17:43:04.684 INFO  VariantFiltration - HTSJDK Defaults.BUFFER_SIZE : 131072
17:43:04.684 INFO  VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 1
17:43:04.684 INFO  VariantFiltration - HTSJDK Defaults.CREATE_INDEX : false
17:43:04.684 INFO  VariantFiltration - HTSJDK Defaults.CREATE_MD5 : false
17:43:04.684 INFO  VariantFiltration - HTSJDK Defaults.CUSTOM_READER_FACTORY : 
17:43:04.684 INFO  VariantFiltration - HTSJDK Defaults.DISABLE_SNAPPY_COMPRESSOR : false
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.EBI_REFERENCE_SERVICE_URL_MASK : https://www.ebi.ac.uk/ena/cram/md5/%s
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.NON_ZERO_BUFFER_SIZE : 131072
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.REFERENCE_FASTA : null
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.USE_CRAM_REF_DOWNLOAD : false
17:43:04.685 DEBUG ConfigFactory - Configuration file values: 
17:43:04.688 DEBUG ConfigFactory -  gcsMaxRetries = 20
17:43:04.688 DEBUG ConfigFactory -  gatk_stacktrace_on_user_exception = false
17:43:04.688 DEBUG ConfigFactory -  samjdk.use_async_io_read_samtools = false
17:43:04.688 DEBUG ConfigFactory -  samjdk.compression_level = 1
17:43:04.688 DEBUG ConfigFactory -  samjdk.use_async_io_write_samtools = true
17:43:04.688 DEBUG ConfigFactory -  samjdk.use_async_io_write_tribble = false
17:43:04.688 DEBUG ConfigFactory -  spark.kryoserializer.buffer.max = 512m
17:43:04.688 DEBUG ConfigFactory -  spark.driver.maxResultSize = 0
17:43:04.688 DEBUG ConfigFactory -  spark.driver.userClassPathFirst = true
17:43:04.688 DEBUG ConfigFactory -  spark.io.compression.codec = lzf
17:43:04.688 DEBUG ConfigFactory -  spark.yarn.executor.memoryOverhead = 600
17:43:04.689 DEBUG ConfigFactory -  spark.driver.extraJavaOptions = 
17:43:04.689 DEBUG ConfigFactory -  spark.executor.extraJavaOptions = 
17:43:04.689 DEBUG ConfigFactory -  codec_packages = [htsjdk.variant, htsjdk.tribble, org.broadinstitute.hellbender.utils.codecs]
17:43:04.689 DEBUG ConfigFactory -  cloudPrefetchBuffer = 40
17:43:04.689 DEBUG ConfigFactory -  cloudIndexPrefetchBuffer = -1
17:43:04.689 DEBUG ConfigFactory -  createOutputBamIndex = true
17:43:04.689 INFO  VariantFiltration - Deflater: IntelDeflater
17:43:04.689 INFO  VariantFiltration - Inflater: IntelInflater
17:43:04.689 INFO  VariantFiltration - GCS max retries/reopens: 20
17:43:04.689 INFO  VariantFiltration - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
17:43:04.689 INFO  VariantFiltration - Initializing engine

Here is the command I used to apply the filters:

 gatk-launch VariantFiltration \
-variant wild_isolate.vcf.gz \
--genotype-filter-expression "DP < 2" \
--genotype-filter-name "depth" \
-O wi_dp_tet.vcf  \
--verbosity DEBUG \
--seconds-between-progress-updates 0.1 \
--disable-tool-default-read-filters true \
--lenient true \
--disable-sequence-dictionary-validation true \
--disable-bam-index-caching true

I added the --verbosity flag and all of the flags below it after I noticed that some variants were not receiving the FT field. I thought some default filters might be causing variants to be skipped (maybe these flags need to be applied at previous steps?). I ran this step with and without those flags, and with and without the -R flag.

I am running this on a test data set to make sure my pipeline is working properly: 45576 variants did not receive the FT field and 127762 variants did. Also, note that I am not going through the VQSR procedure because I do not have a truth set.
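For anyone who wants to reproduce this kind of tally, here is a minimal Python sketch of one way to count records with and without FT in the FORMAT column. The function names are hypothetical and this is not part of GATK; it assumes a tab-delimited VCF with FORMAT in the ninth column, optionally gzip-compressed.

```python
import gzip


def count_ft_records(lines):
    """Return (with_ft, without_ft) counts over VCF data lines.

    A record counts as "with FT" when the FORMAT column (9th field)
    lists FT among its colon-separated keys.
    """
    with_ft = 0
    without_ft = 0
    for line in lines:
        # Skip blank lines and header lines (## meta lines and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # sites-only record, no FORMAT column
        if "FT" in fields[8].split(":"):
            with_ft += 1
        else:
            without_ft += 1
    return with_ft, without_ft


def count_ft_in_file(path):
    """Open a plain or gzip-compressed VCF and count its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_ft_records(handle)
```

Calling `count_ft_in_file("wi_dp_tet.vcf")` on the output of the command above would return the two counts as a tuple.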

As for the steps preceding VariantFiltration, I ran HaplotypeCaller in DISCOVERY mode with ERC GVCF (in chromosome blocks), performed ValidateVariants, combined the chromosome gVCFs for each sample using CombineGVCFs, combined the individual sample gVCFs with GenomicsDBImport, ran GenotypeGVCFs on individual chromosomes, and collapsed the chromosome VCFs using GatherVcfs.

Here are the last few entries of the test VCF, highlighting the inconsistent FORMAT/FT field.

MtDNA   12998   .   C   A,T 2457.39 PASS    AC=8,6;AF=0.571,0.429;AN=14;AS_QD=15.04,31.74;DP=74;ExcessHet=3.0103;FS=0.000;GQ_MEAN=31.14;GQ_STDDEV=28.46;MLEAC=8,6;MLEAF=0.571,0.429;MQ=59.59;NCC=1;QD=33.66;SOR=0.720   GT:AD:DP:GQ:PL  1/1:0,2,0:2:6:80,6,0,80,6,80    2/2:0,0,2:2:6:83,83,83,6,6,0    1/1:0,3,0:3:9:125,9,0,125,9,125 ./.:1,0,0:1:.:0,0,0,0,0,0   1/1:0,22,0:22:66:817,66,0,817,66,817    1/1:0,8,0:8:24:235,24,0,235,24,235  2/2:0,0,11:11:33:383,383,383,33,33,0    2/2:0,0,25:25:74:749,749,749,74,74,0
MtDNA   13029   .   T   C   74.63   PASS    AC=2;AF=0.125;AN=16;AS_QD=32.99;DP=62;ExcessHet=0.1472;FS=0.000;GQ_MEAN=22.13;GQ_STDDEV=20.47;MLEAC=1;MLEAF=0.063;MQ=60.00;NCC=0;QD=26.41;SOR=0.693 GT:AD:DP:FT:GQ:PL   1/1:0,2:2:PASS:6:90,6,0 0/0:1,0:1:depth:3:0,3,34    0/0:5,0:5:PASS:15:0,15,195  0/0:1,0:1:depth:3:0,3,32    0/0:18,0:18:PASS:48:0,48,720    0/0:7,0:7:PASS:21:0,21,213  0/0:8,0:8:PASS:24:0,24,288  0/0:20,0:20:PASS:57:0,57,855
MtDNA   13069   .   T   C   2144.05 PASS    AC=12;AF=1.00;AN=12;AS_QD=27.59;DP=51;ExcessHet=3.0103;FS=0.000;GQ_MEAN=25.50;GQ_STDDEV=13.52;MLEAC=14;MLEAF=1.00;MQ=60.00;NCC=2;QD=30.55;SOR=0.994 GT:AD:DP:GQ:PL  1/1:0,2:2:6:87,6,0  ./.:0,0:0:.:0,0,0   1/1:0,7:7:21:292,21,0   ./.:0,0:0:.:0,0,0   1/1:0,12:12:36:531,36,0 1/1:0,7:7:21:259,21,0   1/1:0,8:8:24:334,24,0   1/1:0,15:15:45:620,45,0
MtDNA   13208   .   C   T   788.24  PASS    AC=6;AF=0.500;AN=12;AS_QD=25.73;DP=53;ExcessHet=0.1809;FS=0.000;GQ_MEAN=20.00;GQ_STDDEV=19.22;MLEAC=8;MLEAF=0.667;MQ=60.00;NCC=2;QD=28.92;SOR=1.127 GT:AD:DP:GQ:PL  ./.:0,0:0:.:0,0,0   0/0:2,0:2:6:0,6,65  1/1:0,4:4:12:157,12,0   ./.:0,0:0:.:0,0,0   1/1:0,8:8:24:341,24,0   1/1:0,8:8:24:303,24,0   0/0:13,0:13:0:0,0,353   0/0:18,0:18:54:0,54,472
MtDNA   13344   .   G   A   226.02  PASS    AC=2;AF=0.200;AN=10;AS_QD=28.25;DP=17;ExcessHet=0.2482;FS=0.000;GQ_MEAN=9.60;GQ_STDDEV=8.85;MLEAC=3;MLEAF=0.300;MQ=60.00;NCC=3;QD=28.25;SOR=1.179   GT:AD:DP:FT:GQ:PL   0/0:1,0:1:depth:3:0,3,39    ./.:0,0:0:PASS:.:0,0,0  ./.:0,0:0:PASS:.:0,0,0  ./.:0,0:0:PASS:.:0,0,0  0/0:2,0:2:PASS:3:0,3,45 0/0:4,0:4:PASS:12:0,12,136  0/0:2,0:2:PASS:6:0,6,88 1/1:0,8:8:PASS:24:239,24,0
MtDNA   13700   .   TA  T   49.17   PASS    AC=2;AF=0.250;AN=8;AS_QD=24.58;DP=24;ExcessHet=0.3218;FS=0.000;GQ_MEAN=17.25;GQ_STDDEV=7.89;MLEAC=2;MLEAF=0.250;MQ=48.99;NCC=4;QD=24.58;RPA=8,7;RU=A;SOR=2.303;STR  GT:AD:DP:GQ:PL  ./.:0,0:0:.:0,0,0   ./.:0,0:0:.:0,0,0   ./.:1,0:1:.:0,0,0   ./.:0,0:0:.:0,0,0   0/0:7,0:7:21:0,21,298   0/0:6,0:6:18:0,18,141   1/1:0,2:2:6:61,6,0  0/0:8,0:8:24:0,24,211
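One pattern visible in these six records: FT is present only at 13029 and 13344, the two sites where a called genotype (not ./.) has DP below 2. The other four sites have no such genotype. The Python sketch below is a way to check that pattern record by record. It is an observation about this excerpt, not a statement of how VariantFiltration is implemented, and the helper names are hypothetical.

```python
def is_called(gt):
    """True if at least one allele in a GT string such as 0/1 is not '.'."""
    alleles = gt.replace("|", "/").split("/")
    return any(allele != "." for allele in alleles)


def low_depth_calls(format_field, samples, min_dp=2):
    """Return indexes of samples with a called genotype and DP < min_dp.

    format_field is the FORMAT column (e.g. "GT:AD:DP:GQ:PL") and
    samples is the list of per-sample columns for one record.
    """
    keys = format_field.split(":")
    gt_i = keys.index("GT")
    dp_i = keys.index("DP")
    hits = []
    for n, sample in enumerate(samples):
        values = sample.split(":")
        # Trailing fields may be dropped, and DP may be missing (".").
        if dp_i >= len(values) or values[dp_i] == ".":
            continue
        if is_called(values[gt_i]) and int(values[dp_i]) < min_dp:
            hits.append(n)
    return hits


# First two samples of site 13029 (FT present in the output):
print(low_depth_calls("GT:AD:DP:FT:GQ:PL",
                      ["1/1:0,2:2:PASS:6:90,6,0", "0/0:1,0:1:depth:3:0,3,34"]))
# First and fourth samples of site 12998 (no FT in the output):
print(low_depth_calls("GT:AD:DP:GQ:PL",
                      ["1/1:0,2,0:2:6:80,6,0,80,6,80", "./.:1,0,0:1:.:0,0,0,0,0,0"]))
```

The first call reports the second sample (index 1), which carries the "depth" filter. The second call reports nothing, because the only sample with DP below 2 at that site is a no-call.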

Any and all help is appreciated! I'm hoping it is something simple!

Thanks

