I ran:
gatk-4.0.11.0/gatk SelectVariants -R Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fasta -V variants/P2-E8-ACTGAGCG-CTTAATAG_S152_first_pass_filtered.vcf -select '(1.0*vc.getGenotype("P2-E8-ACTGAGCG-CTTAATAG_S152").getAD().1)/(1.0*vc.getGenotype("P2-E8-ACTGAGCG-CTTAATAG_S152").getDP()) > 0.9' -output variants/P2-E8-ACTGAGCG-CTTAATAG_S152_first_pass_selected.vcf --exclude-filtered
and got:
A USER ERROR has occurred: Invalid JEXL expression detected for select-0
The exact same filter worked for about 90% of my files but failed on the other 10%. I then found that it still fails if I replace the numerator with a constant 1:
1/(1.0*vc.getGenotype("P2-E8-ACTGAGCG-CTTAATAG_S152").getDP()) > 0.9
Looking at the VCF file, it turns out that some SNPs from the initial SNP calling have zero coverage (AD=0,0 and DP=0):
I 27036 . G A 15.14 PASS AC=1;AF=1.00;AN=1;FS=0.000;MLEAC=1;MLEAF=1.00;SOR=0.693 GT:AD:DP:GQ:PL 1:0,0:0:45:45,0
I 27063 . G A 60 PASS AC=1;AF=1.00;AN=1;FS=0.000;MLEAC=1;MLEAF=1.00;SOR=0.693 GT:AD:DP:GQ:PL 1:0,0:0:90:90,0
XII 585947 . T A 16.11 PASS AC=1;AF=1.00;AN=1;FS=0.000;MLEAC=1;MLEAF=1.00;SOR=0.693 GT:AD:DP:GQ:PL 1:0,0:0:46:46,0
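As a sanity check outside GATK, here is a minimal Python sketch of the same ratio (second AD value divided by DP) that returns None instead of dividing when DP is 0. The function name alt_fraction and the sample_column parameter are made up for illustration, and it assumes a single-sample VCF whose FORMAT column contains AD and DP, as in the records above:

```python
def alt_fraction(record_line, sample_column=9):
    """Return AD[1] / DP for one VCF data line, or None if DP is 0 or missing.

    Hypothetical helper for illustration; not part of GATK.
    Assumes FORMAT is column 9 (index 8) and the sample follows it.
    """
    # Real VCF files are tab-delimited; split() also tolerates the
    # space-separated form that results from pasting records into a post.
    fields = record_line.split()
    keys = fields[8].split(":")
    values = fields[sample_column].split(":")
    genotype = dict(zip(keys, values))

    depth = genotype.get("DP", ".")
    allele_depths = genotype.get("AD", ".")
    if depth == "." or allele_depths == ".":
        return None

    depth = int(depth)
    if depth == 0:
        # This is the case that breaks the JEXL expression.
        return None

    alt_depth = int(allele_depths.split(",")[1])
    return alt_depth / depth


if __name__ == "__main__":
    records = [
        "I 27036 . G A 15.14 PASS AC=1;AF=1.00;AN=1 GT:AD:DP:GQ:PL 1:0,0:0:45:45,0",
        "I 27063 . G A 60 PASS AC=1;AF=1.00;AN=1 GT:AD:DP:GQ:PL 1:0,0:0:90:90,0",
        "XII 585947 . T A 16.11 PASS AC=1;AF=1.00;AN=1 GT:AD:DP:GQ:PL 1:0,0:0:46:46,0",
    ]
    for record in records:
        chrom, pos = record.split()[:2]
        print(chrom, pos, alt_fraction(record))
```

It returns None for all three records above, so they can be listed or skipped before the ratio is compared against 0.9.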
So I now understand why my filter failed (a divide-by-zero error when DP is 0), but I don't understand how I got these SNPs in the first place. They were called with:
gatk-4.0.11.0/gatk HaplotypeCaller -ploidy 1 -R Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fasta -I bam/P2-E8-ACTGAGCG-CTTAATAG_S152.dedup.realigned.bam --output variants/P2-E8-ACTGAGCG-CTTAATAG_S152_first_pass_raw.vcf
gatk-4.0.11.0/gatk VariantFiltration -R Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fasta -V variants/P2-E8-ACTGAGCG-CTTAATAG_S152_first_pass_raw.vcf -filter 'QD < 2.0 || FS > 60.0 || SOR > 3.0 || MQ < 40.0 || MQRankSum < -10.5 || ReadPosRankSum < -8.0' -output variants/P2-E8-ACTGAGCG-CTTAATAG_S152_first_pass_filtered.vcf -filter-name "hard_filter"
Any ideas how a SNP can be called when neither the REF nor the ALT allele has any coverage?