I am running DiagnoseTargets on whole-exome sequencing data. The output .vcf file declares several filters (BAD_MATE, COVERAGE_GAPS, LOW_COVERAGE, NO_READS, PASS, POOR_QUALITY), but in the FILTER column I only see LOW_COVERAGE, NO_READS and POOR_QUALITY. Not a single interval shows PASS. I am pretty sure that I downloaded the reference genome and the CDS file from the same release: ftp://ftp.ensembl.org/pub/release-82/fasta/mus_musculus/. In the end, 38% of the target CDS regions are flagged LOW_COVERAGE, NO_READS or POOR_QUALITY.
awk '(!/^ *#/){print $7}' SVZ_coverage.vcf |sort |uniq -c
15729 LOW_COVERAGE;NO_READS
397 LOW_COVERAGE;NO_READS;POOR_QUALITY
3264 NO_READS
810 NO_READS;POOR_QUALITY
May I ask what the possible reason is? Does it mean that the other 62% of intervals are "PASS" and the program just does not report them?
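To make the 38% / 62% arithmetic easier to check, here is a minimal sketch that extends the tally above with a percentage per FILTER value and a total record count. The function name `filter_summary` is my own, and it assumes the VCF is tab-delimited as usual.

```shell
#!/bin/sh
# Count and percentage of records per FILTER value (column 7) in a VCF.
# filter_summary is a hypothetical helper name, not part of GATK.
filter_summary() {
    awk -F'\t' '
        !/^#/ { n[$7]++; total++ }      # skip header lines, tally FILTER
        END {
            for (f in n)
                printf "%d\t%.1f%%\t%s\n", n[f], 100 * n[f] / total, f
            printf "%d\t100.0%%\tTOTAL\n", total
        }' "$1" | sort -k1,1nr          # largest counts first
}
```

Running `filter_summary SVZ_coverage.vcf` and comparing the TOTAL row with the number of intervals given to the tool (for example `grep -vc '^@' targets.interval_list`, where the file name is only a placeholder) should show whether the remaining intervals are missing from the VCF or simply never labelled PASS.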
Here is part of the output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Mouse_SVZ_Pool
1 3214482 . G <DT> . LOW_COVERAGE;NO_READS END=3671497;IDP=0.801;IGC=0.126 FT:IDP:LL:ZL LOW_COVERAGE;NO_READS:0.801:140704:311019
1 4290846 . T <DT> . LOW_COVERAGE;NO_READS END=4409240;IDP=7.04;IGC=0.141 FT:IDP:LL:ZL LOW_COVERAGE;NO_READS:7.04:36140:74721
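The records can also be read per interval. Below is a rough sketch that prints each interval's length (END - POS + 1) and the share of its loci counted under LL and ZL for the first sample. I am assuming LL and ZL are the numbers of low-coverage and zero-coverage loci, which is how I read the FORMAT descriptions. The function name `interval_coverage` is hypothetical.

```shell
#!/bin/sh
# Per record: chrom, start, end, length, LL/length, ZL/length (first sample).
# Assumes a tab-delimited VCF with END= in INFO and LL/ZL keys in FORMAT.
interval_coverage() {
    awk -F'\t' '
        !/^#/ {
            end = ""
            n = split($8, info, ";")
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^END=/) end = substr(info[i], 5)
            if (end == "") next             # no END tag, skip the record
            nk = split($9, key, ":")        # FORMAT keys, e.g. FT:IDP:LL:ZL
            split($10, val, ":")            # sample values in the same order
            ll = 0; zl = 0
            for (i = 1; i <= nk; i++) {
                if (key[i] == "LL") ll = val[i]
                if (key[i] == "ZL") zl = val[i]
            }
            len = end - $2 + 1
            printf "%s\t%d\t%d\t%d\t%.3f\t%.3f\n", $1, $2, end, len, ll / len, zl / len
        }' "$1"
}
```

For the first record shown above, END - POS + 1 is 3671497 - 3214482 + 1 = 457016 bp, so `interval_coverage SVZ_coverage.vcf` makes it easy to see how long each reported interval actually is.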
The exome sequencing was done in 2013 by a company, and the reference CDS I am using was released in 2015. The 2013 target regions may differ a bit from the 2015 CDS release, but the difference shouldn't be this large. Do you have any other suggestions? Thanks,
Daisy