Dear team,
I've run GATK 4.0.0 using Cromwell (30.2) and the WDLs at https://github.com/gatk-workflows/gatk4-data-processing and https://github.com/gatk-workflows/gatk4-germline-snps-indels. I already had BWA-aligned, deduplicated BAMs, so I modified "processing-for-variant-discovery-gatk4.wdl" to start from BQSR, but otherwise used the published WDLs with minimal modifications.
The results for the public PrecisionFDA datasets (https://precision.fda.gov/) are mixed. Recall and precision were great for the Truth challenge datasets (HiSeq2500, PCR-free, ~50x), but not for the Consistency challenge datasets (HiSeqX, PCR+, ~30x). In particular, for indels in the Consistency challenge datasets, recall and precision were far worse than the GATK3 results available for these datasets: ~92% and ~79%, respectively, for the Garvan dataset and ~89% and 83% for the HLI dataset after VQSR filtration.
Do these numbers match what you normally get for PCR+ HiSeq X WGS datasets with depths of ~35x? If not, are there any parameters that I need to change?
Also, I think it would be very helpful to the community if the team made its GATK4 results for these popular public datasets publicly available.
Best,
Sangtae