Hello GATK Team ! @Sheila @Geraldine_VdAuwera
Since my last question ( here ), I am trying to build a workflow which can process all my samples with a snakemake workflow. From the previous question testing on samples, I corrected the $line and also more variants "passed" the filtering step by realigning giving the Read Group. I hope that this post can help any person having trouble to run MuTect2 at first. Please correct me if there's anything wrong, I will describe in bullets the steps so far:
- Align using BWA MEM using FastQ reads --> output in SAM
- SAM to BAM --> using samtools
- Sort the BAM --> using Picard
- Remove duplicates from BAM --> using Picard
- Add Read Group to BAM --> using Picard
- Base Recalibration of BAM --> using BaseRecalibrator and PrintReads
- Indexing BAM --> using samtools
- Building a PON for BAM --> using MuTect2 in artifact_detection_mode on the normal samples
- Merge all the PON --> using CombineVariants
- Run MuTect2 --> Produce raw VCF files with variants that are "PASS"
Following the GATK Best-Practices workflow for Exome Datasets using MuTect2, I am at the last step of variant filtering except if I omitted a step or did a mistake. Following this thread, I assume that there are no further processing but to extract variants that have "PASS" in the VCF files.
I was reading a bit more on the forum when I stumbled upon the VariantRecalibrator for Haplotype Caller (HC). As I understood that this tool assign a new score called VQSLOD which is supposed to be a well-calibrated score that comes from training sets. So I was thinking:
- is there's anyway of calling this VQSLOD on the VCF output from MuTect2 ?
- Would this produce a more "confident" result to interpret during the analysis of the VCF files of MuTect2 ?
- Are you planning to include this option in GATK 4.0 ?
I know the HC is meant for SNPs and indels but being able to produce a VQSLOD for variants on MuTect2 by training sets would supposedly have a more confident result in term of specificity and sensitivity. The only way (as far as I have read) to produce the VQSLOD is by running first the HC on the samples using -ERC GVCF for each sample and then use GenotypeGVCFs on all GVCF files together. Also, I read that the argument for MuTect2 to produce the output in GVCF is not available.
Thank you for your time and effort !
Best regards,
Alaa