Hi,
I have just run variant calling with MuTect2 on 10 tumor samples from the GDC Data Portal and their respective matched normals, using a panel of normals (PoN) built from 12 samples. I'm a little confused by the number of variants I get per sample:
grep -c PASS WXS_GBM01.vcf
293
grep -c PASS WXS_GBM02.vcf
246
grep -c PASS WXS_GBM03.vcf
181
grep -c PASS WXS_GBM04.vcf
146
grep -c PASS WXS_GBM05.vcf
628
grep -c PASS WXS_GBM06.vcf
112
grep -c PASS WXS_GBM07.vcf
206
grep -c PASS WXS_GBM08.vcf
235
grep -c PASS WXS_GBM09.vcf
37375
grep -c PASS WXS_GBM10.vcf
319
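Note that `grep -c PASS` counts every line containing the string PASS, so any header or INFO text that mentions it would be included. A stricter count can be sketched as below, assuming standard tab-separated VCF files with FILTER in column 7; the helper name `count_pass` is only illustrative.

```shell
# Count records whose FILTER column (field 7) is exactly PASS,
# skipping header lines that start with '#'.
# Assumption: standard tab-separated VCF layout.
count_pass() {
    awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$1"
}

# Print one "file<TAB>count" line per VCF in the current directory.
for vcf in WXS_GBM*.vcf; do
    [ -e "$vcf" ] || continue   # skip if the glob matched nothing
    printf '%s\t%s\n' "$vcf" "$(count_pass "$vcf")"
done
```

If these counts differ noticeably from the `grep -c` numbers, the difference comes from lines that mention PASS outside the FILTER column.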
How can the GBM09 sample have roughly 100 times more PASS variants than my other samples? Is that plausible? I have checked that I used the correct tumor and matched-normal files for it, and they seem to be right.
Also, is an average of about 200-300 variants per sample reasonable for the others? I was expecting a higher number, since the BAM files are about 20 GB each. I ran the variant calling with the following command:
java -Xms4000m -Xmx4000m \
-jar GenomeAnalysisTK.jar \
-T MuTect2 \
-R GRCh38.d1.vd1.fa \
-I:tumor $TUMOR \
-I:normal $MATCH_NORMAL \
-PON $PON_FILE \
--dbsnp dbsnp_b147_hg38.vcf \
--cosmic updated_Cosmic38 \
-o $OUT_PFX.vcf \
-nct 8
Thank you very much!