Hello everyone,
I recently ran MuTect2 and have some questions about the results. This is the command I used, with default parameters:
java -jar WES/programs/GATK-3.7.jar -T MuTect2 -I:normal WES/new_samples/1N.dedup.srt.coor.bam -I:tumor WES/new_samples/1A.picard.dedup.csort.bam -R WES/reference/withoutSuperContigs/hg19_ref_genome.fa --dbsnp WES/dbsnp/dbsnp_138.hg19.vcf --cosmic WES/cosmic/CosmicCodingMuts_GHCr37.p13.vcf -L WES/AllExonsIntervals/HumanAllExonV6r2/S07604514_Covered.bed --out WES/mutect2/resultForNewSample/1NA.vcf -log WES/mutect2/resultForNewSample/1NA.log
Here are the first few records from the output of some of the samples:
Sample 1NA:
# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT $line
chr1 13557 . G A . alt_allele_in_normal ECNT=1;HCNT=14;MAX_ED=.;MIN_ED=.;NLOD=139.96;TLOD=23.83 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:889,13:0.018:5:8:0.385:31658,477:438:451
chr1 14677 rs201327123 G A . alt_allele_in_normal;germline_risk DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=3.51;TLOD=8.46 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:90,8:0.079:7:1:0.875:3192,255:51:39
chr1 14792 . G A . alt_allele_in_normal;clustered_events;t_lod_fstar ECNT=5;HCNT=8;MAX_ED=223;MIN_ED=23;NLOD=67.30;TLOD=5.16 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:375,9:0.021:4:5:0.444:13126,277:193:182
chr1 14815 . C T . alt_allele_in_normal;clustered_events ECNT=5;HCNT=17;MAX_ED=223;MIN_ED=23;NLOD=72.94;TLOD=7.90 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:458,23:0.042:15:8:0.348:16262,843:234:224
chr1 15015 . G C . alt_allele_in_normal;clustered_events;t_lod_fstar ECNT=5;HCNT=11;MAX_ED=223;MIN_ED=23;NLOD=91.68;TLOD=4.08 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:467,11:0.024:4:7:0.364:16614,385:240:227
chr1 16996 . T C . alt_allele_in_normal ECNT=1;HCNT=2;MAX_ED=.;MIN_ED=.;NLOD=14.76;TLOD=8.89 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:728,62:0.071:28:34:0.452:25702,2179:369:359
Sample 5NA:
# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT $line
chr1 14741 . C A . alt_allele_in_normal ECNT=1;HCNT=9;MAX_ED=.;MIN_ED=.;NLOD=3.52;TLOD=18.82 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:151,15:0.099:7:8:0.533:5399,502:91:60
chr1 14815 . C T . alt_allele_in_normal;clustered_events ECNT=4;HCNT=18;MAX_ED=200;MIN_ED=98;NLOD=49.01;TLOD=14.48 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:388,16:0.028:13:3:0.188:13726,584:198:190
chr1 15000 . G C . alt_allele_in_normal;clustered_events;t_lod_fstar ECNT=4;HCNT=3;MAX_ED=200;MIN_ED=98;NLOD=83.74;TLOD=4.78 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:474,8:0.019:5:3:0.625:16833,273:231:243
chr1 15015 . G C . alt_allele_in_normal;clustered_events ECNT=4;HCNT=7;MAX_ED=200;MIN_ED=98;NLOD=68.29;TLOD=7.17 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:412,7:0.018:4:3:0.571:14567,232:203:209
chr1 664427 . C A . alt_allele_in_normal;clustered_events ECNT=2;HCNT=14;MAX_ED=61;MIN_ED=61;NLOD=61.18;TLOD=7.12 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:431,12:0.032:7:5:0.417:15188,444:225:206
chr1 664488 . G C . alt_allele_in_normal;clustered_events ECNT=2;HCNT=5;MAX_ED=61;MIN_ED=61;NLOD=80.63;TLOD=6.56 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:588,13:0.028:8:5:0.615:21242,473:302:286
Sample 12NA:
# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT $line
chr1 13281 . C G . alt_allele_in_normal ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=25.40;TLOD=10.92 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:336,18:0.059:11:7:0.389:11872,639:165:171
chr1 14792 . G A . alt_allele_in_normal;clustered_events ECNT=3;HCNT=28;MAX_ED=223;MIN_ED=23;NLOD=33.07;TLOD=20.97 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:368,16:0.033:7:9:0.438:12845,559:185:183
chr1 14815 . C T . alt_allele_in_normal;clustered_events ECNT=3;HCNT=28;MAX_ED=223;MIN_ED=23;NLOD=28.34;TLOD=33.46 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:449,42:0.066:22:20:0.476:15948,1532:235:214
chr1 15015 . G C . alt_allele_in_normal;clustered_events ECNT=3;HCNT=23;MAX_ED=223;MIN_ED=23;NLOD=76.09;TLOD=7.09 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:429,9:0.024:6:3:0.667:15288,310:214:215
chr1 664740 . A G . alt_allele_in_normal ECNT=1;HCNT=2;MAX_ED=.;MIN_ED=.;NLOD=31.83;TLOD=13.56 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:289,13:0.056:6:7:0.538:10396,473:143:146
chr1 665172 . C T . alt_allele_in_normal;t_lod_fstar ECNT=1;HCNT=32;MAX_ED=.;MIN_ED=.;NLOD=28.50;TLOD=4.15 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:241,5:0.015:4:1:0.200:8492,154:115:126
So my questions are:
Is it normal for the output VCF header to contain $line? Shouldn't the sample columns be named something like Normal and Tumor instead?
This is the direct output of MuTect2, and the total number of variants for each sample is:
1NA: 4285
5NA: 4267
12NA: 7221
After reading a bit on the forum, I understand that only the variants with just "PASS" in the FILTER column are accepted as mutations/indels, since the others have been filtered out for various reasons (alt_allele_in_normal, clustered_events, etc.). I simply ran an awk command to keep only the lines that have "PASS" in the FILTER column, and this is what I found:
1NA: 1
5NA: 1
12NA: 1
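As a cross-check on these PASS counts, the FILTER labels can be tallied per sample to see which filters remove most of the calls. The following is a minimal Python sketch, not the awk command mentioned above. It assumes an uncompressed, tab-delimited VCF with FILTER as the 7th column and multiple labels separated by ";". The function name tally_filters and the file name in the usage comment are hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable


def tally_filters(vcf_lines: Iterable[str]) -> Counter:
    """Count how often each FILTER label occurs across VCF records."""
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines (starting with '#') and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        # FILTER is the 7th column; a record can carry several labels.
        for label in fields[6].split(";"):
            counts[label] += 1
    return counts


# Hypothetical usage: python tally_filters.py 1NA.vcf
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for label, n in tally_filters(handle).most_common():
            print(f"{label}\t{n}")
```

The count reported for PASS should match the number of lines kept by the awk filter, and the remaining labels show how many records each filter flagged (a record with several labels is counted once under each).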
It does not seem normal to have only one accepted variant per sample. Is there any post-processing of the VCF files that needs to be done that I am not aware of?
I would also like a better explanation of what HCNT is, please. I searched the forum for a clear explanation but could not find one. The VCF header describes it as: "Number of haplotypes that support this variant". Any picture or explanation would be greatly appreciated.
Thank you in advance for your time.
Alaa