Hi,
I ran into this issue while analyzing a known dataset. The gene names in two files that MutSig uses (gene.covariates.txt and exome_full192.coverage.txt) do not follow HGNC nomenclature, and some of them still use older symbols. Most variant annotation programs, such as Oncotator or VEP, use HGNC symbols for gene annotation. Because of this discrepancy, MutSig fails to recognize well-known oncogenes in the MAF file and ignores them.
For example, KMT2D is frequently mutated in esophageal squamous cell carcinoma, but the MutSig covariates file doesn't contain this gene. Instead it lists MLL2/MLL4, which are synonyms for KMT2D, so MutSig drops KMT2D from the analysis. I tried converting the gene names in these two files to HGNC symbols, but MutSig doesn't recognize the altered files, and the result contains NaN values in the expr, reptime and hic columns.
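To make the renaming step concrete, here is a minimal Python sketch of that kind of conversion. It assumes the file is tab-delimited with a header row and that the gene symbol sits in a column named `gene`. The alias map and the function name `rename_genes` are illustrative and are not part of MutSig. As noted above, files altered this way still gave NaN values, so this shows the conversion itself rather than a working fix.

```python
import csv

# Hypothetical alias map: older symbol -> current HGNC symbol.
# A real map would be built from an HGNC download, not typed by hand.
ALIAS_TO_HGNC = {"MLL2": "KMT2D"}


def rename_genes(src, dst, alias_map, gene_column="gene"):
    """Copy a tab-delimited table from src to dst, renaming gene symbols.

    src and dst are open text file objects. gene_column is an assumed
    column name; adjust it to match the actual header of the file.
    Returns the number of rows whose symbol was changed.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    renamed = 0
    for row in reader:
        old_symbol = row[gene_column]
        # Symbols missing from the map are written back unchanged.
        new_symbol = alias_map.get(old_symbol, old_symbol)
        if new_symbol != old_symbol:
            renamed += 1
        row[gene_column] = new_symbol
        writer.writerow(row)
    return renamed
```

It would be called once per file, for example with `open("gene.covariates.txt", newline="")` as `src` and a new output file as `dst`, using the same alias map for both files so that the symbols stay consistent between them.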
Is there any way to fix this?
I think this should be addressed, since MutSig is one of the most widely used programs and someone doing a de novo analysis might miss important genes.
P.S. I have also posted this on the CGA forum, but it doesn't seem to be active.