Dear GATK Team,
While running GATK ASEReadCounter with versions 3.4 and 3.7, I am getting errors related to the known-sites VCF file.
Following the article below, I performed the alignment and processed the BAM file:
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0762-6
Here is my command:
java -jar GATK/3.4/GenomeAnalysisTK.jar \
-T ASEReadCounter \
-I ERR188021.rg.md.bam \
-R hs37d5.fa \
-sites ALL.phase1_release_v3.20101123.snps_indels_sv.sites.gdid.gdannot.v2.vcf.gz \
-o ERR188021.ASEReadCounter_results_ver2.csv \
-U ALLOW_N_CIGAR_READS
I downloaded the Geuvadis genotype data from the Geuvadis browser and indexed the ALL sites file.
Error:
ERROR MESSAGE: The provided VCF file is malformed at approximately line number 512: The VCF specification does not allow for whitespace in the INFO field. Offending field value was "AA=.;AC=51;AF=0.02;AFR_AF=0.02;ALLELE=A;AMR_AF=0.02;AN=2184;AVGPOST=0.9975;DAF_GLOBAL=.;ERATE=0.0004;EUR_AF=0.04;GENE_TRCOUNT_AFFECTED=1;GENE_TRCOUNT_TOTAL=1;GERP=.;LDAF=0.0238;RSQ=0.9610;SEVERE_GENE=ENSG00000197049;SEVERE_IMPACT=NON_SYNONYMOUS_CODON;SNPSOURCE=LOWCOV;THETA=0.0007;TR_AFFECTED=FULL;VT=SNP;ANNOTATION_CLASS=NON_SYNONYMOUS_CODON,ACTIVE_CHROM,NC_TRANSCRIPT_VARIANT&INTRON_VARIANT;A_A_CHANGE=F/I,.,.;A_A_LENGTH=169,.,.;A_A_POS=118,.,.;CELL=.,GM12878,.;CHROM_STATE=.,11,.;EXON_NUMBER=1/1,.,.;GENE_ID=ENSG00000197049,.,ENSG00000237491;GENE_NAME=AL669831.1,.,RP11-206L10.6;HGVS=c.352N>A,.,n.37+7285N>A;INTRON_NUMBER=.,.,1/2;POLYPHEN=probably damaging:0.982,.,-:-;SIFT=-:-,.,-:-;TR_BIOTYPE=PROTEIN_CODING,.,PROCESSED_TRANSCRIPT;TR_ID=ENST00000358533,.,ENST00000429505;TR_LENGTH=1194,.,441;TR_POS=438,.,.;TR_STRAND=1,.,1", for input source: ALL.phase1_release_v3.20101123.snps_indels_sv.sites.gdid.gdannot.v2.vcf.gz
I tried removing the spaces from the INFO field of the VCF and reran the same command, with no success.
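For reference, here is a minimal sketch of one way to do this cleanup. It assumes the whitespace only occurs in the INFO column (column 8), as in the offending value above ("probably damaging:0.982"), and that replacing it with an underscore is acceptable. The function name sanitize_info_whitespace is hypothetical. Header lines are passed through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): replace runs of spaces inside the
# INFO column (tab-separated column 8) of a VCF with a single underscore.
# Header lines starting with '#' are printed unchanged.
# Reads uncompressed VCF text on stdin and writes VCF text on stdout.
sanitize_info_whitespace() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }
       { gsub(/ +/, "_", $8); print }'
}

# Example usage (assumes bgzip and tabix from htslib are on PATH):
#   zcat ALL.phase1_release_v3.20101123.snps_indels_sv.sites.gdid.gdannot.v2.vcf.gz \
#     | sanitize_info_whitespace \
#     | bgzip -c > sites.nospace.vcf.gz
#   tabix -p vcf sites.nospace.vcf.gz
```

If the cleaned file is compressed with plain gzip instead of bgzip, or the old index is left in place, the edited file may not be read correctly. Recompressing with bgzip, rebuilding the index, and pointing -sites at the new file is probably worth checking.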
Could you please help me resolve this issue?
Thanks in advance,
Fazulur Rehaman