Hi team,
I'm trying to replace my in-house identifiers with those from dbSNP. I've done this before with a SNP-only dataset by applying GATK AnnotateVariants and then using bash to reorder the columns, producing a valid VCF file.
My problem is that the dbSNP reference and alternate alleles for insertions are different from those originally generated by HaplotypeCaller.
For example, where the original VCF has Ref T and Alt TATA, dbSNP has Ref - (dash) and Alt ATA. AnnotateVariants generates an error because the alleles are different.
One solution is to omit all insertions, but this would waste a lot of interesting biological data.
I know that dbSNP is not your responsibility, but I was wondering whether you or anyone else has a solution to this.
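To make the mismatch concrete, here is a minimal Python sketch of the allele conversion I have in mind. The function names are hypothetical and are not part of GATK or dbSNP. It assumes the only difference is the shared leading (anchor) base, that the anchor base has already been looked up in the reference genome, and it ignores any shift in the position column that the conversion may also require.

```python
def dash_to_anchored(anchor_base, ref, alt):
    """Convert dash-style alleles to alleles that share a leading anchor base.

    anchor_base is assumed to be the reference base immediately before the
    insertion; looking it up in the reference genome is not handled here.
    """
    ref = "" if ref == "-" else ref
    alt = "" if alt == "-" else alt
    return anchor_base + ref, anchor_base + alt


def anchored_to_dash(ref, alt):
    """Strip the shared leading bases and write an empty allele as a dash."""
    n = 0
    while n < min(len(ref), len(alt)) and ref[n] == alt[n]:
        n += 1
    return ref[n:] or "-", alt[n:] or "-"


# The example from this post, in both directions.
print(dash_to_anchored("T", "-", "ATA"))  # ('T', 'TATA')
print(anchored_to_dash("T", "TATA"))      # ('-', 'ATA')
```

If something along these lines is reasonable, the converted alleles could in principle be matched against the HaplotypeCaller output before annotation, but I have not tested this on real data.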
Sincerely,
William Gilks