Hi All,
I am using both UnifiedGenotyper and HaplotypeCaller to genotype given alleles from amplicon-based sequencing data.
Neither caller performs well on variant rs1799752, a 50 bp insertion.
The sample was verified by agarose gel electrophoresis of the PCR products, which is considered the gold standard for this variant.
I looked at the reads in the BAM file: the insertion sequence is indeed present, but neither caller calls it.
Any suggestions?
Command lines I used:
Primer sequences were hard-clipped after bwa mapping, and calling was performed in given-alleles mode.
- UG:
java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R /data/iGenomes/Homo_sapiens/NCBI/build37.2/Sequence/WholeGenomeFasta/genome.fa -stand_call_conf 10 -D /data/NGS/analysis/snps.vcf --genotyping_mode GENOTYPE_GIVEN_ALLELES --alleles /data/NGS/analysis/snps.vcf -I /data/NGS/analysis/GWI127_combined.primerclipped.bam -L 17:61565590-61566290 -o GWI127.rs1799752.ug.vcf
- HC:
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R /data/iGenomes/Homo_sapiens/NCBI/build37.2/Sequence/WholeGenomeFasta/genome.fa -stand_call_conf 10 -D /data/NGS/analysis/snps.vcf --genotyping_mode GENOTYPE_GIVEN_ALLELES --alleles /data/NGS/analysis/snps.vcf -I /data/NGS/analysis/GWI127_combined.primerclipped.bam -L 17:61565590-61566290 -o GWI127.rs1799752.hc.vcf
- HC without given alleles:
java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R /data/iGenomes/Homo_sapiens/NCBI/build37.2/Sequence/WholeGenomeFasta/genome.fa -stand_call_conf 10 -D /data/NGS/analysis/snps.vcf -I /data/NGS/analysis/GWI127_combined.primerclipped.bam -L 17:61565590-61566290 -o GWI127.rs1799752.hc.v2.vcf
I also tried adding -activeRegionMaxSize 1000 and -activeRegionMaxSize 3000, as suggested on the forum, but that did not help either.
$ samtools view /data/NGS/analysis/bamclipper/GWI127_combined.primerclipped.bam | grep ATACAGTCACTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCC
E00491:102:H23N2CCXY:7:1218:18791:48107 97 1 100426255 0 56S94M 17 61565891 0 GAGAGCCACTCCCATCCTTTCTCCCATTTCTCTAGACCTGCTGCCTATACAGTCACTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCCCGGCTGGAGTGCTGTGGCGGGATCTCGGCTCCCTGCAAGCTCCGCCTCCCGGGT AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<JJJJJJJJJJJJJJJJJJFJFFJJJJJJJJJJJJJJJ<<-<FAAFFJJJF<JA7<FJ<<FFF-77<J7
-77<A<-7F--F7<77FAAF-AA-7---<7-7-AFFAFFAF<AA7<7) NM:i:2 MD:Z:52A18A22 AS:i:84 XS:i:84 RG:Z:GWI127_combined SA:Z:17,61565845,+,60M90S,0,0;
E00491:102:H23N2CCXY:7:2116:24150:44697 99 17 61565813 60 19S92M39S = 61565891 181 AGGTGTCTGCAGCATGTGGCCCCAGGCCGGGGACTCTGTAAGCCACTGCTGGAGAGCCACTCCCATCCTTTCTCCCATTTCTCTAGACCTGCTGCCTATACAGTCACTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCCAGG AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
JJJJJFJJJFJJJJJJJJJJJJJJJJJJ-7--FJJFFF)7-7AJJJ<J-7JJJ<FF AS:i:111 XS:i:24 RG:Z:GWI127_combined SA:Z:2,149045941,-,47M103S,0,0;
E00491:102:H23N2CCXY:7:2116:19136:33902 99 17 61565813 60 19S92M39S = 61565891 181 AGGTGTCTGCAGCATGTGGCCCCAGGCCGGGGACTCTGTAAGCCACTGCTGGAGAGCCACTCCCATCCTTTCTCCCATTTCTCTAGACCTGCTGCCTATACAGTCACTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCCAGG AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJFJJFJJJJJJJJJJJJJJJ
JJJJJJJJJJJJJJFJJJJJJJJJJJJJ-77<FFJ7FJJFFAJJ<)7F<AJJJ-<- AS:i:111 XS:i:24 RG:Z:GWI127_combined SA:Z:6,17717274,+,103S47M,0,0;
E00491:102:H23N2CCXY:7:2124:29011:16727 99 17 61565813 60 19S92M39S = 61565891 181 AGGTGTCTGCAGCATGTGGCCCCAGGCCGGGGACTCTGTAAGCCACTGCTGGAGAGCCACTCCCATCCTTTCTCCCATTTCTCTAGACCTGCTGCCTATACAGTCACTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCCAGT AAFFFAJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJFJF<JJJJJJ<JJJJJJFJJFFJAJJJJJJJF7-FJJJJJJFJJJJJJJJJJJFJJJ
JJFA7FJAJFJJJJAJJJJAF<<A<-<FA-A-777<F-F7-)<)7A7JF)7)<))) AS:i:111 XS:i:24 RG:Z:GWI127_combined SA:Z:6,17717274,+,103S47M,0,1;
E00491:102:H23N2CCXY:7:2122:19928:51869 99 17 61565813 60 19S92M39S = 61565891 181 AGGTGTCTGCAGCATGTGGCCCCAGGCCGGGGACTCTGTAAGCCACTGCTGGAGAGCCACTCCCATCCTTTCTCCCATTTTTCTAGACCTGCTGCCTATACAGTCACTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCCAGG AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJJJJJJJJJJJJJJJJFJFJJFJJJJJJJJJJJJJJJJFJJJJJFJ
JJJFJFJFJJFJJJJJFJJJJJJJJJJJ--7-AAFJJJ<FAJJJ7FAFJAF<<)7F AS:i:106 XS:i:24 RG:Z:GWI127_combined SA:Z:6,17717274,+,103S47M,0,0;
E00491:102:H23N2CCXY:7:2114:4827:50568 99 17 61565813 60 19S92M39S = 61565891 181 AGGTGTCTGCAGCATGTGGCCCCAGGCCGGGGACTCTGTAAGCCACTGCTGGAGAGCCACTCCCATCCTTTCTCCCATTTCTCTAGACCTGCTGCCTATACAGTCACTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCCAGG AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJJFJJJJJFJJJJJJJJJJJJJJJJFFAJJJJJJJ
JJJFJFJJJFJFJJFJJJJJJJJJJJJF--7-7<-7A7<-<7-)77F7AFAFF)A< AS:i:111 XS:i:24 RG:Z:GWI127_combined SA:Z:6,17717274,+,103S47M,0,0;
...
$ samtools view /data/NGS/analysis/bamclipper/GWI127_combined.primerclipped.bam | grep ATACAGTCACTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCC | wc -l
21
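In case it helps with diagnosis: none of the CIGAR strings in the records shown above contains an I operation; the reads carrying the insertion sequence show soft clips instead (for example 19S92M39S). A minimal sketch for tabulating this over all 21 reads follows. The function name cigar_summary is my own hypothetical helper, not part of samtools or GATK; it only assumes standard tab-separated SAM text on stdin, with the CIGAR in column 6 and the read sequence in column 10.

```shell
# Hypothetical helper: count the distinct CIGAR strings of SAM records (stdin)
# whose read sequence (column 10) contains the given subsequence.
cigar_summary() {
    # index() does a plain substring search, so no regex escaping is needed
    awk -F'\t' -v seq="$1" 'index($10, seq) { print $6 }' \
        | sort | uniq -c | sort -rn
}

# Usage (assumes samtools is on PATH):
# samtools view /data/NGS/analysis/bamclipper/GWI127_combined.primerclipped.bam \
#   | cigar_summary ATACAGTCACTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCC
```

Matching on column 10 only, rather than grepping the whole line, avoids accidental hits in other columns.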
VCF file:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GWI127_combined
17 61565890 rs1799752 T TATACAGTCACTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCC 0 LowQual AC=0;AF=0.00;AN=2;DB;DP=71;ExcessHet=3.0103;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=59.99;SOR=0.368 GT:AD:DP:GQ:PL:SB 0/0:3,0:3:10:0,10,137:2,1,0,0
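To make the sample-level fields easier to read: the record above reports DP=71 in INFO, but the sample's AD and DP show only 3 reads, all supporting the reference allele. A small sketch that prints each FORMAT key next to its value is below. format_fields is a hypothetical helper of mine, and it assumes a tab-separated, single-sample VCF with FORMAT in column 9 and the sample in column 10.

```shell
# Hypothetical helper: for every non-header VCF record on stdin, print each
# FORMAT key (column 9) paired with the sample's value (column 10).
format_fields() {
    awk -F'\t' '!/^#/ {
        n = split($9, key, ":")     # e.g. GT, AD, DP, GQ, PL, SB
        split($10, val, ":")        # values in the same order as the keys
        for (i = 1; i <= n; i++) print key[i] "=" val[i]
    }'
}

# Usage, e.g. on one of the output files from the commands above:
# format_fields < GWI127.rs1799752.hc.vcf
```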