Hi,
I'm currently analysing a data set of six pools with 25 individuals each (ploidy 50) from a non-model organism. I initially ran HaplotypeCaller with -ERC GVCF and then attempted joint genotyping on the resulting files with GenotypeGVCFs. The resulting VCF was empty, however, so to troubleshoot I set up tests on a subset of the data (700k sequences, 200k bp contig). Each test runs mapping and then calls variants in one of two ways: "regular" genotyping with HaplotypeCaller, or HaplotypeCaller with -ERC GVCF followed by GenotypeGVCFs. In addition, I varied the ploidy setting in HaplotypeCaller from 10 to 50 in steps of 10. I ran bcftools stats on the resulting VCFs and got the following numbers of SNPs:
testout.sort.rg.10.hc.vcf.stats:SN 0 number of SNPs: 485
testout.sort.rg.10.hcnorm.vcf.stats:SN 0 number of SNPs: 494
testout.sort.rg.20.hc.vcf.stats:SN 0 number of SNPs: 0
testout.sort.rg.20.hcnorm.vcf.stats:SN 0 number of SNPs: 502
testout.sort.rg.30.hc.vcf.stats:SN 0 number of SNPs: 0
testout.sort.rg.30.hcnorm.vcf.stats:SN 0 number of SNPs: 508
testout.sort.rg.40.hc.vcf.stats:SN 0 number of SNPs: 0
testout.sort.rg.40.hcnorm.vcf.stats:SN 0 number of SNPs: 504
testout.sort.rg.50.hc.vcf.stats:SN 0 number of SNPs: 0
testout.sort.rg.50.hcnorm.vcf.stats:SN 0 number of SNPs: 503
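Here, 'hcnorm' refers to HaplotypeCaller in normal mode (no -ERC GVCF), and 'hc' to HaplotypeCaller with -ERC GVCF followed by GenotypeGVCFs. Evidently, for ploidy > 10 the GVCF workflow outputs no SNPs, whereas normal mode yields roughly 500 SNPs at every ploidy. Are there any known problems running GenotypeGVCFs with high ploidy, or am I missing something obvious here?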
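In case it helps anyone compare the two workflows, here is a minimal Python sketch that tabulates the SNP counts listed above per ploidy. It assumes the lines look exactly as pasted, with file names following the pattern testout.sort.rg.<ploidy>.<mode>.vcf.stats; the function name `tabulate` and the regular expression are just my own illustration, not part of GATK or bcftools.

```python
import re
from collections import defaultdict

# SNP summary lines exactly as pasted in this post.
LINES = """\
testout.sort.rg.10.hc.vcf.stats:SN 0 number of SNPs: 485
testout.sort.rg.10.hcnorm.vcf.stats:SN 0 number of SNPs: 494
testout.sort.rg.20.hc.vcf.stats:SN 0 number of SNPs: 0
testout.sort.rg.20.hcnorm.vcf.stats:SN 0 number of SNPs: 502
testout.sort.rg.30.hc.vcf.stats:SN 0 number of SNPs: 0
testout.sort.rg.30.hcnorm.vcf.stats:SN 0 number of SNPs: 508
testout.sort.rg.40.hc.vcf.stats:SN 0 number of SNPs: 0
testout.sort.rg.40.hcnorm.vcf.stats:SN 0 number of SNPs: 504
testout.sort.rg.50.hc.vcf.stats:SN 0 number of SNPs: 0
testout.sort.rg.50.hcnorm.vcf.stats:SN 0 number of SNPs: 503
""".splitlines()

# Assumed layout: <prefix>.rg.<ploidy>.<mode>.vcf.stats:SN 0 number of SNPs: <count>
PATTERN = re.compile(
    r"\.rg\.(\d+)\.(hcnorm|hc)\.vcf\.stats:SN\s+0\s+number of SNPs:\s+(\d+)"
)


def tabulate(lines):
    """Return {ploidy: {mode: snp_count}} for every line matching PATTERN."""
    table = defaultdict(dict)
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # skip anything that is not a SNP summary line
        ploidy, mode, count = match.groups()
        table[int(ploidy)][mode] = int(count)
    return dict(table)


counts = tabulate(LINES)

# Print one row per ploidy: GVCF workflow ('hc') next to normal mode ('hcnorm').
print("ploidy\thc\thcnorm")
for ploidy in sorted(counts):
    row = counts[ploidy]
    print(f"{ploidy}\t{row.get('hc', 'NA')}\t{row.get('hcnorm', 'NA')}")
```

Laid out this way, the 'hc' column drops to 0 from ploidy 20 upwards while the 'hcnorm' column stays around 500.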
Cheers,
Per