Initially I was going to ask about why I might be seeing some InbreedingCoeff values less than -1 (based on the calculation as explained at https://software.broadinstitute.org/gatk/documentation/article.php?id=8032, I believe it should always be between -1 and 1). But then I checked a random sample of 10,000 InbreedingCoeff values from my VCF against the values I calculate myself, and I see a strange mismatch generally, not only in the IC values given by GATK as less than -1. Attached is my code and a plot, with the y = x line in blue and y = -1 in red. I see the same pattern in another dataset which was produced by the same GATK-based pipeline.
The VCF referred to in the attached document is generated from 157 exome-capture samples (unrelated individuals) using GATK 3.6 with JDK 1.8.0. We use HaplotypeCaller on each sample, then GenotypeGVCFs on the collected results, then VariantRecalibrator/ApplyRecalibration with the recommended parameters/resources. I can provide the full commands if helpful.
Any ideas about why there is this mismatch? Is there something I'm misunderstanding about the InbreedingCoeff values?
Thanks!