Hi Geraldine, Sheila, and others,
Now that a fix for the "GGA" (GENOTYPE_GIVEN_ALLELES) mode in HaplotypeCaller seems likely to arrive in some future GATK release, I was wondering about the prospects for a -GENOTYPE_GIVEN_ALLELES option for GenotypeGVCFs. There seems, understandably, to be significant interest in this sort of capability, e.g. here, and here.
To be more concrete, I'm thinking about something that would function like the following example:
Given the following known variants of interest, specified via an -alleles option:
#CHROM POS ID REF ALT QUAL FILTER INFO
20 10000694 . G A . . .
20 10001661 . T C . . .
...GenotypeGVCFs would take an input gVCF, like the simplified single-sample example below:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1
20 10000694 . G T,<NON_REF> . . . GT:DP:GQ 0/1:29:99
20 10000695 . G <NON_REF> . . END=10001999 GT:DP:GQ 0/0:0:0
...and produce something like the following as output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1
20 10000694 . G A,<NON_REF> . . . GT 0/2
20 10001661 . T C,<NON_REF> . . . GT ./.
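To pin down the behavior I have in mind, here is a minimal Python sketch that reproduces the example above. It is purely hypothetical: `GvcfRecord` and `genotype_given_allele` are names I made up, not GATK APIs, and it assumes an already-parsed single-sample gVCF, biallelic known sites, and a crude coverage rule (DP of 0 gives a no-call, a covered reference block gives 0/0) instead of GATK's likelihood-based genotyping.

```python
from dataclasses import dataclass

NON_REF = "<NON_REF>"


@dataclass
class GvcfRecord:
    """Hypothetical, simplified single-sample gVCF record."""
    pos: int
    ref: str
    alts: list[str]  # includes "<NON_REF>"
    end: int         # last position covered (== pos for variant records)
    gt: str          # e.g. "0/1"
    dp: int


def find_covering(records, pos):
    """Return the record whose [pos, end] interval contains pos, or None."""
    for rec in records:
        if rec.pos <= pos <= rec.end:
            return rec
    return None


def genotype_given_allele(records, pos, ref, alt):
    """Genotype one known site; output alleles are ordered [ref, alt, <NON_REF>]."""
    rec = find_covering(records, pos)
    if rec is None or rec.dp == 0:
        return "./."  # assumption: no coverage means no-call
    if rec.alts == [NON_REF]:
        return "0/0"  # assumption: a covered reference block means hom-ref
    if rec.pos != pos or rec.ref != ref:
        return "./."  # e.g. overlapping indel; left uncalled in this sketch
    alleles = [rec.ref] + rec.alts
    out = []
    for idx in rec.gt.replace("|", "/").split("/"):
        if idx == ".":
            out.append(".")
            continue
        allele = alleles[int(idx)]
        if allele == ref:
            out.append("0")
        elif allele == alt:
            out.append("1")
        else:
            out.append("2")  # any other observed allele collapses into <NON_REF>
    return "/".join(out)


# The simplified gVCF records from the example above.
records = [
    GvcfRecord(10000694, "G", ["T", NON_REF], 10000694, "0/1", 29),
    GvcfRecord(10000695, "G", [NON_REF], 10001999, "0/0", 0),
]

for pos, ref, alt in [(10000694, "G", "A"), (10001661, "T", "C")]:
    print(pos, ref, f"{alt},{NON_REF}", genotype_given_allele(records, pos, ref, alt))
# 10000694 G A,<NON_REF> 0/2
# 10001661 T C,<NON_REF> ./.
```

The observed T allele at 10000694 does not match the requested A, so it is folded into <NON_REF> (allele 2), which is how the 0/2 in my desired output arises.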
With GATK 3, I believe there was a partial solution using -allSites -L sites.vcf, but from the current GenotypeGVCFs source code, it looks like the -allSites option is now ignored and unsupported (and it is also omitted from the GATK 4 docs).
Should I be writing my own tools to genotype known variants from the gVCFs, or can we expect an official GenotypeGVCFs feature in the not-too-distant future? Or is there a straightforward way to do this currently (aside from running UnifiedGenotyper or HaplotypeCaller on BAM input with -GENOTYPE_GIVEN_ALLELES) that I have overlooked?
Thanks very much in advance,
Greg