I've built a pipeline to create a known-SNPs VCF (known.snp.vcf) for BaseRecalibrator, because my species has no existing database of known SNPs.
On a subset of my samples (10 samples, about 5% of my dataset),
I ran HaplotypeCaller on each BAM file -> GenomicsDBImport -> GenotypeGVCFs -> SelectVariants (SNPs only) -> VariantFiltration.
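The steps above can be sketched roughly as the following GATK4 command lines. This is a minimal sketch, not a tested workflow. The reference `ref.fasta`, the BAM names, the interval `chr1`, the workspace name `known_snps_db` and the single `QD < 2.0` filter are all hypothetical placeholders. A real hard-filtering step would normally use several expressions chosen for the data. The script only builds and prints the commands. Each list could be passed to `subprocess.run` to actually execute it.

```python
import shlex

# Hypothetical placeholders: replace with your own reference, BAMs and intervals.
REF = "ref.fasta"
BAMS = [f"sample{i:02d}.bam" for i in range(1, 11)]  # the 10-sample subset
INTERVALS = "chr1"  # GenomicsDBImport needs -L; placeholder value
WORKSPACE = "known_snps_db"


def build_commands(ref, bams, intervals, workspace):
    """Return the GATK4 command lines (as argument lists) for the bootstrap pipeline."""
    cmds = []
    gvcfs = []
    # 1) HaplotypeCaller per sample, in GVCF mode.
    for bam in bams:
        gvcf = bam.replace(".bam", ".g.vcf.gz")
        gvcfs.append(gvcf)
        cmds.append(["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
                     "-O", gvcf, "-ERC", "GVCF"])
    # 2) Consolidate the per-sample GVCFs into one GenomicsDB workspace.
    db_cmd = ["gatk", "GenomicsDBImport",
              "--genomicsdb-workspace-path", workspace, "-L", intervals]
    for gvcf in gvcfs:
        db_cmd += ["-V", gvcf]
    cmds.append(db_cmd)
    # 3) Joint genotyping straight from the workspace.
    cmds.append(["gatk", "GenotypeGVCFs", "-R", ref,
                 "-V", f"gendb://{workspace}", "-O", "cohort.vcf.gz"])
    # 4) Keep SNPs only.
    cmds.append(["gatk", "SelectVariants", "-V", "cohort.vcf.gz",
                 "--select-type-to-include", "SNP", "-O", "snps.vcf.gz"])
    # 5) Hard filter: failing records get the filter name in FILTER, the rest get PASS.
    cmds.append(["gatk", "VariantFiltration", "-V", "snps.vcf.gz",
                 "--filter-expression", "QD < 2.0", "--filter-name", "QD2",
                 "-O", "snps.filtered.vcf.gz"])
    return cmds


if __name__ == "__main__":
    for cmd in build_commands(REF, BAMS, INTERVALS, WORKSPACE):
        print(shlex.join(cmd))
```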
My questions are:
1) Is this a reasonable way to generate a set of known SNPs?
2) Can I use this filtered VCF as the known sites for BaseRecalibrator on my remaining samples? Will BaseRecalibrator read the FILTER column and use only the PASS SNPs, or do I need an extra step to remove the filtered (low-quality) SNPs first?
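If it is unclear whether BaseRecalibrator honours the FILTER column, one conservative option is to drop the non-PASS records before using the file. VariantFiltration does not write "FAIL". A failing record carries the name of the filter it failed (e.g. `QD2`) in FILTER, and a passing record carries PASS. GATK's SelectVariants has an `--exclude-filtered` option intended for this. The sketch below does the equivalent in plain Python on an uncompressed VCF. It also drops records whose FILTER is `.` (never filtered), which is a deliberate choice here. The script name `pass_only.py` is hypothetical. The output would still need to be indexed (for example with `gatk IndexFeatureFile -I known.snp.vcf`) before GATK tools will accept it.

```python
import sys
from typing import Iterable, Iterator


def pass_only(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF header lines unchanged and only the records whose FILTER is PASS."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep ## meta-information lines and the #CHROM header
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 7 (index 6) is FILTER: "PASS", "." or the failed filter name(s), e.g. "QD2".
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python pass_only.py snps.filtered.vcf > known.snp.vcf
    with open(sys.argv[1]) as vcf:
        sys.stdout.writelines(pass_only(vcf))
```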