Hi!
I am using VQSR on a non-model species (88 whole genomes, strictly following the GATK Best Practices). As the training resource I use a database of ~40,000 SNPs obtained by genotyping-by-sequencing of 150 different samples, but I do not know which prior I should assign to it. For now my command is:
java -Xmx120g -jar $GATK -T VariantRecalibrator \
  -R /data2/CPBWGS/ref_genome/Ldec.genome.10062013.fa \
  -input /data2/CPBWGS/all_fastq_files/decemlineata/CPBWGS_leptinotarsa_SNPs.vcf \
  -recalFile Leptinotarsa_VariantRecalibrator_prior10.recal \
  -tranchesFile Leptinotarsa_VariantRecalibrator_prior10.tranches \
  -resource:dbsnp,known=true,training=true,truth=true,prior=10.0 /data2/CPBWGS/all_fastq_files/decemlineata/Analyses/VQSR/TrainingSetSNPs.vcf \
  -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -an InbreedingCoeff \
  -mode SNP
Because I couldn't figure out which prior to use, I tried 5, 10 and 20; here are the tranches plots. Note that I can't manage to obtain the other graphs (I don't know what is wrong with the R configuration on our server; I would welcome any suggestions about that too).
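For clarity, the three runs differ only in the prior value and in the output file names. A minimal dry-run sketch of that loop follows; it only prints the commands rather than executing them. The `build_cmd` helper and the fallback jar name used when `$GATK` is unset are hypothetical, introduced purely for illustration; the paths, annotations and output naming are copied from the command above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one VariantRecalibrator command per prior value.
# Nothing is executed; each command is only written to stdout.

# Assumption: fall back to a placeholder jar name if $GATK is not set.
GATK="${GATK:-GenomeAnalysisTK.jar}"
REF=/data2/CPBWGS/ref_genome/Ldec.genome.10062013.fa
INPUT=/data2/CPBWGS/all_fastq_files/decemlineata/CPBWGS_leptinotarsa_SNPs.vcf
TRAIN=/data2/CPBWGS/all_fastq_files/decemlineata/Analyses/VQSR/TrainingSetSNPs.vcf

# build_cmd PRIOR: hypothetical helper that prints the full command line
# for one prior value, naming the outputs after that prior.
build_cmd() {
  local prior="$1"
  local out="Leptinotarsa_VariantRecalibrator_prior${prior}"
  printf '%s ' \
    java -Xmx120g -jar "$GATK" -T VariantRecalibrator \
    -R "$REF" \
    -input "$INPUT" \
    -recalFile "${out}.recal" \
    -tranchesFile "${out}.tranches" \
    "-resource:dbsnp,known=true,training=true,truth=true,prior=${prior}.0" "$TRAIN" \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -an InbreedingCoeff \
    -mode SNP
  printf '\n'
}

# The three priors tried so far.
for prior in 5 10 20; do
  build_cmd "$prior"
done
```

Each printed line is one complete command, so the output can be inspected first and then run line by line once it looks right.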
Do you have any hints for me?
Ben