Hello,
I notice the following warning messages during the first step of VQSR:
------------------------------------------------------------------------------------------
Done. There were 3 WARN messages, the first 3 are repeated below.
WARN 16:10:55,436 VariantDataManager - WARNING: Very large training set detected. Downsampling to 2500000 training variants.
WARN 16:40:01,432 RScriptExecutor - RScript exited with 127. Run with -l DEBUG for more info.
WARN 16:40:01,449 RScriptExecutor - RScript exited with 127. Run with -l DEBUG for more info.
I know the latter two are probably due to my not having the required R libraries set up, but what about the first warning about the very large training set? My command is as follows, and I'm using GATK 3.6:
java -Xmx45g -jar $GATK -T VariantRecalibrator -R $REF -input ./INDIVIDUAL.raw.snps.indels.combined.vcf \
-recalFile ./INDIVIDUAL.snp.recal \
-tranchesFile ./INDIVIDUAL.snp.tranches \
-rscriptFile ./INDIVIDUAL.snp.recalibrate_SNP_plots.R \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 /gatkRefDir/hapmap_3.3.hg19.sites.vcf \
-resource:omni,known=false,training=true,truth=true,prior=12.0 /gatkRefDir/1000G_omni2.5.hg19.sites.vcf \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 /gatkRefDir/1000G_phase1.snps.high_confidence.hg19.sites.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /gatkRefDir/dbsnp_138.hg19.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP \
-mode SNP
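For the two RScriptExecutor warnings, here is a rough check I can run before re-running the command. Exit status 127 is commonly the shell's "command not found" status, so it may mean Rscript itself is not on my PATH rather than a missing R library. I haven't confirmed how GATK's RScriptExecutor reports this. The function name check_rscript is just something I made up, and I'm assuming ggplot2 is the package the plotting script needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that Rscript is reachable and that an R package loads.
# Assumption: ggplot2 is the package the VQSR plotting script depends on.
check_rscript() {
  # 'command -v' prints nothing and fails if Rscript is not on PATH
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH"
    return 127   # mirror the shell's conventional "command not found" status
  fi
  # Rscript exists; see whether the (assumed) required library loads
  if ! Rscript -e 'library(ggplot2)' >/dev/null 2>&1; then
    echo "Rscript found, but ggplot2 failed to load"
    return 1
  fi
  echo "Rscript and ggplot2 OK"
  return 0
}

check_rscript
status=$?
```

If this reports that Rscript is not found, fixing my PATH would come before installing any libraries.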
Thanks a lot.
Helene