Hi,
I am trying to run the pipeline for joint analysis of 75 exomes. I believe that is a reasonable number of samples for VQSR, but I am running into the well-known problem of having too few INDELs.
After the GenotypeGVCFs step I obtain: 82183 SNPs and 109 INDELs.
Are these numbers plausible for a joint analysis of 75 exomes, or could there be an error somewhere?
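For anyone who wants to check such counts independently of GATK, here is a minimal Python sketch of one way to tally SNP and INDEL sites from an uncompressed VCF. It is only an illustration, not the tool that produced the numbers above. The function names classify_allele and count_variant_types are hypothetical, and the classification is a simplification: a site with any length-changing ALT allele counts as INDEL, symbolic alleles such as <NON_REF> are ignored, and GATK's own variant typing (for example its MIXED category) may differ.

```python
def classify_allele(ref, alt):
    """Classify one REF/ALT pair as 'SNP', 'INDEL' or 'OTHER' (simplified rules)."""
    # Symbolic alleles (<NON_REF>), missing ALT (.) and spanning deletions (*) are not counted.
    if alt.startswith("<") or alt in (".", "*"):
        return "OTHER"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # same length but longer than 1 base, e.g. an MNP


def count_variant_types(lines):
    """Count sites by type from an iterable of VCF text lines."""
    counts = {"SNP": 0, "INDEL": 0, "OTHER": 0}
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        kinds = {classify_allele(ref, alt) for alt in fields[4].split(",")}
        # Assumption: a site with any INDEL allele is reported as INDEL.
        if "INDEL" in kinds:
            counts["INDEL"] += 1
        elif "SNP" in kinds:
            counts["SNP"] += 1
        else:
            counts["OTHER"] += 1
    return counts
```

It could be called as count_variant_types(open("UD_JOINT_genot.g.vcf")), using the output file name from the GenotypeGVCFs command below.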
HaplotypeCaller command for one of the 75 samples:
java -Xmx64g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -I UD_NA001_baserecal_precalread_mrdupgrp.bam -R ucsc.hg19.fa -nct 10 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 --intervals clinical_exome_cod.bed -A Coverage -A FisherStrand -A BaseQualityRankSumTest -A HaplotypeScore -A MappingQualityRankSumTest -A MappingQualityZero -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -A SpanningDeletions -o UD_NA001_P_baserecal_precalread_mrdupgrp_varcall.g.vcf
GenotypeGVCFs command for all the gVCFs together:
java -Xmx25g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -V UD_NA001_baserecal_precalread_mrdupgrp_varcall.g.vcf -V UD_NA001_baserecal_precalread_mrdupgrp_varcall.g.vcf -R ucsc.hg19.fa -A Coverage -A FisherStrand -A BaseQualityRankSumTest -A HaplotypeScore -A InbreedingCoeff -A MappingQualityRankSumTest -A MappingQualityZero -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -o UD_JOINT_genot.g.vcf
Thanks
Francesco