I am using GATK v3.5. I used SelectVariants as shown below to remove 11 samples from a VCF file:
java -jar GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V all_samples.vcf -xl_sn sample90 -xl_sn sample91 -xl_sn sample92 -xl_sn samples93 -xl_sn sample94 -xl_sn sample95 -xl_sn sample96 -xl_sn sample97 -xl_sn sample98 -xl_sn sample99 -xl_sn sample100 -o subset_samples.vcf
However, when I compare the SNPs between the original VCF and the subset VCF, the genotype calls (0/0, 0/1, 1/1) remain the same, but the AD, DP, GQ, and PL values change so much that they no longer make sense. For example, a sample with an AD of 0,45 is called 0/1 (heterozygous). That is the correct call according to the original file, where the AD is 53,24, but an AD of 0,45 on its own would imply 1/1. As long as the genotype calls themselves are correct, this shouldn’t cause any downstream errors, but I can’t be sure this is the case. Has anyone else seen this?
Original:
KB222897.1 10810 . C T 113425.71 . AC=102;AF=0.359;AN=284;BaseQRankSum=0.698;ClippingRankSum=0.029;DP=8522;ExcessHet=87.0598;FS=0.000;InbreedingCoeff=-0.4686;MLEAC=102;MLEAF=0.359;MQ=41.97;MQRankSum=-1.540e-01;QD=18.08;ReadPosRankSum=0.132;SOR=0.682 GT:AD:DP:GQ:PL 0/1:53,24:77:99:705,0,1819 0/1:39,16:55:99:470,0,1231 0/1:29,21:50:99:589,0,973
Subset:
KB222897.1 10810 SKB222897.1_10810 C T . PASS AC=96;AF=0.366;AN=262;BaseQRankSum=0.698;ClippingRankSum=0.029;DP=8027;ExcessHet=87.0598;FS=0.000;InbreedingCoeff=-0.4686;MQ=41.97;MQRankSum=-1.540e-01;QD=18.08;ReadPosRankSum=0.132;SOR=0.682;DP=6516 GT:AD:DP:GQ:PL 0/1:0,45:45:99:255,135,0 0/1:0,48:48:99:255,144,0 0/1:0,44:44:99:255,132,0
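To check how widespread the mismatch is, one option is to compare each GT against the genotype its PL values favour. Below is a minimal Python sketch of that check, run on the sample fields from the two records above. It assumes biallelic, diploid sites (so PL has exactly three values, ordered 0/0, 0/1, 1/1) and the FORMAT order GT:AD:DP:GQ:PL; the function names are made up for illustration and are not part of GATK.

```python
def parse_sample(format_str, sample_str):
    """Map FORMAT keys to one sample's values, e.g. {'GT': '0/1', 'AD': '0,45', ...}."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


def pl_genotype(pl_str):
    """Return the genotype with the lowest PL (0 marks the most likely one).

    Assumption: biallelic diploid site, so PL is ordered 0/0, 0/1, 1/1.
    """
    pls = [int(x) for x in pl_str.split(",")]
    if len(pls) != 3:
        raise ValueError("expected three PL values for a biallelic diploid site")
    return ("0/0", "0/1", "1/1")[pls.index(min(pls))]


def gt_matches_pl(format_str, sample_str):
    """True if the called GT is the genotype favoured by PL."""
    fields = parse_sample(format_str, sample_str)
    # Normalise phased calls (e.g. 1|0) to unphased, sorted form (0/1).
    alleles = sorted(fields["GT"].replace("|", "/").split("/"))
    return "/".join(alleles) == pl_genotype(fields["PL"])


FORMAT = "GT:AD:DP:GQ:PL"
original = ["0/1:53,24:77:99:705,0,1819",
            "0/1:39,16:55:99:470,0,1231",
            "0/1:29,21:50:99:589,0,973"]
subset = ["0/1:0,45:45:99:255,135,0",
          "0/1:0,48:48:99:255,144,0",
          "0/1:0,44:44:99:255,132,0"]

original_ok = [gt_matches_pl(FORMAT, s) for s in original]
subset_ok = [gt_matches_pl(FORMAT, s) for s in subset]
print(original_ok)  # [True, True, True]
print(subset_ok)    # [False, False, False]
```

For the three subset samples shown, AD and PL agree with each other (both point to 1/1); only GT disagrees with them.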