Hi,
I want to use ContEst to estimate the contamination levels of my patient-matched normal samples, but all my data are WGS data, and I do not have genotype arrays for my normal samples.
Here is my command (G01H.recal.bam is about 110G and G01N.recal.bam is about 115G):
java -jar \
GenomeAnalysisTK-3.6.jar \
-T ContEst \
-R hg19_complete.fasta \
-I:eval G01H.recal.bam \
-I:genotype G01N.recal.bam \
--popfile hg19_population_stratified_af_hapmap_3.3.vcf \
-isr INTERSECTION \
-population CHB \
-o contamination_results_G01Hbam_Nbam.txt
Result:
name population population_fit contamination confidence_interval_95_width confidence_interval_95_low confidence_interval_95_high sites
META CHB n/a 9.2 2.2 8.2 10.4 37
So my question is: why are there only 37 sites? Does it mean that I have to use a genotype array as the input to the --genotypes parameter? Or is it because the mean coverage of my data is 22X, while ContEst requires homozygous sites with at least 50X coverage?
Then I tried using HaplotypeCaller, SelectVariants, and VariantFiltration to create a .vcf file for my normal sample,
so that I could run ContEst like this:
java -jar \
GenomeAnalysisTK-3.6.jar \
-T ContEst \
-R hg19_complete.fasta \
-I G01H_chr22.recal.bam \
--genotypes G01N_chr22_filtered_snaps.vcf \
--popfile hg19_population_stratified_af_hapmap_3.3.vcf \
-isr INTERSECTION \
-population CHB \
-o contamination_results_chr22_CHB.txt
Result:
name population population_fit contamination confidence_interval_95_width confidence_interval_95_low confidence_interval_95_high sites
META CHB n/a 0.1 0.1 0.1 0.2 724
When I tested on chr22 only, ContEst found 724 sites, which is more than it found with the whole-genome BAMs above.
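To compare the two runs side by side, a small shell helper like the one below can pull the contamination estimate and the site count out of each output file. This is only a sketch: summarize_contest is a name made up here, and it assumes the 8-column, whitespace- or tab-delimited layout shown in the results above (contamination in column 4, sites in column 8).

```shell
# Hypothetical helper: print "<file> <contamination> <sites>" for each ContEst output file.
# Assumes the 8-column layout shown above; the header line of every file is skipped.
summarize_contest() {
    awk 'FNR > 1 && NF >= 8 { printf "%s\t%s\t%s\n", FILENAME, $4, $8 }' "$@"
}
```

Calling summarize_contest with contamination_results_G01Hbam_Nbam.txt and contamination_results_chr22_CHB.txt as arguments prints one line per run.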
So my second question is: can I use a .vcf file created by HaplotypeCaller as the input to the --genotypes parameter?