Dear GATK staff,
I have 11 samples that were sequenced using NGS (Illumina HiSeq), and 2 of these samples were also genotyped using an Illumina Human Global Screening Array (Illumina iScan). I was looking at your latest WDL script and noticed a few steps that I think are related, but I don't know how to prepare the inputs for them. Any help or additional explanation would be really appreciated!
# Check identity of fingerprints across readgroups
CrossCheckFingerprints
input: haplotype_database_file
What information should I use to create this file? The array data? I have already read these links, but I'm still lost:
http://gatkforums.broadinstitute.org/gatk/discussion/comment/37543
http://gatkforums.broadinstitute.org/gatk/discussion/9526/picard-haplotype-map-file-format
# Estimate level of cross-sample contamination
CheckContamination
input: contamination_sites_vcf
What information should I use to create this file?
# Check the sample BAM fingerprint against the sample array
CheckFingerprint
input: haplotype_database_file
input: genotypes
What information should I use to create these files? What does each input stand for?
Thank you very much in advance.
Best regards,
Santiago