Are my data and research question appropriate for analysis with HaplotypeCaller?

Hello GATK team!

I'm having a hard time finding discussion or examples of HaplotypeCaller being used by researchers with data and research questions similar to mine, so apologies for posting such high-level questions on your forum.

My goal is to explore genetic diversity at 7 amplicon loci across 60 populations of a very genetically diverse species of parasitic nematode (each population sample consists of pooled DNA from ~200-500 worms). To do this, I would like to count the number of haplotypes present, and their proportions, in each population at each of these seven loci.

I intend to infer signatures of selection by drug treatment from the level of haplotypic diversity in each population at each locus, some loci being candidate genes of interest and some being control loci. The populations are part of controlled experiments with drug treatment so drops in diversity at candidate loci in post-treatment population samples relative to associated pre-treatment samples should tell us if selection is happening near that locus.
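To make the pre- versus post-treatment comparison concrete, here is a minimal sketch of the kind of diversity summary I have in mind. It assumes I already have per-haplotype read counts for one locus in one population, and it uses a simple Gini-Simpson style index (1 minus the sum of squared haplotype frequencies) as a stand-in for "haplotypic diversity". The function name and the example counts are hypothetical, and this is not something HaplotypeCaller outputs as far as I know.

```python
def haplotype_diversity(counts):
    """Return 1 - sum(p_i^2) for a list of per-haplotype read counts.

    Assumption: read counts are used as a proxy for haplotype
    frequencies in the pooled sample.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical counts at one candidate locus, before and after treatment.
pre_treatment = [25, 25, 25, 25]
post_treatment = [90, 10]

drop = haplotype_diversity(pre_treatment) - haplotype_diversity(post_treatment)
print(f"Drop in diversity after treatment: {drop:.2f}")
```

A larger drop at a candidate locus than at the control loci is the signal I would be looking for.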

My data consists of roughly 5,000 - 25,000 ~600 bp paired-end reads per population, per amplicon locus, that were sequenced on a MiSeq V3 run. I've already aligned the reads of each population sample to a reference 'genome' consisting of the seven loci using bowtie2 [--local --no-mixed]. Reads were indexed on the MiSeq by population, so each fastq R1/R2 fileset contains reads of each of the 7 loci from a single population.

I would now like to take these 60 .sam/.bam files and assess them with HaplotypeCaller to get counts of unique haplotypes that pass a confidence threshold (i.e. aren't potential false haplotypes caused by sequencing error etc.) for each of the 7 loci in each of the 60 population samples. Additionally (but less important than the counts), I'd like the proportion of each haplotype in each population, and would prefer each unique haplotype to be traceable across the populations.
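As a rough illustration of the output I am after (not of how HaplotypeCaller works), here is a minimal sketch that counts identical sequences at one locus and reports their proportions. It assumes the reads for that locus have already been merged and trimmed to the same span, so that identical haplotypes give identical strings. The function name and both thresholds are hypothetical placeholders for whatever "passes confidence" should really mean.

```python
from collections import Counter


def haplotype_table(sequences, min_reads=2, min_fraction=0.01):
    """Map each retained haplotype sequence to (read count, proportion).

    Assumption: a sequence seen fewer than `min_reads` times, or below
    `min_fraction` of all reads, is treated as likely sequencing error.
    Proportions are computed among the retained haplotypes only.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    kept = {
        seq: n
        for seq, n in counts.items()
        if n >= min_reads and n / total >= min_fraction
    }
    kept_total = sum(kept.values())
    return {seq: (n, n / kept_total) for seq, n in kept.items()}


# Toy reads for one locus in one population (hypothetical data).
reads = ["ACGT"] * 6 + ["ACGA"] * 3 + ["TTTT"]
table = haplotype_table(reads, min_reads=2, min_fraction=0.05)
for seq, (n, frac) in sorted(table.items(), key=lambda kv: -kv[1][0]):
    print(seq, n, round(frac, 3))
```

Keying the table by the sequence itself is what would let the same haplotype be matched across the 60 populations.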

Here's what I think is the main problem: this worm is exceptionally diverse, both between and within populations. I expect roughly 5-40 true haplotypes within each population, usually due to variation in the presence of large indels (5-50 base pairs) and many SNV sites (20-50 variant sites in one population across the 600 bp would be a reasonable expectation).

So to get to my actual questions (sorry): 1) Is HaplotypeCaller designed to give me the information I want, namely haplotype counts, frequencies, and presence across populations? 2) Will the very high genetic diversity cause problems for the tool and confound the output, given that it is optimized for organisms with much lower levels of diversity?

Thank you so much for your help!

Andew
