Hi everyone,
I am just discovering GATK, so I apologize in advance if my question sounds a bit naive or misses the point! I am currently working on RAD-seq data obtained from pools of individuals (one pool = one species; several species were sequenced together on a single HiSeq3000 lane). We plan to use these data to call SNPs (~300 per species); genotyping will then be performed with another technology, after designing flanking primers for each variant.
One difficulty is that none of our species has a reference genome; another is that we have pools (each containing rather few individuals, unfortunately). I thought of using the Stacks software to create a "pseudo-reference", which is actually a catalogue of short read 1 alignments (a set of consensus sequences), and then realigning all my reads against this catalogue using BWA (Stacks not being very well suited to pools). Finally, I was wondering whether I could use GATK for SNP calling with the same approach. Can I use an unordered and incomplete set of sequences as a reference?
Once again, sorry for what might be a somewhat silly question. I had a look at the documentation on this website, but I am really new to the analysis of NGS data, and I must admit that having someone else's point of view would help a lot :-) !
Thank you very much, I wish you a very nice day,
Cheers,
Chrys