Hello everyone,
I am new here and would greatly appreciate your advice on one particular experimental design.
We have data from an RNA-seq experiment that was originally designed to assess differential expression. The details of the experiment are as follows:
2 modalities of the phenotype
Each phenotype is represented by 4 samples. 1 sample = 60 individuals pooled together at the stage of RNA isolation.
Molecule – polyadenylated mRNA
Sequencing chemistry – Illumina paired-end, read length 2 × 100 bp
My question is whether it is valid to use these RNA-seq data to call SNPs. From an earlier search I found that most people who call SNPs from RNA-seq use 40-1000 samples (= individuals), but they designed the RNA-seq experiment for a subsequent GWAS from the start. I can see that such an analysis cannot be applied to my data (if only because in my case individual flies were pooled without barcoding, 60 flies per sample). However, can I still call SNPs and upload the list to a database as a list of potential targets for GWAS, together with, for example, an estimate of their functional impact on protein structure? Will they be “true” SNPs, or does our experimental design make even this step invalid?
I found this paper https://www.ncbi.nlm.nih.gov/pubmed/27458203 in which the authors used 2 phenotypes, each represented by 2 samples, which is close to our design, but I still have doubts.
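
To make the pooling concern more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the flies are diploid and that every individual and every allele contributes equally to the pooled RNA. The second assumption is almost certainly too optimistic for RNA-seq, because expression differs between individuals and between alleles. The function names, the read depth and the minimum read count are hypothetical values chosen only for illustration, not part of our actual pipeline.

```python
from math import comb


def chromosomes_per_pool(individuals: int, ploidy: int = 2) -> int:
    """Number of chromosome copies in one pooled sample (ASSUMES diploidy by default)."""
    return individuals * ploidy


def prob_at_least_k_alt_reads(depth: int, allele_freq: float, k: int) -> float:
    """Binomial probability of seeing at least k reads carrying the alternate allele.

    ASSUMPTION: each read is an independent draw and the alternate allele is
    sampled at exactly allele_freq (no expression bias, no sequencing error).
    """
    p_fewer = sum(
        comb(depth, i) * allele_freq**i * (1 - allele_freq) ** (depth - i)
        for i in range(k)
    )
    return 1 - p_fewer


if __name__ == "__main__":
    n_chrom = chromosomes_per_pool(60)   # 60 flies per sample -> 120 copies
    singleton_freq = 1 / n_chrom         # allele carried on one chromosome only

    depth = 100      # hypothetical coverage at one site
    min_reads = 3    # hypothetical minimum alt-read support for a call

    p = prob_at_least_k_alt_reads(depth, singleton_freq, min_reads)
    print(f"chromosomes per pool: {n_chrom}")
    print(f"lowest possible allele frequency: {singleton_freq:.4f}")
    print(f"P(>= {min_reads} alt reads at depth {depth}): {p:.3f}")
```

Under these assumptions a variant seen in a pool can only be described by an estimated allele frequency for the whole pool, not by genotypes of individual flies, which is the main difference from the individual-level designs I found in the literature.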