Hello!
I am pretty new to bioinformatics; so far I have taken just one class.
I have been handed some RADseq data, and we want to do SNP variant calling. I am trying to make sure that I use GATK correctly. At present we only have about 250 libraries, but we should end up with at least twice that many, possibly three times.
I have aligned the reads to a reference genome, sorted and indexed the alignments, and am now ready for HaplotypeCaller. We do not have a set of known SNP locations, so I was going to follow the Troubleshooting guide: use GATK to call SNPs de novo, feed those calls into base quality score recalibration as the known sites, and then re-run HaplotypeCaller. I skipped the dedup step due to the nature of RADseq data.
I am trying to make sure that I am using the correct options. What I have is:
java -jar GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R reference.fa \
-I preprocessed_reads.bam \
--genotyping_mode DISCOVERY \
-stand_emit_conf 10 \
-stand_call_conf 30 \
-o raw_variants.vcf
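For reference, the recalibration round I have in mind after this first pass would look roughly like the sketch below. I have not run it. It assumes the GATK 3-style tools BaseRecalibrator and PrintReads, and the names recal_data.table, recal_reads.bam and recal_variants.vcf, as well as the run and bqsr_round helpers, are placeholders of my own. My understanding is that the first-pass calls are normally filtered down to the most confident ones before being used as known sites; that filtering is not shown here.

```shell
#!/usr/bin/env bash
# Sketch of one bootstrap recalibration round (GATK 3-style syntax).
# File names are placeholders; DRY_RUN=1 (the default) only prints the commands.
set -euo pipefail

GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"
REF="${REF:-reference.fa}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command, and execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Usage: bqsr_round <input.bam> <known_sites.vcf> <output.bam>
bqsr_round() {
  local in_bam="$1" known="$2" out_bam="$3"
  # 1. Build the recalibration table, treating the first-pass calls as known sites.
  run java -jar "$GATK_JAR" -T BaseRecalibrator -R "$REF" -I "$in_bam" \
      -knownSites "$known" -o recal_data.table
  # 2. Write a new BAM with recalibrated base qualities.
  run java -jar "$GATK_JAR" -T PrintReads -R "$REF" -I "$in_bam" \
      -BQSR recal_data.table -o "$out_bam"
  # 3. Call variants again on the recalibrated reads.
  run java -jar "$GATK_JAR" -T HaplotypeCaller -R "$REF" -I "$out_bam" \
      --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 \
      -o recal_variants.vcf
}

bqsr_round preprocessed_reads.bam raw_variants.vcf recal_reads.bam
```

As written this only prints the three commands; setting DRY_RUN=0 would actually run them.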
Any help would be greatly appreciated.