Hi,
I have multiple paired-end FASTQ files from a single biological sample that was sequenced on three flowcells, two lanes each.
In short, I have:
sample_flowcell1_lane1.R1.fastq.gz sample_flowcell1_lane1.R2.fastq.gz
sample_flowcell1_lane2.R1.fastq.gz sample_flowcell1_lane2.R2.fastq.gz
sample_flowcell2_lane1.R1.fastq.gz sample_flowcell2_lane1.R2.fastq.gz
sample_flowcell2_lane2.R1.fastq.gz sample_flowcell2_lane2.R2.fastq.gz
sample_flowcell3_lane1.R1.fastq.gz sample_flowcell3_lane1.R2.fastq.gz
sample_flowcell3_lane2.R1.fastq.gz sample_flowcell3_lane2.R2.fastq.gz
where R1 and R2 are the two reads of each pair.
I'm trying to generate a single BAM file from these FASTQs with bwa mem and samtools, using GRCh37 as the reference.
Ultimately I want to run a whole-exome sequencing analysis with the following procedure:
bwa mem for each of the 6 sets of paired-end reads
samtools sort for each of the 6 generated BAMs
samtools merge -r for the 6 sorted BAMs to produce a single BAM
Then start the GATK process on the merged.bam
picard.jar AddOrReplaceReadGroups
picard.jar MarkDuplicates
picard.jar ReorderSam
GATK RealignerTargetCreator
GATK IndelRealigner
GATK BaseRecalibrator
and so on ...
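To make the alignment, sorting, and merging steps above concrete, here is a minimal dry-run sketch. It only prints the commands, it does not run them. Several things in it are assumptions that are not stated in the post: the reference path GRCh37.fa, the read-group values (PL:ILLUMINA, a single library named after the sample), the output names *.sorted.bam and merged.bam, and samtools 1.3 or newer for the "sort -o" syntax. It also assigns one read group per flowcell/lane at alignment time with bwa mem -R, which is a different choice from tagging afterwards with samtools merge -r or AddOrReplaceReadGroups.

```shell
#!/bin/sh
# Dry-run sketch: prints the planned commands instead of executing them.
set -eu

REF="GRCh37.fa"   # hypothetical path to a bwa-indexed GRCh37 reference
SAMPLE="sample"   # sample prefix taken from the file names in the post

# Build the read-group line for one flowcell/lane unit.
# The \t sequences stay literal here; bwa mem converts them to tabs.
# PL and LB values are assumptions, not facts from the post.
rg_line() {
    printf '@RG\\tID:%s\\tSM:%s\\tPL:ILLUMINA\\tLB:%s\\tPU:%s' \
        "$1" "$SAMPLE" "$SAMPLE" "$1"
}

# Print one align+sort command per flowcell/lane, then one merge command.
plan() {
    bams=""
    for fc in 1 2 3; do
        for lane in 1 2; do
            unit="flowcell${fc}_lane${lane}"
            prefix="${SAMPLE}_${unit}"
            printf '%s\n' "bwa mem -R '$(rg_line "$unit")' ${REF} ${prefix}.R1.fastq.gz ${prefix}.R2.fastq.gz | samtools sort -o ${prefix}.sorted.bam -"
            bams="${bams} ${prefix}.sorted.bam"
        done
    done
    printf '%s\n' "samtools merge merged.bam${bams}"
}

plan
```

If the read groups are set this way, each per-lane BAM already carries its own @RG header and RG tags, so the merged BAM should keep lane-level read groups without -r. Whether that makes AddOrReplaceReadGroups unnecessary for the later GATK steps is something to verify, not something this sketch establishes.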
I am not sure what the best way is to merge these FASTQs and generate a single BAM.
Could you recommend how I should generate a single BAM from these FASTQs?