Why is converting from FASTQ to uBAM necessary before preprocessing?

Hi Everyone,

I am brand new to this, so please go easy on me. I have just taken over a project where we are going to be doing variant calling on a large number of human samples. I have inherited a number of scripts that are at least a few years old. I decided to follow the GATK best practices while noting the differences between them and the scripts I have. I'm currently trying to push a single family (5 individuals) through the pipeline before applying it to all of my other samples.

First of all, all of my raw data is stored as paired-end reads in FASTQ format; I have no uBAM files available to me. According to the data pre-processing for variant discovery steps, the "reference implementations expect the read data to be input in unmapped BAM (uBAM) format. Conversion utilities are available to convert from FASTQ to uBAM." So yesterday, as a first step, I used FastqToSam to do the conversion. This is not an insignificant task: I ran each sample in my test family in parallel, and it took roughly 5 hours.
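For reference, the per-sample conversion can be scripted so that each family member becomes one independent job. This is a minimal Python sketch that only builds Picard FastqToSam command lines in the legacy KEY=value syntax; it does not run them. The sample names, file names, jar path, and read-group naming scheme are hypothetical placeholders, and real read-group values should come from your sequencing metadata.

```python
import shlex

def fastq_to_ubam_cmd(sample, fq1, fq2, picard_jar="picard.jar"):
    """Build a Picard FastqToSam command for one paired-end sample.

    All names here are hypothetical placeholders, not values from the post.
    """
    return [
        "java", "-jar", picard_jar, "FastqToSam",
        f"FASTQ={fq1}",                   # read 1 of each pair
        f"FASTQ2={fq2}",                  # read 2 of each pair
        f"OUTPUT={sample}.unmapped.bam",  # the resulting uBAM
        f"SAMPLE_NAME={sample}",
        f"READ_GROUP_NAME={sample}.rg1",  # hypothetical read-group ID scheme
        "PLATFORM=ILLUMINA",
    ]

# One command per family member; each can be submitted as a separate job.
family = ["ind1", "ind2", "ind3", "ind4", "ind5"]
cmds = [fastq_to_ubam_cmd(s, f"{s}_R1.fastq.gz", f"{s}_R2.fastq.gz") for s in family]
for c in cmds:
    print(shlex.join(c))
```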

I understand the benefit of using uBAM from the get-go (it keeps some metadata that is discarded in FASTQ, as described here: https://gatkforums.broadinstitute.org/gatk/discussion/5990/what-is-ubam-and-why-is-it-better-than-fastq-for-storing-unmapped-sequence-data), but I don't see the benefit of doing this conversion if the first step of the alignment is to convert the uBAM back to FASTQ before running bwa mem and samtools view. The next step would be to use MergeBamAlignment to merge the aligned BAM with the unmapped BAM, which I guess I couldn't do if I had skipped the original FASTQ-to-uBAM conversion.
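The uBAM-to-FASTQ round trip does not have to write intermediate FASTQ files to disk: GATK's mapping tutorial streams SamToFastq straight into bwa mem and then into MergeBamAlignment. The Python sketch below only assembles such a pipeline as a shell string and does not execute it. The paths, jar location, and thread count are hypothetical, and the options shown are a minimal subset (the tutorial passes several more to SamToFastq and MergeBamAlignment).

```python
import shlex

def ubam_align_pipeline(ubam, ref, out_bam, threads=4, picard_jar="picard.jar"):
    """Return a shell pipeline: uBAM -> interleaved FASTQ -> bwa mem -> MergeBamAlignment.

    Paths and thread count are hypothetical; options are a minimal subset.
    """
    sam_to_fastq = [
        "java", "-jar", picard_jar, "SamToFastq",
        f"I={ubam}",
        "FASTQ=/dev/stdout",   # stream reads instead of writing FASTQ files
        "INTERLEAVE=true",     # both mates in one interleaved stream
        "NON_PF=true",         # keep reads that failed the vendor quality filter
    ]
    bwa = [
        "bwa", "mem",
        "-M",                  # mark shorter split hits as secondary (Picard compatibility)
        "-t", str(threads),
        "-p",                  # treat the single input as interleaved pairs
        ref, "/dev/stdin",
    ]
    merge = [
        "java", "-jar", picard_jar, "MergeBamAlignment",
        "ALIGNED_BAM=/dev/stdin",
        f"UNMAPPED_BAM={ubam}",  # supplies read groups and other metadata kept in the uBAM
        f"OUTPUT={out_bam}",
        f"R={ref}",
    ]
    return " | ".join(shlex.join(stage) for stage in (sam_to_fastq, bwa, merge))

print(ubam_align_pipeline("ind1.unmapped.bam", "ref.fasta", "ind1.merged.bam"))
```

Because the three stages run concurrently in one pipe, the FASTQ conversion costs CPU time but no extra disk I/O for intermediate files.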

Basically, my question is whether the initial conversion from FASTQ to uBAM is necessary, or even recommended, in this case. I don't see how it could add any benefit, and converting to and from FASTQ will incur a significant overhead. For what it's worth, the script I inherited simply ran 'bwa mem' on the paired-end reads and piped the output into 'samtools view -bh' to create the aligned BAM file. From there, they moved on to marking duplicates. If I don't convert to uBAM and therefore skip MergeBamAlignment, will that affect my ability to apply the best practices down the line? I want to stick as close to the best practices as I possibly can, but if I can cut out some unnecessary computation time, that would be great.
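If the uBAM step is skipped, the inherited bwa mem | samtools view approach can still feed GATK as long as every read carries a read group (@RG), which downstream GATK tools require. One common way to attach it is bwa mem's -R option; the inherited script as described does not use it, so that part is an addition here. The Python sketch below only assembles the command as a shell string, and the read-group ID, sample, and platform values are hypothetical placeholders.

```python
import shlex

def direct_align_pipeline(sample, fq1, fq2, ref, out_bam, threads=4):
    """Return 'bwa mem ... | samtools view -bh ...' with a read group added via -R.

    The read-group values are hypothetical placeholders.
    """
    # Keep "\t" as a literal backslash-t: bwa mem expands it into a tab itself.
    rg = f"@RG\\tID:{sample}.rg1\\tSM:{sample}\\tPL:ILLUMINA"
    bwa = ["bwa", "mem", "-M", "-t", str(threads), "-R", rg, ref, fq1, fq2]
    # "-" makes samtools view read SAM from stdin; -bh as in the inherited script.
    view = ["samtools", "view", "-bh", "-o", out_bam, "-"]
    return " | ".join(shlex.join(stage) for stage in (bwa, view))

print(direct_align_pipeline("ind1", "ind1_R1.fastq.gz", "ind1_R2.fastq.gz",
                            "ref.fasta", "ind1.bam"))
```

shlex.join single-quotes the read-group string because it contains backslashes, so the shell passes the literal \t sequences through to bwa unchanged.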

Thanks!
