Strange behaviour (bias?) in BaseRecalibrator

Hello,
I would like to report a possible weird behaviour of GATK BaseRecalibrator.
During my analyses I follow the suggested "Best Practices", so after aligning I mark the duplicates (if needed) and always recalibrate.
Recently I wrote a Python script that, given a BAM file and the related BED file, analyses the coverage in the "padding" regions upstream and downstream of the regions of interest (exons) (fig.1).

The aim is to see how far beyond the exon, in both directions, the coverage stays above a certain threshold. Also a quality parameter "q" can be specified, so that any base with phred < q is not counted in the total coverage.
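To make the counting rule concrete, here is a minimal sketch of the idea, not the actual script. It assumes reads have already been parsed into hypothetical `(start, qualities)` tuples with ungapped alignments (no indels or clipping); the names `coverage_at` and `padding_profile` are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical read representation: (0-based start, per-base Phred qualities).
# Assumes ungapped alignments; a real script would walk the CIGAR string.
Read = Tuple[int, List[int]]


def coverage_at(reads: Iterable[Read], pos: int, min_q: int = 0) -> int:
    """Count reads covering `pos` whose base at `pos` has Phred >= min_q."""
    depth = 0
    for start, quals in reads:
        offset = pos - start
        if 0 <= offset < len(quals) and quals[offset] >= min_q:
            depth += 1
    return depth


def padding_profile(reads: List[Read], exon_start: int, exon_end: int,
                    pad: int = 15, min_q: int = 0) -> Dict[int, int]:
    """Coverage at each relative position around one exon.

    exon_start is 0-based inclusive and exon_end is 0-based exclusive (BED style).
    Keys are negative upstream (-1 is the base just before the exon) and
    positive downstream (1 is the base just after the exon).
    """
    profile = {}
    for d in range(1, pad + 1):
        profile[-d] = coverage_at(reads, exon_start - d, min_q)
        profile[d] = coverage_at(reads, exon_end - 1 + d, min_q)
    return profile
```

With `min_q=0` every covering base is counted; with `min_q=30` a base only contributes if its Phred score is at least 30, which is the filter used for fig.3 and fig.4.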

While doing this, I found what looks like a strange bias. If I do not use the q parameter (so phred quality is not taken into account), the coverage level decreases gradually as we move away from the exon (fig.2), as expected.

If I use the q=30 parameter, though, I always observe a significant fall in coverage mainly in positions 2 and 3, both upstream and downstream; then the levels go back up and slowly decrease normally (fig.3).

This behaviour is never observed when the .bam file is NOT recalibrated. When I use the q=30 threshold on non-recalibrated bam, I do not detect any trouble (fig.4).

It looks like the recalibration process penalizes the base calls in those positions for some reason, and this can be verified by simply opening the recalibrated bam vs the non-recalibrated one in IGV.

When hovering over the bases at positions 2 and 3 (upstream or downstream with respect to the exon), a drop in quality can be seen in the recalibrated ones. The other positions are pretty much immune to this. One could argue that, for some reason, the base calls in those particular positions have a quality score close to 30 before recalibration, and the process simply pushes them below our threshold. But that is not the case: before recalibration, all the positions flanking the exon have similar quality scores - pretty high in all the cases I analysed - so there is no implicit "disadvantage" in the starting quality for positions 2 and 3. It just seems that recalibration is particularly severe in those spots.
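The IGV comparison could also be done numerically. The sketch below is only an illustration under assumptions: it supposes the base qualities have already been collected per relative position (e.g. `{2: [37, 36, ...]}`) from the non-recalibrated and the recalibrated BAM, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def mean_quality_by_offset(quals: Dict[int, List[int]]) -> Dict[int, float]:
    """Mean Phred score at each relative position, skipping empty positions."""
    return {off: mean(q) for off, q in quals.items() if q}


def quality_shift(before: Dict[int, List[int]],
                  after: Dict[int, List[int]]) -> Dict[int, float]:
    """Change in mean quality (after minus before) per relative position.

    Only positions present in both inputs are reported; a strongly negative
    value means recalibration lowered the qualities at that position.
    """
    mb = mean_quality_by_offset(before)
    ma = mean_quality_by_offset(after)
    return {off: ma[off] - mb[off] for off in sorted(mb.keys() & ma.keys())}
```

If the effect described above is real, the shift at positions 2 and 3 (and -2, -3) should stand out clearly against the neighbouring positions.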

I tried to single out any possible confounding factor I could think of:
-- Tried several samples, coming from different runs and different points in time;
-- Tried runs from MiSeq, HiSeq, NextSeq;
-- Tried to use different versions of dbSNP as known sites, plus a high confidence set of SNPs from 1000Genomes;
-- Tried to use both GATK3 and GATK4

but no luck, the same behaviour persists. Do you have any clues?

Thanks

Mauro

Commands used for recalibration:
/usr/bin/java -jar /softwares/GATK_4.0/gatk-package-4.0.0.0-local.jar BaseRecalibrator -R hg19_ucsc_filtered.fa -I CA.bam -O CA.table --known-sites dbSNP_150_hg19_chr.vcf

/usr/bin/java -jar /softwares/GATK_4.0/gatk-package-4.0.0.0-local.jar ApplyBQSR -R hg19_ucsc_filtered.fa -I CA.bam -bqsr CA.table -O CA_recal.bam

Figures:
* fig.1: visual description of the "padding" regions under analysis, in orange.
* fig.2: the coverage level for each exon while moving "away" from it in both directions, for 15bp. X axis: positions relative to the exon (negative=upstream, positive=downstream), Y axis: coverage
* fig.3: the same graph when introducing the q=30 filtering threshold. Many exons become zero-covered in positions 2,3.
* fig.4: the graph when using the same q=30 threshold on a NON-recalibrated bam.
