Channel: Recent Discussions — GATK-Forum

PE mates are lost while downsampling with -dfrac?


Hi GATK team and users,

I am using PrintReads with the -dfrac option to simulate different depths of coverage. The original data are WGS paired-end (PE) reads from the GATK bundle BAM, and I ran PrintReads with -L 20 and -dfrac 0.18. I'm using GATK 3.7.

I think the PE mates are lost during downsampling (I first noticed this in IGV with 'View as pairs').

samtools flagstat still reports 96.64% of the reads as "properly paired", but I guess that is because the flags are inherited from the reads in the original BAM.

samtools flagstat ./NA12878/CEUTrio.HiSeq.WGS.b37.NA12878.L20.dfrac0.18.bam:

##  9278360 + 0 in total (QC-passed reads + QC-failed reads)
## ...
##  8966551 + 0 properly paired (96.64% : N/A)
##  9023705 + 0 with itself and mate mapped
## ...

About 82% of the reads (7,612,964 of 9,278,360) have a name that occurs only once, instead of twice as expected for PE data.

samtools view ./NA12878/CEUTrio.HiSeq.WGS.b37.NA12878.L20.dfrac0.18.bam | awk '{print $1}' | sort -n | uniq -c | awk '{print $1}' | sort -n | uniq -c

## 7612964 1
##   832698 2
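
For anyone who wants to repeat the tally above without the awk/sort/uniq chain, here is a minimal Python sketch. It assumes SAM text records (for example the output of samtools view) and uses a hypothetical helper name, name_multiplicity; it is only an illustration of the check, not part of GATK or samtools.

```python
from collections import Counter

def name_multiplicity(sam_lines):
    """Return {occurrences_of_a_read_name: number_of_such_names}.

    sam_lines: iterable of SAM text records (tab-separated, QNAME first).
    Header lines (starting with '@') and blank lines are skipped.
    """
    per_name = Counter(
        line.split("\t", 1)[0]
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )
    # Histogram of the per-name counts: key 1 = mate missing, key 2 = pair intact.
    return dict(Counter(per_name.values()))
```

It could be fed from a pipe, e.g. by calling name_multiplicity(sys.stdin) in a small script that reads the output of samtools view. A key of 1 in the result counts names whose mate is missing, and a key of 2 counts intact pairs.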

Is there a way to downsample a BAM file while keeping both mates of each pair, to simulate having less data that is still properly paired?
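
To make concrete what "keeping both mates" would mean, here is a minimal sketch of one possible approach, under the assumption that the keep/drop decision is derived from the read name alone, so both mates always get the same outcome. This is not how -dfrac is documented to work; the function names keep_read and downsample_pairs and the seed parameter are hypothetical, and the input is assumed to be SAM text records.

```python
import zlib

def keep_read(name, fraction, seed=0):
    """Decide keep/drop from the read name only, so mates share one decision."""
    # crc32 is deterministic across runs (unlike Python's built-in hash for str).
    h = zlib.crc32(f"{seed}:{name}".encode())
    return h / 2**32 < fraction

def downsample_pairs(sam_lines, fraction, seed=0):
    """Yield header lines unchanged and roughly `fraction` of the read names."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # pass SAM header lines through untouched
        elif keep_read(line.split("\t", 1)[0], fraction, seed):
            yield line
```

Because the decision depends only on the name (and the seed), the kept fraction is approximate, but every name that survives keeps all of its records.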

Thanks a lot for any help/discussion,
EsterQ

