Channel: Recent Discussions — GATK-Forum

Split'N'Trim Errors


Hello all,

I am having a problem during the Split'N'Trim phase of the RNAseq Best Practices workflow. The command I used is as follows:

java -jar /data1/APPS/gatk/GenomeAnalysisTK.jar -T SplitNCigarReads \
    -R /path/reference.fa \
    -I 042517Sam3C_S3_combined_dedup.bam \
    -o 042517Sam3C_S3_combined_split.bam \
    -rf ReassignOneMappingQuality \
    -RMQF 255 -RMQT 60 \
    -U ALLOW_N_CIGAR_READS

When I run ValidateSamFile on the output, I receive the following errors:

Error Type      Count
ERROR:INVALID_CIGAR     397
ERROR:MATES_ARE_SAME_END        6588323
ERROR:MATE_NOT_FOUND    5711036
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND     13112240
ERROR:MISMATCH_FLAG_MATE_UNMAPPED       78
ERROR:MISMATCH_MATE_ALIGNMENT_START     15160687
ERROR:MISMATCH_MATE_CIGAR_STRING        20226660

This is a similar problem to this thread:
https://gatkforums.broadinstitute.org/gatk/discussion/7957/errors-when-running-picard-validatesamfile-on-bam-file-got-from-splitncigarreads

I tried simply skipping this phase; however, when I run BQSR I receive this message:

INFO  14:06:29,259 MicroScheduler - 67278005 reads were filtered out during the traversal out of approximately 69796967 total reads (96.39%)
INFO  14:06:29,260 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter
INFO  14:06:29,260 MicroScheduler -   -> 861213 reads (1.23% of total) failing DuplicateReadFilter
INFO  14:06:29,260 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO  14:06:29,260 MicroScheduler -   -> 515857 reads (0.74% of total) failing MalformedReadFilter
INFO  14:06:29,260 MicroScheduler -   -> 57110033 reads (81.82% of total) failing MappingQualityUnavailableFilter
INFO  14:06:29,261 MicroScheduler -   -> 3740357 reads (5.36% of total) failing MappingQualityZeroFilter
INFO  14:06:29,261 MicroScheduler -   -> 5050545 reads (7.24% of total) failing NotPrimaryAlignmentFilter
INFO  14:06:29,261 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter
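The per-filter counts in a report like this can be cross-checked against the headline total. Below is a minimal sketch, assuming the log lines keep the "-> N reads (...) failing <Filter>" layout shown above; the function name sum_filtered is hypothetical and not part of GATK.

```shell
# Sum the read counts reported on every per-filter line of a GATK 3 log.
# Reads the log on stdin; assumes each such line contains "-> <count> reads ... failing <Filter>".
sum_filtered() {
    awk '/ failing / {
             # The count is the field right after the "->" marker.
             for (i = 1; i < NF; i++) if ($i == "->") total += $(i + 1)
         }
         END { printf "%d\n", total }'
}
```

Piping the eight per-filter lines above through sum_filtered gives 67278005, which matches the total on the first line. MappingQualityUnavailableFilter alone accounts for 57110033 of those reads.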

I recognize that I need to reassign mapping qualities, so I ran the following command:

java -jar /data1/APPS/gatk/GenomeAnalysisTK.jar -T PrintReads \
    -R /path/reference.fa \
    -I 042517Sam3C_S3_combined_dedup.bam \
    -o 042517Sam3C_reassigned.bam \
    -rf ReassignOneMappingQuality \
    -RMQF 255 -RMQT 60 \
    --filter_reads_with_N_cigar

When I validate the resulting file, I receive this error:

ERROR:MATE_NOT_FOUND    5649538

I feel that at this point I have run into a dead end and don't know where to turn.

I have deviated from the Best Practices methodology in only two ways. First, I ran MergeBamAlignment on the 2-pass file produced by STAR, because validation of that file reported the MATE_NOT_FOUND error; this fixed that error. Second, I have multiple lanes and multiple samples, so I created many SJ.out.tab files (48, to be exact) during the 1st pass of STAR, used cat to combine all of them into an SJ.all.tab file, and used that file for the 2nd pass. I saw a suggestion to do this in a forum post that I can no longer find (my advisor also suggested it). I compared the STAR SAM output from this method with the output from running all samples separately, and the results were more or less the same.
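The junction-merging step described above could look roughly like the sketch below. It assumes each first-pass STAR run wrote its SJ.out.tab into its own sub-directory of one parent directory; that layout and the function name combine_junctions are hypothetical, not the actual paths used here.

```shell
# Concatenate the SJ.out.tab files from all first-pass runs into one file.
# $1: parent directory with one sub-directory per first-pass run (assumed layout)
# $2: path of the combined junction file to write
combine_junctions() {
    cat "$1"/*/SJ.out.tab > "$2"
}
```

For example, combine_junctions pass1 SJ.all.tab, where pass1 is a placeholder directory name. The combined file would then be supplied to the 2nd pass through STAR's --sjdbFileChrStartEnd option.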

The file produced by the final step of MarkDuplicates (042517Sam3C_S3_combined_dedup.bam) passes the validation with "no errors found."

Any help/suggestions would be greatly appreciated!

As a side note, I tried running the SplitNCigarReads in GATK4.beta using the following script:

java -jar /data1/APPS/gatk-4.beta.1/gatk-package-4.beta.1-local.jar SplitNCigarReads \
    -R /path/reference.fa \
    -I 042517Sam3C_S3_combined_dedup.bam \
    -O 042517Sam3C_S3_combined_split.bam

The engine stopped as soon as it started the second pass.

