Channel: Recent Discussions — GATK-Forum

MuTect2 and multiple read groups per sample, issue with index

I have an issue with BAM files containing multiple read groups for the same sample (identical SM:). This is essentially the same situation as in my MuTect1 question from 2 years ago: http://gatkforums.broadinstitute.org/gatk/discussion/3796/can-mutect-handel-multiple-read-groups - but this time it appears to cause a real problem: MuTect2 throws the following error saying it can't detect the index type:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.6-0-g89b7209): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Problem detecting index type
##### ERROR ------------------------------------------------------------------------------------------

Regarding my read group set-up: I have matched normal and tumor samples, each sequenced across two or more lanes. After BWA I merged the per-run BAM files so that a single per-sample BAM goes through the subsequent MarkDuplicates and BQSR steps. My read group structure is:

@RG     ID:RG_1 LB:Lib_10_10065_DNA1_1H    SM:10_10065_DNA1_1H        PL:ILLUMINA
@RG     ID:RG_2 LB:Lib_10_10065_DNA1_1H    SM:10_10065_DNA1_1H        PL:ILLUMINA

Sometimes I have more than one library prep too, which I've encoded as follows, with 4 BAMs merged into one per-sample BAM here:

@RG     ID:RG_3 LB:Lib_10_10065_F2_DNA2H_1 SM:10_10065_F2_DNA2H       PL:ILLUMINA
@RG     ID:RG_4 LB:Lib_10_10065_F2_DNA2H_1 SM:10_10065_F2_DNA2H       PL:ILLUMINA
@RG     ID:RG_5 LB:Lib_10_10065_F2_DNA2H_2 SM:10_10065_F2_DNA2H       PL:ILLUMINA
@RG     ID:RG_6 LB:Lib_10_10065_F2_DNA2H_2 SM:10_10065_F2_DNA2H       PL:ILLUMINA

(My assumption was that this is beneficial: for reads from the same library prep, MarkDuplicates would consider all read groups sharing the same LB: for deduplication, and BQSR would always model each read group ID: separately, which is ideal as they originate from different lanes.)
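
To make the layout above easier to check, here is a minimal Python sketch that groups the @RG lines by SM: and then LB:. The function names are hypothetical, and it assumes tag values contain no whitespace (true for the lines pasted here, but not guaranteed for tags such as DS: in a real tab-delimited header).

```python
from collections import defaultdict

def parse_read_groups(header_lines):
    """Return one dict of tag -> value for each @RG header line."""
    groups = []
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        # Assumption: no whitespace inside tag values, so a plain split() is enough.
        fields = line.split()[1:]
        groups.append(dict(field.split(":", 1) for field in fields))
    return groups

def group_by_sample_and_library(read_groups):
    """Nest read group IDs as sample (SM) -> library (LB) -> [ID, ...]."""
    layout = defaultdict(lambda: defaultdict(list))
    for rg in read_groups:
        layout[rg["SM"]][rg["LB"]].append(rg["ID"])
    return {sample: dict(libs) for sample, libs in layout.items()}

# The @RG lines from this post.
HEADER = [
    "@RG     ID:RG_1 LB:Lib_10_10065_DNA1_1H    SM:10_10065_DNA1_1H        PL:ILLUMINA",
    "@RG     ID:RG_2 LB:Lib_10_10065_DNA1_1H    SM:10_10065_DNA1_1H        PL:ILLUMINA",
    "@RG     ID:RG_3 LB:Lib_10_10065_F2_DNA2H_1 SM:10_10065_F2_DNA2H       PL:ILLUMINA",
    "@RG     ID:RG_4 LB:Lib_10_10065_F2_DNA2H_1 SM:10_10065_F2_DNA2H       PL:ILLUMINA",
    "@RG     ID:RG_5 LB:Lib_10_10065_F2_DNA2H_2 SM:10_10065_F2_DNA2H       PL:ILLUMINA",
    "@RG     ID:RG_6 LB:Lib_10_10065_F2_DNA2H_2 SM:10_10065_F2_DNA2H       PL:ILLUMINA",
]

layout = group_by_sample_and_library(parse_read_groups(HEADER))
for sample, libraries in layout.items():
    for library, ids in libraries.items():
        print(sample, library, ids)
```

Each sample should map to its libraries, and each library to the lane-level read group IDs that share it.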

These BAM files run fine in HaplotypeCaller in GVCF mode, and --bamOutput confirms both read groups are used. I've also checked the index with Picard's ValidateSamFile and can't see any index-related issues in its output; the only error is ERROR:MATE_NOT_FOUND, which I was expecting.
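
Since the error message does not say which input's index is the problem, one thing I can do is check every file on the command line for a companion index. Below is a rough Python sketch of that check. The helper names are hypothetical, the suffix table only covers the common naming conventions (.bam.bai or .bai for BAM, .idx for plain VCF, .tbi for bgzipped VCF), and it only reports missing or out-of-date index files. It does not reproduce whatever check GATK itself performs.

```python
import os

# Common index naming conventions; an assumption, not an exhaustive list.
INDEX_SUFFIXES = {
    ".bam": [".bam.bai", ".bai"],
    ".vcf": [".vcf.idx"],
    ".vcf.gz": [".vcf.gz.tbi"],
}

def candidate_indexes(path):
    """Return the index paths conventionally paired with a data file."""
    for ext, suffixes in INDEX_SUFFIXES.items():
        if path.endswith(ext):
            stem = path[: -len(ext)]
            return [stem + suffix for suffix in suffixes]
    return []

def check_index(path):
    """Return 'ok', 'missing', or 'stale' for the index of a data file.

    'stale' means every index found is older than the data file itself.
    """
    existing = [p for p in candidate_indexes(path) if os.path.exists(p)]
    if not existing:
        return "missing"
    data_mtime = os.path.getmtime(path)
    if all(os.path.getmtime(p) < data_mtime for p in existing):
        return "stale"
    return "ok"
```

Calling check_index on each BAM and each VCF passed to the tool gives a quick picture of which inputs are worth re-indexing first.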

I'm also curious whether the answer to my old MuTect1 question still holds, as I note this related issue with MuTect1 and multiple BAM inputs: http://gatkforums.broadinstitute.org/gatk/discussion/4641/build-a-panel-of-normal-for-mutect

