I have an issue with BAM files that contain multiple read groups for the same sample (identical SM:). This is essentially the same situation as in my MuTect1 question from 2 years ago: http://gatkforums.broadinstitute.org/gatk/discussion/3796/can-mutect-handel-multiple-read-groups. This time, however, it appears to cause an actual problem: these BAMs make MuTect2 throw the following error saying it can't detect the index type:
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.6-0-g89b7209):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Problem detecting index type
##### ERROR ------------------------------------------------------------------------------------------
Regarding my read group set-up: I have matched normal and tumor samples, where each sample is sequenced across two or more lanes. Following BWA, I merged my sequencing run BAM files so that a single per-sample BAM goes into the subsequent MarkDuplicates and BQSR steps. My read group structure is:
@RG ID:RG_1 LB:Lib_10_10065_DNA1_1H SM:10_10065_DNA1_1H PL:ILLUMINA
@RG ID:RG_2 LB:Lib_10_10065_DNA1_1H SM:10_10065_DNA1_1H PL:ILLUMINA
Sometimes I have more than one library prep too, which I've encoded as follows, with 4 BAMs being merged into one per-sample BAM here:
@RG ID:RG_3 LB:Lib_10_10065_F2_DNA2H_1 SM:10_10065_F2_DNA2H PL:ILLUMINA
@RG ID:RG_4 LB:Lib_10_10065_F2_DNA2H_1 SM:10_10065_F2_DNA2H PL:ILLUMINA
@RG ID:RG_5 LB:Lib_10_10065_F2_DNA2H_2 SM:10_10065_F2_DNA2H PL:ILLUMINA
@RG ID:RG_6 LB:Lib_10_10065_F2_DNA2H_2 SM:10_10065_F2_DNA2H PL:ILLUMINA
(My assumption was that this is beneficial: for reads from the same library prep, MarkDuplicates would consider all read groups sharing the same LB: for deduplication, and BQSR would always model each read group ID: separately, which is ideal as they originate from different lanes.)
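To make the layout above easier to check, here is a minimal Python sketch that groups @RG header lines by SM: and then LB:. It is only an illustration, not part of my pipeline: the function name summarize_read_groups is hypothetical, and the header lines are copied from the second example above. It assumes the header has already been dumped to text (real SAM headers are tab-separated, but the whitespace split also handles the space-separated form pasted here).

```python
from collections import defaultdict


def summarize_read_groups(header_lines):
    """Group read group IDs by sample (SM) and library (LB).

    Hypothetical helper for illustration; it only reads text lines and
    does not touch the BAM or its index.
    """
    summary = defaultdict(lambda: defaultdict(list))
    for line in header_lines:
        if not line.startswith("@RG"):
            continue  # ignore @HD, @SQ, @PG and any other header records
        # Split on any whitespace, then split each TAG:VALUE pair once so
        # values that themselves contain ':' are preserved.
        tokens = line.split()[1:]
        fields = dict(tok.split(":", 1) for tok in tokens if ":" in tok)
        summary[fields.get("SM")][fields.get("LB")].append(fields.get("ID"))
    # Convert nested defaultdicts to plain dicts for easy comparison/printing.
    return {sample: dict(libs) for sample, libs in summary.items()}


header = [
    "@RG ID:RG_3 LB:Lib_10_10065_F2_DNA2H_1 SM:10_10065_F2_DNA2H PL:ILLUMINA",
    "@RG ID:RG_4 LB:Lib_10_10065_F2_DNA2H_1 SM:10_10065_F2_DNA2H PL:ILLUMINA",
    "@RG ID:RG_5 LB:Lib_10_10065_F2_DNA2H_2 SM:10_10065_F2_DNA2H PL:ILLUMINA",
    "@RG ID:RG_6 LB:Lib_10_10065_F2_DNA2H_2 SM:10_10065_F2_DNA2H PL:ILLUMINA",
]

result = summarize_read_groups(header)
for sample, libs in result.items():
    for lib, ids in libs.items():
        print(sample, lib, ids)
```

For the four-way merge this should show one sample with two libraries and two read group IDs under each, which is the structure I intended.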
These BAM files run fine in the HaplotypeCaller in GVCF mode, and --bamOutput confirms both read groups are used. I've also checked the index with Picard's ValidateSamFile and can't see any index-related issues in its output; the only error is ERROR:MATE_NOT_FOUND, which I was expecting.
I'm also curious whether the answer to my old MuTect1 question still holds, as I note this related issue with MuTect1 and multiple BAM inputs: http://gatkforums.broadinstitute.org/gatk/discussion/4641/build-a-panel-of-normal-for-mutect