I've run `MarkDuplicates` with GATK 3.5, but I can't get it to run with GATK4 v4.1.1.0. I double-checked the Best Practices for data pre-processing for variant discovery and noted that `MarkDuplicates` still appears there. In the tool documentation index I could pull up `MarkDuplicates` for GATK4 v4.0.8.0, but not for v4.1.1.0. Is `MarkDuplicates` supported in GATK4 v4.1.1.0?
Command:
```
strings=(
  S1233686
)
for i in "${strings[@]}"; do
  echo "${i}"
  # Mark duplicates
  /ast/emb/software/gatk-4.1.1.0/gatk MarkDuplicates \
    I=/ast/emb/prjt3/aligned_data/${i}Aligned.sortedByCoord.out.bam \
    O=/ast/emb/prjt3/aligned_data/${i}.dedupped.bam \
    CREATE_INDEX=true \
    VALIDATION_STRINGENCY=SILENT \
    METRICS_FILE=/ast/emb/prjt3/aligned_data/dedup.metrics.${i}.txt
done
```
Output:
```
USAGE: MarkDuplicates [arguments]
Identifies duplicate reads. This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads
are defined as originating from a single fragment of DNA. Duplicates can arise during sample preparation e.g. library
construction using PCR. See also MarkDuplicates for
detailed explanations of the output metrics.
Version:4.1.1.0
Required Arguments:
--INPUT,-I:String One or more input SAM or BAM files to analyze. Must be coordinate sorted. This argument
must be specified at least once. Required.
****************REMOVED STANDARD HELP INFO TO SHORTEN OUTPUT****************************
Invalid argument 'I=/ast/emb/prjt3/aligned_data/S1233686Aligned.sortedByCoord.out.bam'.
Tool returned:
1
```
The output suggests that `MarkDuplicates` is supported. I hope I didn't make a silly syntax error. I did double-check that my input file exists.
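The usage text lists the input as `--INPUT,-I`, and the error complains specifically about `I=...`, so my guess is that this version wants flag-style arguments rather than `KEY=value` pairs. Below is a minimal sketch of what I would try next. It assumes that `-O`, `-M`, `--CREATE_INDEX` and `--VALIDATION_STRINGENCY` are the flag spellings of the options I used above, which I have not confirmed against the v4.1.1.0 documentation. `markdup_cmd` is a made-up helper that only prints the command, so nothing is executed.

```shell
# Paths are the ones from my original command.
gatk_bin=/ast/emb/software/gatk-4.1.1.0/gatk
data_dir=/ast/emb/prjt3/aligned_data

# Hypothetical helper: print the MarkDuplicates command for one sample,
# using "-I value" style arguments instead of "I=value".
# Assumption: -O, -M, --CREATE_INDEX and --VALIDATION_STRINGENCY are the
# flag forms of O=, METRICS_FILE=, CREATE_INDEX= and VALIDATION_STRINGENCY=.
markdup_cmd() {
  sample="$1"
  printf '%s MarkDuplicates -I %s -O %s -M %s --CREATE_INDEX true --VALIDATION_STRINGENCY SILENT\n' \
    "${gatk_bin}" \
    "${data_dir}/${sample}Aligned.sortedByCoord.out.bam" \
    "${data_dir}/${sample}.dedupped.bam" \
    "${data_dir}/dedup.metrics.${sample}.txt"
}

for i in S1233686; do
  echo "${i}"
  # Dry run: print the command so it can be inspected before running it.
  markdup_cmd "${i}"
done
```

If the printed line looks right, it can be pasted into the terminal to run it.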