Objective
Prepare a reference sequence so that it is suitable for use with BWA and GATK.
Prerequisites
- Installed BWA
- Installed SAMtools
- Installed Picard
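One quick way to confirm these prerequisites is to check that the executables are on your PATH. The following is only a sketch: the helper name check_tools is hypothetical, and it assumes Picard is launched through java, so it looks for java rather than for picard.jar, whose location depends on your installation.

```shell
# Hypothetical helper: report any of the named commands that are not on PATH.
# Returns 0 if every command was found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_tools bwa samtools java prints one line per missing tool and returns a non-zero status if any of them is absent.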
Steps
- Generate the BWA index
- Generate the FASTA file index
- Generate the sequence dictionary
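If you prefer to run the three steps in one go, they can be chained in a small wrapper. This is a minimal sketch rather than part of the official instructions: the function name prepare_reference and the PICARD_JAR variable are assumptions made for illustration, and the commands themselves are the ones described in the numbered sections that follow.

```shell
# Hypothetical wrapper around the three steps; stops at the first failure.
# PICARD_JAR is an assumed variable pointing at your picard.jar.
prepare_reference() {
    ref=$1
    picard_jar=${PICARD_JAR:-picard.jar}

    # 1. BWA index
    bwa index -a bwtsw "$ref" || return 1

    # 2. FASTA file index (creates "$ref.fai")
    samtools faidx "$ref" || return 1

    # 3. Sequence dictionary, named after the reference minus its extension
    java -jar "$picard_jar" CreateSequenceDictionary \
        REFERENCE="$ref" \
        OUTPUT="${ref%.*}.dict" || return 1
}
```

Calling prepare_reference reference.fa runs the same commands as the manual steps, writing the dictionary to reference.dict.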
1. Generate the BWA index
Action
Run the following BWA command:
bwa index -a bwtsw reference.fa
where -a bwtsw specifies that we want to use the indexing algorithm that is capable of handling the whole human genome.
Expected Result
This creates a collection of files used by BWA to perform the alignment.
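To confirm that indexing finished, you can check for the index files next to the reference. The sketch below assumes the five files that bwa index normally writes, named after the reference with the extensions .amb, .ann, .bwt, .pac and .sa; the helper name check_bwa_index is hypothetical.

```shell
# Hypothetical helper: verify that the five BWA index files exist and are
# non-empty next to the reference. Returns 0 if all are present, 1 otherwise.
check_bwa_index() {
    ref=$1
    rc=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$ref.$ext" ]; then
            echo "missing or empty: $ref.$ext" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

After the indexing command completes, check_bwa_index reference.fa should return silently with status 0.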
2. Generate the FASTA file index
Action
Run the following SAMtools command:
samtools faidx reference.fa
Expected Result
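This creates a file called reference.fa.fai, with one record per line for each of the contigs in the FASTA reference file. Each record is composed of five tab-separated fields: the contig name, its size in bases, its location (the byte offset of its first base in the file), basesPerLine and bytesPerLine (the line length in bytes, including the line terminator).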
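Because the index is plain tab-separated text, it is easy to inspect. As an illustration only, the following sketch reads the first two columns (contig name and size) and prints a short summary; the helper name fai_summary is hypothetical.

```shell
# Hypothetical helper: print each contig name and length from a .fai file,
# followed by the number of contigs and their combined length.
fai_summary() {
    awk -F'\t' '
        { print $1 "\t" $2; total += $2 }
        END {
            print "contigs: " NR
            # %.0f avoids exponent notation for genome-sized totals
            printf "total bases: %.0f\n", total
        }
    ' "$1"
}
```

Running fai_summary reference.fa.fai gives a quick sanity check that the contigs you expect are present with plausible lengths.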
3. Generate the sequence dictionary
Action
Run the following Picard command:
java -jar picard.jar CreateSequenceDictionary \
REFERENCE=reference.fa \
OUTPUT=reference.dict
Note that this syntax applies to Picard versions that bundle all the tools in a single picard.jar. Older versions shipped each tool as a separate jar, so you would call e.g. java -jar CreateSequenceDictionary.jar directly.
Expected Result
This creates a file called reference.dict, formatted like a SAM header, describing the contents of your reference FASTA file.
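Since the dictionary and the FASTA file index describe the same contigs, one possible sanity check is to confirm that they agree on contig names and lengths. The sketch below assumes the dictionary contains tab-separated @SQ lines carrying SN: (name) and LN: (length) tags, as in a SAM header; the helper name dict_matches_fai is hypothetical.

```shell
# Hypothetical helper: succeed only if the @SQ records of a .dict file list
# the same contig names and lengths, in the same order, as a .fai file.
dict_matches_fai() {
    dict=$1
    fai=$2
    # Pull "name<TAB>length" out of each @SQ line, whatever the tag order
    dict_contigs=$(awk -F'\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len = substr($i, 4)
        }
        print name "\t" len
    }' "$dict")
    # The first two .fai columns are the contig name and length
    fai_contigs=$(cut -f1,2 "$fai")
    [ "$dict_contigs" = "$fai_contigs" ]
}
```

For example, dict_matches_fai reference.dict reference.fa.fai returns status 0 when both files were generated from the same reference.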