1. What does error message X mean?
See the FAQ section on frequently encountered errors.
2. Can I use Genome STRiP to do discovery or genotyping in a single high-coverage individual?
Genome STRiP is designed to discover and genotype variants in populations and
uses the information from multiple individuals simultaneously. Typically you
will need data from at least 20 or 30 individuals to get good results.
That being said, it may be possible to use a "background population" along
with a single high-coverage individual to run Genome STRiP. The background
population does not need to have the same depth of coverage as the target
genome you want to process, but reads will need to be aligned to the same
reference sequence. A good background population might be 50 or so individuals
from the 1000 Genomes Project chosen from diverse population groups. This
approach has not been widely tested, although I have applied it to targeted
resequencing loci with some success. If you try this strategy, please share
your experiences.
3. Does Genome STRiP only work with deletions?
In the current version, only deletions (relative to the reference) are
supported in discovery and genotyping. We are actively working on discovery
and genotyping of other kinds of structural variants.
4. Is the source code available?
Not at this time, but we are planning to release the source code shortly.
5. Can I run discovery on a small genomic region?
If you have whole-genome sequence data, you can run on just a small region
using the standard -L argument to the GATK, for example
-L chr1:1000000-2000000.
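When you later need region sizes (for example, to work out the effective genome
size for targeted data), it helps to remember that the GATK treats an interval
written as contig:start-end as 1-based with both endpoints included, so
chr1:1000000-2000000 covers 1,000,001 bases. The helper below is a minimal,
hypothetical sketch (interval_length is not part of Genome STRiP or the GATK)
that computes that length.

```python
import re

# Hypothetical helper, not part of Genome STRiP or the GATK.
# Matches "contig:start-end"; the greedy contig group backtracks to the last
# colon, so contig names that themselves contain colons still parse.
_INTERVAL = re.compile(r"^(?P<contig>.+):(?P<start>\d+)-(?P<end>\d+)$")


def interval_length(interval: str) -> int:
    """Return the number of bases in a 1-based, inclusive contig:start-end interval."""
    m = _INTERVAL.match(interval.strip())
    if m is None:
        raise ValueError(f"not a contig:start-end interval: {interval!r}")
    start, end = int(m.group("start")), int(m.group("end"))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {interval!r}")
    # Both endpoints are included, hence the +1.
    return end - start + 1


print(interval_length("chr1:1000000-2000000"))  # 1000001
```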
If you have targeted resequencing data, where you have only sequenced a small
subset of the genome, then you additionally need to set the effective genome
size to be smaller. To do this, you currently need to modify the configuration
parameters in conf/genstrip_parameters.txt (the file location is specified
with the -configFile command line argument).
You will need to change these three parameters:
input.genomeSize = A + X + Y
input.genomeSizeMale = 2*A + X + Y
input.genomeSizeFemale = 2*A + 2*X
where A is the total size of the autosomal reference and X and Y are the
lengths of the X and Y chromosomes. Note that genomeSize is in haploid bases
while genomeSizeMale and genomeSizeFemale are in diploid bases.
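The arithmetic is easy to get wrong by a factor of two, so here is a minimal
sketch that applies the three formulas above. The function name
effective_genome_sizes is hypothetical, and printing "key = value" lines simply
mirrors the notation used in this FAQ; check conf/genstrip_parameters.txt for
the exact syntax it expects before pasting values in.

```python
def effective_genome_sizes(autosomal: int, x: int = 0, y: int = 0) -> dict:
    """Apply the FAQ's genome-size formulas (hypothetical helper).

    autosomal -- A, total autosomal bases (haploid)
    x, y      -- lengths of the X and Y chromosomes (0 if not covered)
    """
    return {
        # Haploid bases: one copy of each autosome, plus X and Y.
        "input.genomeSize": autosomal + x + y,
        # Diploid bases for a male: two copies of each autosome, one X, one Y.
        "input.genomeSizeMale": 2 * autosomal + x + y,
        # Diploid bases for a female: two copies of each autosome, two X, no Y.
        "input.genomeSizeFemale": 2 * autosomal + 2 * x,
    }


# Hypothetical autosome-only target totalling 200,000 bases.
for key, value in effective_genome_sizes(200_000).items():
    print(f"{key} = {value}")
```

With x and y left at zero, both diploid values come out to exactly twice the
haploid value, which is the autosome-only shortcut described next.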
Of course, if your target region doesn't include X or Y, then just set
genomeSizeMale and genomeSizeFemale to 2*genomeSize. See the installtest
configuration file for an example, where the effective genome size is set to
200 kb.