Options to map reads to reference sequence
A Bowtie2AlignOptions object contains options to run the bowtie2 function, which aligns reads to a reference sequence.
alignOptions = Bowtie2AlignOptions creates a Bowtie2AlignOptions object with default property values.
Bowtie2AlignOptions requires the Bioinformatics Toolbox™ Interface for Bowtie Aligner. If this support package is not installed, then the function provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.
Note
Bowtie2AlignOptions is supported on Mac and UNIX® platforms only.
alignOptions = Bowtie2AlignOptions(Name,Value) sets properties using one or more name-value pair arguments. Enclose each property name in quotes. For example, alignOptions = Bowtie2AlignOptions('Trim5',10) specifies to trim 10 residues from the 5' end.
S — Alignment parameters
Alignment parameters, specified as a character vector. S must be in the Bowtie 2 option syntax (prefixed by one or two dashes) [1].
AllowDovetail — Flag to allow dovetail configurations
false (default) | true
Flag to allow dovetail configurations, specified as true or false. This property specifies whether the alignment of one mate can extend past the beginning of the alignment of the other mate and still be considered concordant.
This property applies to paired-end reads only.
Example: 'AllowDovetail',true
Data Types: logical
AmbiguousPenalty — Penalty for positions with ambiguous characters
1 (default) | nonnegative integer
Penalty for positions with ambiguous characters on the read sequence, reference sequence, or both, specified as a nonnegative integer.
Example: 'AmbiguousPenalty',2
Data Types: double
Encoding — Encoding format of base quality
'Phred33' (default) | 'Phred64' | 'Solexa'
Encoding format of the base quality in the input files, specified as one of the following: 'Phred33', 'Phred64', or 'Solexa'.
Example: 'Encoding','Phred64'
Data Types: char | string
ExcludeContain — Flag to exclude alignments in which one mate contains the other
false (default) | true
Flag to treat a pair as nonconcordant when the alignment of one mate contains the alignment of the other mate, specified as true or false. By default (false), one mate alignment can contain the other and still be considered concordant.
This property applies to paired-end reads only.
Example: 'ExcludeContain',true
Data Types: logical
ExcludeDiscordant — Flag to exclude discordant alignments
false (default) | true
Flag to exclude discordant alignments, specified as true or false. A discordant alignment is an alignment where both mates align uniquely, but not in a way that satisfies the paired-end constraints.
Example: 'ExcludeDiscordant',true
Data Types: logical
ExcludeMixed — Flag to exclude mixed alignments
false (default) | true
Flag to exclude mixed alignments, specified as true or false. A mixed alignment consists of mate reads that are neither concordant nor discordant, but align individually.
This property applies to paired-end reads only.
Example: 'ExcludeMixed',true
Data Types: logical
ExcludeOverlap — Flag to exclude overlapping mate alignments
false (default) | true
Flag to treat a pair as nonconcordant when the alignment of one mate overlaps with the alignment of the other mate, specified as true or false. By default (false), mate alignments can overlap and still be considered concordant.
Example: 'ExcludeOverlap',true
Data Types: logical
ExcludeUnaligned — Flag to exclude reads that failed to align
false (default) | true
Flag to exclude reads that failed to align, specified as true or false.
Example: 'ExcludeUnaligned',true
Data Types: logical
ExtraBowtie2Command — Additional options not included in object properties
'' (default) | character vector
Additional options not included in the object properties, specified as a character vector. The character vector must be in the Bowtie 2 option syntax (prefixed by one or two dashes). The default value is an empty character vector ''.
Example: 'ExtraBowtie2Command','--version'
Data Types: char | string
IgnoreQuality — Flag to ignore read position quality
false (default) | true
Flag to ignore the actual read position quality when a mismatch occurs, specified as true or false. When this property is true, the quality value at a mismatched position is treated as the highest possible value, regardless of the actual value.
Example: 'IgnoreQuality',true
Data Types: logical
MatchBonus — Reward added to alignment score
2 (default) | nonnegative integer
Reward added to the alignment score when a position in the read matches a position in the reference, specified as a nonnegative integer.
Example: 'MatchBonus',5
Data Types: double
MaxAmbiguousFunction — Function governing maximum number of ambiguous characters
'L,0,0.15' (default) | character vector | string
Function governing the maximum number of ambiguous characters allowed in a read, specified as a character vector or string.
The function has the format 'f,B,A', where f is a function type, B is a constant term, and A is a coefficient. Available function types are:
'C' – Constant
'L' – Linear
'S' – Square root
'G' – Natural log
The resulting function is H(x) = B + A * f(x), where x is the read length.
The default function is 'L,0,0.15', that is, H(x) = 0 + 0.15 * x.
Example: 'MaxAmbiguousFunction','L,-0.4,-0.6'
Data Types: char | string
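The 'f,B,A' format can be evaluated by hand to see what limit a given specification implies. The following is a minimal sketch in plain MATLAB (no toolbox calls); the read length of 100 is an assumed value, and treating the 'C' type as contributing only the constant term B is an assumption of this sketch rather than something stated above.

```matlab
% Evaluate a function specification of the form 'f,B,A' at read length x.
spec = 'L,0,0.15';   % default MaxAmbiguousFunction value
x = 100;             % read length, assumed for illustration

parts = strsplit(spec, ',');
B = str2double(parts{2});   % constant term
A = str2double(parts{3});   % coefficient

switch parts{1}
    case 'C'
        fx = 0;        % assumption: constant type, so H(x) = B
    case 'L'
        fx = x;        % linear
    case 'S'
        fx = sqrt(x);  % square root
    case 'G'
        fx = log(x);   % natural log
    otherwise
        error('Unknown function type: %s', parts{1});
end

H = B + A * fx;   % H(x) = B + A * f(x)
fprintf('H(%d) = %g\n', x, H);
```

With the default specification, a 100-base read gives H(100) = 15.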
MemoryMappedIndex — Flag to use memory mapping when loading index
false (default) | true
Flag to use memory mapping (instead of file I/O) when loading the index, specified as true or false. Memory mapping allows many concurrent processes to share the memory image of the index, resulting in more efficient parallelization of the task.
Example: 'MemoryMappedIndex',true
Data Types: logical
MinScoreFunction — Function governing minimum score threshold of alignment
character vector | string
Function governing the minimum score threshold of an alignment, specified as a character vector or string.
The function has the format 'f,B,A', where f is a function type, B is a constant term, and A is a coefficient. Available function types are:
'C' – Constant
'L' – Linear
'S' – Square root
'G' – Natural log
The resulting function is H(x) = B + A * f(x), where x is the read length.
For the 'EndToEnd' alignment mode, the default function is 'L,-0.6,-0.6'. For the 'Local' mode, the default function is 'G,20,8'.
Example: 'MinScoreFunction','L,-0.4,-0.6'
Data Types: char | string
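To get a feel for the two default thresholds, it can help to evaluate them at a concrete read length. This sketch uses plain MATLAB arithmetic only; the read length of 100 is an assumed value chosen for illustration.

```matlab
% Default minimum-score thresholds evaluated at an assumed read length.
x = 100;

% 'L,-0.6,-0.6' (EndToEnd default): H(x) = -0.6 + (-0.6) * x
minScoreEndToEnd = -0.6 + (-0.6) * x;

% 'G,20,8' (Local default): H(x) = 20 + 8 * ln(x)
minScoreLocal = 20 + 8 * log(x);

fprintf('EndToEnd threshold: %.1f\n', minScoreEndToEnd);
fprintf('Local threshold:    %.2f\n', minScoreLocal);
```

For a 100-base read, the end-to-end threshold is -60.6 and the local threshold is about 56.84.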
MismatchPenalty — Maximum and minimum values to compute mismatch penalty
[6 2] (default) | two-element vector
Maximum and minimum values to compute the mismatch penalty during alignment, specified as a two-element vector. The first element is the maximum value and the second element is the minimum value.
For each position where a read character aligns to a reference character, the characters do not match, and neither is an N character, a number between the minimum and maximum values (inclusive) is subtracted from the alignment score.
Example: 'MismatchPenalty',[5 3]
Data Types: double
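The Bowtie 2 manual gives the quality-dependent rule for its mismatch penalty option as MN + floor((MX - MN) * min(Q, 40) / 40), where Q is the Phred quality of the mismatched base. The sketch below evaluates that rule in plain MATLAB for the default [6 2]; it assumes the MATLAB property passes these values through to Bowtie 2 unchanged, and the quality values are made up for illustration.

```matlab
% Mismatch penalty as a function of base quality, following the rule
% stated in the Bowtie 2 manual (assumed to apply to this property).
mismatchPenalty = [6 2];      % default [MX MN]
MX = mismatchPenalty(1);      % maximum penalty
MN = mismatchPenalty(2);      % minimum penalty

Q = [0 10 20 30 40 41];       % example Phred qualities (illustrative)

% Qualities above 40 are capped at 40 before scaling.
penalty = MN + floor((MX - MN) .* (min(Q, 40) ./ 40));

disp([Q; penalty]);           % first row: quality, second row: penalty
```

The penalties are 2, 3, 4, 5, 6, and 6: higher-quality mismatches cost more, up to the maximum.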
Mode — Alignment mode
'EndToEnd' (default) | 'Local'
Alignment mode, specified as 'EndToEnd' or 'Local'.
In the 'Local' mode, only part of the read must align to the reference, and some residues can be omitted (soft-clipped) to achieve the best alignment score. In the 'EndToEnd' mode, the entire read must align without any soft-clipping.
Example: 'Mode','Local'
Data Types: char | string
Nondeterministic — Flag to reinitialize pseudo-random generator
false (default) | true
Flag to reinitialize the pseudo-random generator for each read using the current time, specified as true or false. If true, the alignments reported for two identical reads can be different. The default value is false, that is, the pseudo-random generator is reinitialized using a seed derived from read information and the seed number.
Example: 'Nondeterministic',true
Data Types: logical
NoGapPositions — Number of positions where gaps are not allowed
4 (default) | nonnegative integer
Number of positions at the beginning or end of each read where gaps are not allowed, specified as a nonnegative integer.
Example: 'NoGapPositions',5
Data Types: double
NumAlignments — Maximum number of valid alignments to report
'Best' (default) | 'All' | positive integer
Maximum number of valid alignments to report before terminating the search, specified as a positive integer, 'Best', or 'All'. If you specify a positive integer N, the function searches for up to N distinct, valid alignments for each read. 'Best' reports the best alignment for each read. 'All' reports all the valid alignments for each read sorted by alignment scores.
The alignment score for a paired-end alignment equals the sum of the alignment scores of individual mates.
Example: 'NumAlignments','All'
Data Types: double | char | string
NumReseedings — Maximum number of reseeding attempts
2 (default) | nonnegative integer
Maximum number of reseeding attempts for reads with repetitive seeds, specified as a nonnegative integer. During reseeding, the function chooses a new set of seeds at different offsets to find more alignments.
Example: 'NumReseedings',5
Data Types: double
NumSeedExtensions — Maximum number of consecutive seed extension attempts
15 (default) | nonnegative integer
Maximum number of consecutive seed extension attempts before getting a new seed, specified as a nonnegative integer. A seed extension fails if it does not yield an alignment with the best (or second-best) score.
Example: 'NumSeedExtensions',10
Data Types: double
NumSeedMismatches — Number of allowed mismatches in seed alignment
0 (default) | 1
Number of allowed mismatches in a seed alignment during the multiseed alignment, specified as 0 or 1.
Example: 'NumSeedMismatches',1
Data Types: double
NumThreads — Number of parallel threads to perform alignment
1 (default) | positive integer
Number of parallel threads to perform the alignment, specified as a positive integer. Threads run on separate processors or cores. Increasing the number of threads provides a significant increase in speed (close to linear) but also increases the memory footprint.
Example: 'NumThreads',4
Data Types: double
Offrate — Offrate to use when reading index
NaN (default) | positive integer
Offrate to use when reading the index to reduce the memory footprint, specified as a positive integer. The offrate must be greater than the offrate used to build the index.
Example: 'Offrate',20
Data Types: double
PadPositions — Position in reference sequence where alignment begins
15 (default) | nonnegative integer
Position in the reference sequence where the alignment for each sequence begins, specified as a nonnegative integer.
Example: 'PadPositions',10
Data Types: double
ReadGapCosts — Gap costs for opening and extending gap
[5 3] (default) | two-element vector of nonnegative integers
Gap costs for opening and extending a gap on the read, specified as a two-element vector of nonnegative integers. The first element is the cost of opening a gap, and the second element is the cost of extending a gap. Given the cost vector [GO GE], a read gap of length N is assigned a penalty of GO + N * GE.
Example: 'ReadGapCosts',[4 2]
Data Types: double
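As a quick worked example of the GO + N * GE rule, the following plain MATLAB sketch tabulates the penalty for a few gap lengths using the default cost vector. The gap lengths are chosen only for illustration.

```matlab
% Gap penalty GO + N * GE for the default read gap costs.
readGapCosts = [5 3];      % default [GO GE]
GO = readGapCosts(1);      % cost of opening a gap
GE = readGapCosts(2);      % cost of extending a gap

N = 1:4;                   % gap lengths (illustrative)
penalty = GO + N .* GE;    % one penalty per gap length

disp([N; penalty]);        % first row: gap length, second row: penalty
```

With [5 3], gaps of length 1 through 4 cost 8, 11, 14, and 17, so even a one-base gap pays both the opening cost and one extension.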
ReadGroupID — Read group ID to add on @RG header line
'' (default) | character vector | string
Read group ID to add on the @RG header line in the output SAM report, specified as a character vector or string. If you specify any read group ID, the function prints the @RG header line with the tag ID: followed by the specified group ID.
Example: 'ReadGroupID','ID1'
Data Types: char | string
ReadGroup — Read group information to add as field on @RG header line
'' (default) | character vector | string
Read group information to add as a field on the @RG header line in the output SAM report, specified as a character vector or string. This property applies only if you specify 'ReadGroupID'.
Example: 'ReadGroup','Control'
Data Types: char | string
RefGapCosts — Gap costs for opening and extending gap
[5 3] (default) | two-element vector of nonnegative integers
Gap costs for opening and extending a gap on the reference, specified as a two-element vector of nonnegative integers. The first element is the cost of opening a gap, and the second element is the cost of extending a gap. Given the cost vector [GO GE], a reference gap of length N is assigned a penalty of GO + N * GE.
Example: 'RefGapCosts',[4 2]
Data Types: double
Reorder — Flag to reorder SAM records
false (default) | true
Flag to reorder SAM records to maintain the same order as in the input files, specified as true or false.
This property applies only when the number of parallel threads is greater
than one. When you use one thread, the order of the records in the output is
the same as the order of the input.
Example: 'Reorder',true
Data Types: logical
Seed — Number to set seed in pseudo-random number generator
0 (default) | nonnegative integer
Number to set the seed in the pseudo-random number generator, specified as a nonnegative integer.
Example: 'Seed',3
Data Types: double
SeedIntervalFunction — Function governing distance between seed substrings
character vector | string
Function governing the distance between seed substrings during the multiseed alignment, specified as a character vector or string.
The function has the format 'f,B,A', where f is a function type, B is a constant term, and A is a coefficient. Available function types are:
'C' – Constant
'L' – Linear
'S' – Square root
'G' – Natural log
The resulting function is H(x) = B + A * f(x), where x is the read length.
For the 'EndToEnd' alignment mode, the default function is 'S,1,1.15'. For the 'Local' mode, the default function is 'S,1,0.75'.
Example: 'SeedIntervalFunction','S,2,2.15'
Data Types: char | string
SeedLength — Seed substring length to align during multiseed alignment
20 (default) | positive integer
Seed substring length to align during the multiseed alignment, specified as a positive integer.
Example: 'SeedLength',25
Data Types: double
Skip — Number of reads to ignore
0 (default) | nonnegative integer
Number of reads to ignore from the beginning of the input files, specified as a nonnegative integer.
Example: 'Skip',5
Data Types: double
Trim3 — Number of residues to trim from 3' end
0 (default) | nonnegative integer
Number of residues to trim from the 3' end of each read before aligning, specified as a nonnegative integer.
Example: 'Trim3',5
Data Types: double
Trim5 — Number of residues to trim from 5' end
0 (default) | nonnegative integer
Number of residues to trim from the 5' end of each read before aligning, specified as a nonnegative integer.
Example: 'Trim5',5
Data Types: double
UpTo — Number of reads to consider from beginning of input files
Inf (default) | positive integer
Number of reads to consider from the beginning of input files, specified as a positive integer. The default value is Inf, that is, all reads are considered.
Example: 'UpTo',1000
Data Types: double
getBowtie2Command | Translate object properties to Bowtie 2 options
getBowtie2Table | Retrieve table with object properties and equivalent Bowtie 2 options
preset | Set combination of alignment options
run | Map sequence reads to reference sequence using Bowtie 2
Build a set of index files for the Drosophila genome. If the Bioinformatics Toolbox Interface for Bowtie Aligner support package is not installed when you run the function, an error message appears with a download link. Click the link to download the package from the Add-On menu.
For this example, the reference sequence Dmel_chr4.fa is already provided with the toolbox.
status = bowtie2build('Dmel_chr4.fa', 'Dmel_chr4_index');
If the index build is successful, the function returns 0 and creates the index files (*.bt2) in the current folder. The files have the prefix 'Dmel_chr4_index'.
If the index files already exist and you want to know which reference sequence was used to build them, use the bowtie2inspect function to get more information about the reference.
bowtie2inspect('Dmel_chr4', 'Dmel_chr4_retrieved.fa');
By default, the output file Dmel_chr4_retrieved.fa contains the sequence of the reference. You can also get summary information about the reference names and lengths instead of the actual sequence. For details on the available options, see Bowtie2InspectOptions.
Once the index is ready, map the read sequences to the reference using the bowtie2 function. The paired-end read files (SRR6008575_10k_1.fq and SRR6008575_10k_2.fq) are already provided with the toolbox.
bowtie2('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq','SRR6008575_10k_chr4.sam');
The output is a SAM-formatted file that contains the mapping results.
You can specify different alignment options by passing in a Bowtie 2 syntax string or using a Bowtie2AlignOptions object.
Suppose you want to trim some residues from the 3' end before aligning. First, create a Bowtie2AlignOptions object.
alignOpt = Bowtie2AlignOptions;
Trim four residues from the 3' end before aligning.
alignOpt.Trim3 = 4;
Map reads to the reference using the specified alignment option.
flag = bowtie2('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq','SRR6008575_10k_chr4_trimmed.sam',alignOpt);
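The same options can also be set at construction time with the name-value syntax, instead of assigning properties one at a time. This is a sketch only: it needs the Bowtie Aligner support package and the example files used above, the choice of two threads is arbitrary, and the output file name is a hypothetical one picked for this illustration.

```matlab
% Set several properties in one call using name-value pairs.
% Requires the Bioinformatics Toolbox Interface for Bowtie Aligner.
alignOpt = Bowtie2AlignOptions('Trim3',4,'NumThreads',2);

% Output file name is hypothetical, chosen for this sketch.
flag = bowtie2('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq', ...
    'SRR6008575_10k_chr4_opts.sam',alignOpt);
```

Creating the object once and reusing it across several bowtie2 calls keeps the alignment settings consistent between runs.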
[1] Langmead, B., and S. Salzberg. "Fast gapped-read alignment with Bowtie 2." Nature Methods. 9, 2012, 357–359.
bowtie2 | Bowtie2AlignOptions | bowtie2build | Bowtie2BuildOptions | bowtie2inspect | Bowtie2InspectOptions