Convert sequence with ambiguous characters to regular expression
RegExp = seq2regexp(Seq)
RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...)
RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...)
Seq | Either of the following:
|
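AlphabetValue | Character vector or string specifying the sequence alphabet. Choices are 'NT' (default) for nucleotide sequences or 'AA' for amino acid sequences. |
AmbiguousValue | Controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are true (default) or false. |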
RegExp | Character vector of codes specifying an amino acid or nucleotide sequence in regular expression format using IUB/IUPAC codes. |
RegExp = seq2regexp(Seq) converts ambiguous amino acid or nucleotide symbols in a sequence to a regular expression format using IUB/IUPAC codes.
RegExp = seq2regexp(Seq, ...'PropertyName', PropertyValue, ...) calls seq2regexp with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...) specifies the sequence alphabet. AlphabetValue can be either 'NT' for nucleotide sequences or 'AA' for amino acid sequences. Default is 'NT'.
RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...) controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are true (default) or false. For example:
If Seq = 'ACGTK' and AmbiguousValue is true, the MATLAB® software returns ACGT[GTK], with the unambiguous characters G and T and the ambiguous character K.
If Seq = 'ACGTK' and AmbiguousValue is false, the MATLAB software returns ACGT[GT], with only the unambiguous characters.
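The property rules above (any order, case-insensitive names, 'NT' as the default alphabet) can be illustrated with a short script. This is a minimal sketch that assumes the Bioinformatics Toolbox is installed; the variable names are illustrative only, and the results noted in the comments are the ones documented above for 'ACGTK'.

```matlab
% Assumes seq2regexp (Bioinformatics Toolbox) is on the path.
seq = 'ACGTK';

% Keep the ambiguous code in the character class (the default behavior).
withCodes = seq2regexp(seq, 'Ambiguous', true);     % documented result: ACGT[GTK]

% Property names are case insensitive, so 'ambiguous' also works.
plainOnly = seq2regexp(seq, 'ambiguous', false);    % documented result: ACGT[GT]

% Properties can be given in any order; 'NT' is already the default alphabet,
% so this call should return the same pattern as plainOnly.
sameAsPlain = seq2regexp(seq, 'Ambiguous', false, 'Alphabet', 'NT');

disp(withCodes)
disp(plainOnly)
disp(strcmp(plainOnly, sameAsPlain))
```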
Nucleotide Conversion
Nucleotide Code | Nucleotide | Conversion |
---|---|---|
A | Adenosine | A |
C | Cytosine | C |
G | Guanine | G |
T | Thymidine | T |
U | Uridine | U |
R | Purine | [AG] |
Y | Pyrimidine | [TC] |
K | Keto | [GT] |
M | Amino | [AC] |
S | Strong interaction (3 H bonds) | [GC] |
W | Weak interaction (2 H bonds) | [AT] |
B | Not A | [CGT] |
D | Not C | [AGT] |
H | Not G | [ACT] |
V | Not T or U | [ACG] |
N | Any nucleotide | [ACGT] |
- | Gap of indeterminate length | - |
? | Unknown | ? |
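To make the Nucleotide Conversion table concrete, the following sketch applies it by hand with a containers.Map lookup. It is an illustration of the mapping only, not the toolbox implementation of seq2regexp; the names lookup, seq, and out are hypothetical, and the sketch mimics only the case where AmbiguousValue is false.

```matlab
% Illustrative re-creation of the table lookup; not the seq2regexp source.
codes   = {'R','Y','K','M','S','W','B','D','H','V','N'};
classes = {'[AG]','[TC]','[GT]','[AC]','[GC]','[AT]', ...
           '[CGT]','[AGT]','[ACT]','[ACG]','[ACGT]'};
lookup = containers.Map(codes, classes);

seq = 'ACWTMAN';
out = '';
for k = 1:numel(seq)
    c = seq(k);
    if isKey(lookup, c)
        out = [out lookup(c)];   % ambiguity code: substitute its character class
    else
        out = [out c];           % A, C, G, T, U, '-' and '?' pass through unchanged
    end
end
disp(out)   % AC[AT]T[AC]A[ACGT]
```

The loop leaves unambiguous symbols untouched and replaces each ambiguity code with the bracketed class from the Conversion column, which matches the pattern the documented example returns for 'ACWTMAN' when ambiguous characters are excluded.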
Amino Acid Conversion
Amino Acid Code | Amino Acid | Conversion |
---|---|---|
B | Asparagine or Aspartic acid (Aspartate) | [DN] |
Z | Glutamine or Glutamic acid (Glutamate) | [EQ] |
X | Any amino acid | [A R N D C Q E G H I L K M F P S T W Y V] |
Convert a nucleotide sequence to a regular expression.
seq2regexp('ACWTMAN')
ans =
AC[ATW]T[ACM]A[ACGTRYKMSWBDHVN]
Convert the same nucleotide sequence, but remove ambiguous characters from the regular expression.
seq2regexp('ACWTMAN', 'ambiguous', false)
ans =
AC[AT]T[AC]A[ACGT]
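A typical next step is to pass the returned pattern to regexp to locate a motif in a longer sequence. The sketch below assumes the Bioinformatics Toolbox is installed; the motif 'ACWT' and the target sequence are made-up inputs, and the pattern noted in the comment follows from the W entry in the Nucleotide Conversion table.

```matlab
% Assumes seq2regexp (Bioinformatics Toolbox) is on the path.
% The motif contains the ambiguity code W (A or T).
pattern = seq2regexp('ACWT', 'Ambiguous', false);   % per the table: AC[AT]T

% Hypothetical target sequence with two occurrences of the motif.
target = 'GGACATCCACTTGG';

% Find where the motif occurs and which concrete subsequences matched.
[startIdx, matches] = regexp(target, pattern, 'start', 'match');

disp(startIdx)   % 3 9
disp(matches)    % 'ACAT' and 'ACTT'
```

Setting 'Ambiguous' to false keeps the character classes limited to concrete bases, which is enough when the target sequence itself contains no ambiguity codes.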
See also: regexp | regexpi | restrict | seqwordcount