Convert unaligned sequences to aligned sequences using signatures in CIGAR format
Alignment = cigar2align(Seqs,Cigars)
[GapSeq, Indices] = cigar2align(Seqs,Cigars)
... = cigar2align(Seqs,Cigars,Name,Value)
Alignment = cigar2align(Seqs,Cigars) converts unaligned sequences in Seqs, a cell array of character vectors or string vector, into Alignment, a matrix of aligned sequences, using the information stored in Cigars, a cell array of CIGAR-formatted character vectors or string vector.
[GapSeq, Indices] = cigar2align(Seqs,Cigars) converts unaligned sequences in Seqs, a cell array of character vectors or string vector, into GapSeq, a cell array of character vectors of aligned sequences, and also returns Indices, a vector of numeric indices, using the information stored in Cigars, a cell array of CIGAR-formatted character vectors or string vector. When an alignment has many columns, this syntax uses less memory and is faster.
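As a minimal sketch of the two-output syntax, the call below reuses the reads and CIGAR strings from the example on this page. The variable names reads, cigars, gapSeq, and idx are illustrative, and the block assumes the toolbox that provides cigar2align is installed. No output is shown because the block was not run here.

```matlab
% Unaligned reads and their CIGAR strings (same data as the example on this page)
reads  = {'ACGACTGC', 'ACGTTGC', 'AGGTATC'};
cigars = {'3M1D1M1I3M', '4M1D1P3M', '5M1P1M1D1M'};

% Two-output form: gapped reads as a cell array, plus starting column indices
[gapSeq, idx] = cigar2align(reads, cigars);

% One gapped character vector and one index are returned per input read
disp(numel(gapSeq))
disp(idx)
```

Because each read is stored as its own character vector, this form avoids building one wide character matrix for the whole alignment.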
... = cigar2align(Seqs,Cigars,Name,Value) converts unaligned sequences in Seqs, a cell array of character vectors or string vector, into Alignment, a matrix of aligned sequences, using the information stored in Cigars, a cell array of CIGAR-formatted character vectors or string vector, with additional options specified by one or more Name,Value pair arguments.
Seqs | Cell array of character vectors or string vector containing unaligned sequences.
Cigars | Cell array of valid CIGAR-formatted character vectors or string vector.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
'Start' | Vector of positive integers specifying the reference sequence position at which each aligned sequence starts. By default, each aligned sequence starts at position 1 of the reference sequence.
'GapsInRef' | Logical specifying whether to display positions in the aligned sequences that correspond to gaps in the reference sequence. Choices are true or false. Default: false
'SoftClipping' | Logical specifying whether to include characters in the aligned read sequences corresponding to soft clipping ends. Choices are true or false. Default: false
'OffsetPad' | Logical specifying whether to add padding blanks to the left of each aligned read sequence to represent the offset of the start position from the first position of the reference sequence. Choices are true or false. Default: false
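The effect of a logical option can be checked by calling the function with and without it. The sketch below compares the default call with soft clipping turned on. It assumes the soft-clipping option is passed as 'SoftClipping', and the read and CIGAR string are made-up illustrations (two soft-clipped bases followed by five matches). No output is shown because the block was not run here.

```matlab
% Hypothetical read whose first two bases are soft clipped (2S), then 5 matches
reads  = {'TTACGTA'};
cigars = {'2S5M'};

% Default behavior: soft-clipped bases are not displayed
alnDefault = cigar2align(reads, cigars);

% Assumed option name 'SoftClipping': include the soft-clipped ends
alnClipped = cigar2align(reads, cigars, 'SoftClipping', true);

disp(alnDefault)
disp(alnClipped)
```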
Alignment | Matrix of aligned sequences, in which the number of rows equals the number of character vectors in Seqs.
GapSeq | Cell array of character vectors of aligned sequences, in which the number of character vectors equals the number of character vectors in Seqs.
Indices | Vector of numeric indices indicating the starting column for each aligned sequence in Alignment.
Create a cell array of character vectors containing unaligned sequences, create a cell array of corresponding CIGAR-formatted character vectors associated with a reference sequence of ACGTATGC, and then reconstruct the alignment:
r = {'ACGACTGC', 'ACGTTGC', 'AGGTATC'}; % unaligned sequences
c = {'3M1D1M1I3M', '4M1D1P3M', '5M1P1M1D1M'}; % CIGAR-formatted
aln1 = cigar2align(r, c)

aln1 =
ACG-ATGC
ACGT-TGC
AGGTAT-C
Reconstruct the same alignment to display positions in the aligned sequences that correspond to gaps in the reference sequence:
aln2 = cigar2align(r, c, 'GapsInRef', true)

aln2 =
ACG-ACTGC
ACGT--TGC
AGGTA-T-C
Reconstruct the alignment, adding an offset padding of 5:
aln3 = cigar2align(r, c, 'start', [5 5 5], 'OffsetPad', true)

aln3 =
ACG-ATGC
ACGT-TGC
AGGTAT-C
When cigar2align reconstructs the alignment, it does not display hard clipped positions (H) or soft clipped positions (S). Also, it does not consider soft clipped positions as start positions for aligned sequences.
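To make the handling of the individual CIGAR operations concrete, here is a minimal sketch in base MATLAB that expands each read from its CIGAR string. It is an illustration only, not the cigar2align implementation: it handles just the M, D, I, P, S, and H operations that appear on this page, ignores start positions, and the names showRefGaps, tok, out, and pos are hypothetical.

```matlab
% Illustrative sketch only; this is not the cigar2align implementation.
reads  = {'ACGACTGC', 'ACGTTGC', 'AGGTATC'};
cigars = {'3M1D1M1I3M', '4M1D1P3M', '5M1P1M1D1M'};
showRefGaps = false;   % hypothetical flag, analogous to 'GapsInRef'

aligned = cell(size(reads));
for s = 1:numel(reads)
    read = reads{s};
    % Split the CIGAR string into {length, operation} pairs
    tok = regexp(cigars{s}, '(\d+)([MDIPSH])', 'tokens');
    out = '';
    pos = 1;           % next unread base in the read
    for k = 1:numel(tok)
        n  = str2double(tok{k}{1});
        op = tok{k}{2};
        switch op
            case 'M'   % match or mismatch: copy bases from the read
                out = [out, read(pos:pos+n-1)]; %#ok<AGROW>
                pos = pos + n;
            case 'D'   % deletion from the reference: emit gap characters
                out = [out, repmat('-', 1, n)]; %#ok<AGROW>
            case 'I'   % insertion: shown only when reference gaps are shown
                if showRefGaps
                    out = [out, read(pos:pos+n-1)]; %#ok<AGROW>
                end
                pos = pos + n;
            case 'P'   % padding: consumes no read bases
                if showRefGaps
                    out = [out, repmat('-', 1, n)]; %#ok<AGROW>
                end
            case 'S'   % soft clip: bases are in the read but not displayed
                pos = pos + n;
            case 'H'   % hard clip: bases are not in the read at all
                continue
        end
    end
    aligned{s} = out;
end
disp(char(aligned))
```

With showRefGaps set to false, the three rows reproduce aln1 from the example on this page (ACG-ATGC, ACGT-TGC, AGGTAT-C). Setting it to true reproduces the rows of aln2.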
If your CIGAR information is captured in the Signature property of a BioMap object, you can use the getAlignment method to construct the alignment.
align2cigar | BioMap | getAlignment | getBaseCoverage | getCompactAlignment | seqalignviewer