align2cigar

Convert aligned sequences to corresponding signatures in CIGAR format

Syntax

[Cigars,Starts] = align2cigar(Alignment,Ref)

Description

[Cigars,Starts] = align2cigar(Alignment,Ref) converts the aligned sequences in Alignment, specified as a cell array of character vectors, a string vector, or a character array, into Cigars, a cell array of CIGAR-formatted character vectors or a string vector. The conversion uses the reference sequence specified by Ref, a character vector or string. The function also returns Starts, a vector of integers indicating the start position of each aligned sequence with respect to the ungapped reference sequence.

Input Arguments

Alignment

Cell array of character vectors, string vector, or character array representing aligned sequences. Soft clippings are assumed to be represented by lowercase letters in the aligned sequences. Skipped positions are assumed to be represented by a period (.) in the aligned sequences.

Ref

Character vector or string specifying an aligned reference sequence. The length of Ref must equal the number of columns in Alignment.
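The input conventions above can be illustrated with a short call. This is a minimal sketch: the sequences are made up for illustration, each aligned sequence has as many columns as Ref, and the resulting CIGAR strings are deliberately not shown because they depend on the toolbox implementation.

```matlab
% Illustrative input only; these sequences are hypothetical.
ref = 'ACGTATGC';

% Row 1: lowercase 'ac' marks soft-clipped bases.
% Row 2: '.' marks skipped reference positions.
aln = {'acGTATGC'; 'ACG..TGC'};

% Every aligned sequence must have as many columns as ref.
assert(all(cellfun(@numel, aln) == numel(ref)));

[cigars, starts] = align2cigar(aln, ref);
disp(cigars)
disp(starts)
```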

Output Arguments

Cigars

Cell array of CIGAR-formatted character vectors or string vector corresponding to each aligned sequence in Alignment.

Starts

Vector of integers indicating the start position of each aligned sequence with respect to the ungapped reference sequence.
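To make the relationship between an aligned row, its CIGAR string, and its start position concrete, the following sketch derives both outputs by hand using only base MATLAB. The helper rowToCigarSketch is hypothetical and is not the toolbox implementation. It assumes leading and trailing blanks pad the row, that each row contains at least one non-blank column, and that mismatches are reported as X. It ignores soft clipping and skipped positions.

```matlab
aln = ['ACG-ATGC'; 'ACGT-TGC'; '  GTAT-C'];
ref = 'ACGTATGC';
for k = 1:size(aln, 1)
    [c, s] = rowToCigarSketch(aln(k, :), ref);
    fprintf('%s  start=%d\n', c, s);
end
% Prints:
%   3=1D4=  start=1
%   4=1D3=  start=1
%   4=1D1=  start=3

function [cigar, start] = rowToCigarSketch(row, ref)
    % Hypothetical helper (an assumption, not part of the toolbox).
    cols = find(row ~= ' ');          % columns covered by this sequence
    refPos = cumsum(ref ~= '-');      % ungapped reference coordinate per column
    start = refPos(cols(1));          % assumes the first column is not a reference gap

    r = row(cols);
    f = ref(cols);
    keep = ~(r == '-' & f == '-');    % drop columns that are gaps in both
    r = r(keep);
    f = f(keep);

    ops = repmat('=', 1, numel(r));   % default: match
    ops(r ~= f) = 'X';                % mismatch
    ops(r == '-') = 'D';              % gap in the sequence: deletion
    ops(f == '-') = 'I';              % gap in the reference: insertion

    % Run-length encode the per-column operations.
    runStart = find([true, ops(2:end) ~= ops(1:end-1)]);
    runLen = diff([runStart, numel(ops) + 1]);
    cigar = '';
    for i = 1:numel(runStart)
        cigar = [cigar, sprintf('%d%c', runLen(i), ops(runStart(i)))]; %#ok<AGROW>
    end
end
```

The start position is read from the cumulative count of non-gap reference characters, which is why it is expressed relative to the ungapped reference.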

Examples

This example shows how to convert aligned sequences to CIGAR strings.

Create a character array of aligned sequences, create a character vector specifying a reference sequence, and then convert the alignment to CIGAR strings:

aln = ['ACG-ATGC'; 'ACGT-TGC'; '  GTAT-C']
aln = 3x8 char array
    'ACG-ATGC'
    'ACGT-TGC'
    '  GTAT-C'

ref =  'ACGTATGC';
[cigar, start] = align2cigar(aln, ref)
cigar = 1x3 cell
    {'3=1D4='}    {'4=1D3='}    {'4=1D1='}

start = 1×3

     1     1     3

Introduced in R2010b