soapread

Read data from Short Oligonucleotide Analysis Package (SOAP) file

Syntax

SOAPStruct = soapread(File)
SOAPStruct = soapread(File,Name,Value)

Description

SOAPStruct = soapread(File) reads File, a SOAP-formatted file (version 2.15), and returns the data in SOAPStruct, a MATLAB® array of structures.

SOAPStruct = soapread(File,Name,Value) reads a SOAP-formatted file with additional options specified by one or more Name,Value pair arguments.

Input Arguments

File

Character vector or string specifying a file name, path and file name, or the text of a SOAP-formatted file. If you specify only a file name, that file must be on the MATLAB search path or in the Current Folder.

The soapread function reads SOAP-formatted files (version 2.15).

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'BlockRead'

Scalar or vector that controls the reading of a single sequence entry or a block of sequence entries from a SOAP-formatted file containing multiple sequences. Enter a scalar N to read the Nth entry in the file. Enter a 1-by-2 vector [M1, M2] to read a block of entries starting at entry M1 and ending at entry M2. To read all remaining entries in the file starting at entry M1, enter a positive value for M1 and Inf for M2.
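
The three forms of the 'BlockRead' value described above can be sketched as follows. This is a minimal illustration that assumes the sample01.soap file used in the Examples section is on the MATLAB search path; the variable names are chosen only for this sketch.

```matlab
% Assumption: sample01.soap (the file used in the Examples section)
% is on the MATLAB search path.

% Scalar N: read only the 3rd entry
entry3 = soapread('sample01.soap', 'BlockRead', 3);

% 1-by-2 vector [M1, M2]: read entries 5 through 10
block = soapread('sample01.soap', 'BlockRead', [5 10]);

% [M1, Inf]: read from entry 12 through the last entry in the file
tailEntries = soapread('sample01.soap', 'BlockRead', [12 Inf]);

% Report how many records each call returned
fprintf('%d, %d, and %d records read\n', ...
    numel(entry3), numel(block), numel(tailEntries));
```

Using Inf for M2 avoids having to know the number of entries in the file in advance.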

'AlignDetails'

Logical specifying whether to include the AlignDetails field in the SOAPStruct output argument. The AlignDetails field includes information on mismatches, insertions, and deletions in the alignment. Choices are true or false.

Default: true
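
As a brief sketch of this option, the following reads the same file with and without the AlignDetails field and checks which fields are present. It assumes sample01.soap is on the MATLAB search path; the variable names are illustrative only.

```matlab
% Assumption: sample01.soap is on the MATLAB search path.

% Default behavior: the AlignDetails field is included
full = soapread('sample01.soap');

% Omit the AlignDetails field from the output structures
slim = soapread('sample01.soap', 'AlignDetails', false);

% Check for the field in each result
hasDetailsFull = isfield(full, 'AlignDetails');
hasDetailsSlim = isfield(slim, 'AlignDetails');
```

Omitting the field may be useful when only the read sequences and mapping positions are needed.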

Output Arguments

SOAPStruct

An N-by-1 array of structures containing sequence alignment and mapping information from a SOAP-formatted file, where N is the number of alignment records stored in the SOAP-formatted file. Each structure contains the following fields.

QueryName

Name of aligned read sequence.

Sequence

Character vector containing the letter representations of the read sequence. It is the reverse-complement if the read sequence aligns to the reverse strand of the reference sequence.

Quality

Character vector containing the ASCII representation of the per-base quality score for the read sequence. The quality score is reversed if the read sequence aligns to the reverse strand of the reference sequence.

NumHits

Total number of instances where this read sequence aligned to an identical length of bases on another area of the reference sequence.

PairedEndSourceFile

Flag (a or b) specifying the source file to which the read sequence belongs. This field applies only to read sequences that are paired in the alignment.

Length

Scalar specifying the length of the read sequence.

Strand

+ or - specifying the direction (forward or reverse) of the reference sequence to which the read sequence aligns.

ReferenceName

Name or numeric ID of the reference sequence to which the read sequence aligns.

Position

Position (one-based offset) on the forward reference sequence where the left-most base of the alignment of the read sequence starts.

AlignDetails

Information on mismatches, insertions, and deletions in the alignment. For SOAP-formatted files v2.15, this field includes CIGAR strings.
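
Because SOAPStruct is an ordinary structure array, its fields can be collected across records with standard MATLAB indexing. The following is a minimal sketch, assuming sample01.soap is on the MATLAB search path and that the Strand field holds the character + or - as described above; the variable names are illustrative only.

```matlab
% Assumption: sample01.soap is on the MATLAB search path.
data = soapread('sample01.soap');

% Logical mask of reads that align to the reverse strand
isReverse = strcmp({data.Strand}, '-');

% Keep only the reverse-strand records
reverseReads = data(isReverse);

% Collect the read lengths of all records into one vector
readLengths = [data.Length];

fprintf('%d of %d reads align to the reverse strand\n', ...
    nnz(isReverse), numel(data));
```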

Examples

Read the alignment records (entries) from the sample01.soap file into a MATLAB array of structures and access some of the data:

% Read the alignment records stored in the file sample01.soap
data = soapread('sample01.soap')
data = 

17x1 struct array with fields:
    QueryName
    Sequence
    Quality
    NumHits
    PairedEndSourceFile
    Length
    Strand
    ReferenceName
    Position
    AlignDetails
% Access the quality score for the 6th entry
data(6).Quality
ans =

<>.>>>8>;:1>>>3>6>
% Determine the strand direction (forward or reverse) of the reference
% sequence to which the 12th entry aligns
data(12).Strand
ans =

-

Read a block of alignment records (entries) from the sample01.soap file into a MATLAB array of structures:

% Read a block of six entries from a SOAP file
data_5_10 = soapread('sample01.soap','blockread', [5 10])
data_5_10 = 

6x1 struct array with fields:
    QueryName
    Sequence
    Quality
    NumHits
    PairedEndSourceFile
    Length
    Strand
    ReferenceName
    Position
    AlignDetails

Tips

If your SOAP-formatted file is too large to read using available memory, try either of the following:

  • Use the BlockRead name-value pair argument to read a subset of entries.

  • Create a BioIndexedFile object from the SOAP-formatted file (using 'TABLE' for the Format), and then access the entries using methods of the BioIndexedFile class.
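
The second approach might look like the following sketch. It assumes sample01.soap is on the MATLAB search path and writes the index file to the folder returned by tempdir, on the assumption that the folder containing the sample file may be read-only. The entries come back as raw text rather than as parsed structures.

```matlab
% Assumption: sample01.soap is on the MATLAB search path. The index
% file is written to tempdir in case the source folder is read-only.
sourceFile = which('sample01.soap');
bif = BioIndexedFile('TABLE', sourceFile, tempdir);

% Number of entries indexed in the file
numEntries = bif.NumEntries;

% Retrieve entries 5 through 10 as raw text, without loading the
% whole file into memory
rawText = getEntryByIndex(bif, 5:10);
disp(rawText)
```

Indexing the file once and then retrieving entries by index keeps memory use proportional to the number of entries requested rather than to the size of the file.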

Introduced in R2010b