Retrieve subset of elements from object
subset = getSubset(object,subset,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, you can specify whether to keep the data in memory.
Store read data from a SAM-formatted file in a BioRead object. By default, the data remains in the source file, and BioRead uses an index file to access the data, making the process more memory efficient.
br = BioRead('ex1.sam')
br = 
  BioRead with properties:
     Quality: [1501x1 File indexed property]
    Sequence: [1501x1 File indexed property]
      Header: [1501x1 File indexed property]
       NSeqs: 1501
        Name: ''
Set the 'InMemory' name-value pair argument to true to store the data in memory, enabling you to access the data faster and edit the properties of the object.
brInMemory = BioRead('ex1.sam','InMemory',true)
brInMemory = 
  BioRead with properties:
     Quality: {1501x1 cell}
    Sequence: {1501x1 cell}
      Header: {1501x1 cell}
       NSeqs: 1501
        Name: ''
Retrieve the second and third elements from the object br. By default, the resulting object subset is not placed in memory if the parent object br is not in memory. If br is already in memory, the resulting subset is placed in memory.
subset = getSubset(br,[2 3])
subset = 
  BioRead with properties:
     Quality: [2x1 File indexed property]
    Sequence: [2x1 File indexed property]
      Header: [2x1 File indexed property]
       NSeqs: 2
        Name: ''
Alternatively, you can keep the parent object br in the source file and load the resulting subset in memory if the subset is small enough. You can then access the subset faster and update it as needed.
subsetInMemory = getSubset(br,[2 3],'InMemory',true)
subsetInMemory = 
  BioRead with properties:
     Quality: {2x1 cell}
    Sequence: {2x1 cell}
      Header: {2x1 cell}
       NSeqs: 2
        Name: ''
Update the header information of the first element.
subsetInMemory.Header(1)
ans = 1x1 cell array
{'EAS54_65:7:152:368:113'}
subsetInMemory.Header(1) = {'NewHeader'};
subsetInMemory.Header(1)
ans = 1x1 cell array
{'NewHeader'}
You can also specify a header to retrieve the elements that have that header. If multiple elements have the same header, the function returns all of those elements.
Get all the elements with the header 'B7_591:4:96:693:509' from the brInMemory object, which is stored in memory.
subset2 = getSubset(brInMemory,{'B7_591:4:96:693:509'})
subset2 = 
  BioRead with properties:
     Quality: {'<<<<<<<<<<<<<<<;<<<<<<<<<5<<<<<;:<;7'}
    Sequence: {'CACTAGTGGCTCATTGTAAATGTGTGGTTTAACTCG'}
      Header: {'B7_591:4:96:693:509'}
       NSeqs: 1
        Name: ''
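Because subset also accepts a cell array of headers, several headers can be requested in one call. The following is a minimal sketch, assuming the ex1.sam sample file is on the path; the variable names headers and subset3 are chosen here for illustration, and the two headers are the ones that appear in the outputs above.

```matlab
% Load the sample reads into memory (assumes ex1.sam is on the MATLAB path).
brInMemory = BioRead('ex1.sam','InMemory',true);

% Request two headers at once; both appear in the example outputs above.
headers = {'B7_591:4:96:693:509','EAS54_65:7:152:368:113'};
subset3 = getSubset(brInMemory,headers);

% Check that every returned element carries one of the requested headers.
allRequested = all(ismember(subset3.Header,headers));
```

If a header occurs more than once in the object, the result can contain more elements than the number of headers requested.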
subset — Subset of elements in object
Subset of elements in the object, specified as a vector of positive integers, logical vector, string vector, or cell array of character vectors containing valid sequence headers.
Example: [1 3]
Tip
When you use a sequence header (or a cell array of headers) for subset, a repeated header specifies all elements with that header.
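A logical vector is convenient when the selection depends on a property of each read. The sketch below assumes the ex1.sam sample file and an in-memory object, so that Sequence is a cell array; the 35-base threshold and the variable names are arbitrary choices for illustration.

```matlab
% Load the sample reads into memory (assumes ex1.sam is on the MATLAB path).
brInMemory = BioRead('ex1.sam','InMemory',true);

% Build a logical mask with one entry per read (true = keep the read).
% The 35-base threshold is an arbitrary value used only for illustration.
readLengths = cellfun(@numel,brInMemory.Sequence);
keep = readLengths >= 35;

% Pass the mask as the subset argument.
longReads = getSubset(brInMemory,keep);
```

The mask has one entry per element of the object, so the number of elements in the result equals the number of true entries.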
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'InMemory',true specifies to store the output object (subset) in memory.
'Name' — Name of object
'' (default) | character vector | string
Name of the object, specified as the comma-separated pair consisting of 'Name' and a character vector or string. The default is an empty character vector '' (no name).
Example: 'Name','newData'
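As a brief illustration of the 'Name' argument, the sketch below labels a two-element subset. It assumes the ex1.sam sample file is on the path; the variable name named is hypothetical.

```matlab
% Create an indexed BioRead object (assumes ex1.sam is on the MATLAB path).
br = BioRead('ex1.sam');

% Retrieve two elements and name the resulting object in the same call.
named = getSubset(br,[2 3],'Name','newData');

% Display the name stored in the new object.
disp(named.Name)
```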
'InMemory' — Logical flag to keep data in memory
false (default) | true
Logical flag to keep data in memory, specified as the comma-separated pair consisting of 'InMemory' and true or false. Keeping the data in memory lets you access the resulting object subset faster and update its properties. If the data specified for subset is too large to fit in memory, set this name-value pair to false to use indexed access, which is more memory efficient but does not let you modify the properties.
If the parent object is already in memory, the resulting object subset is automatically placed in memory, and the function ignores this argument.
Example: 'InMemory',true
'SelectReference' — References used to create subset of data
References used to create the subset of data with only the reads mapped to those references, specified as the comma-separated pair consisting of 'SelectReference' and a cell array of character vectors, string vector, or vector of positive integers.
Note
This argument applies to BioMap objects only.
Example: 'SelectReference',{'RefSeq1'}
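The following sketch shows how 'SelectReference' might be used with a BioMap object. It rests on several assumptions that are not stated on this page: that ex1.sam can be loaded as a BioMap object, that it contains a reference named 'seq1', and that getSubset accepts the name-value pair without a preceding subset argument. Adjust the reference name to match your data.

```matlab
% Create a BioMap object from the sample file (assumes ex1.sam is on the path).
bm = BioMap('ex1.sam');

% Keep only the reads mapped to one reference.
% The reference name 'seq1' is an assumption about the contents of ex1.sam.
seq1Reads = getSubset(bm,'SelectReference',{'seq1'});

% Compare the number of reads before and after the selection.
fprintf('%d of %d reads map to seq1\n',seq1Reads.NSeqs,bm.NSeqs);
```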
subset — Subset of elements
BioRead object | BioMap object
Subset of elements from the object, returned as a BioRead or BioMap object. If object is in memory, then subset is placed in memory. If object is indexed, then subset is indexed unless you set 'InMemory' to true.