getSubset

Class: GTFAnnotation

Create object containing subset of elements from GTFAnnotation object

Syntax

NewObj = getSubset(AnnotObj,StartPos,EndPos)
NewObj = getSubset(AnnotObj,Subset)
NewObj = getSubset(___,Name,Value)

Description

NewObj = getSubset(AnnotObj,StartPos,EndPos) returns NewObj, a new object containing the subset of elements from AnnotObj that fall within the range specified by StartPos and EndPos in each reference sequence.

NewObj = getSubset(AnnotObj,Subset) returns NewObj, a new object containing the elements of AnnotObj specified by Subset, a vector of positive integers.

NewObj = getSubset(___,Name,Value) returns NewObj, a new object containing a subset of the elements from AnnotObj, using any of the input arguments from the previous syntaxes and additional options specified by one or more Name,Value pair arguments.
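The three syntaxes can be compared side by side in a minimal sketch. It assumes that Bioinformatics Toolbox and its sample file hum37_2_1M.gtf are available. The variable names are illustrative, and the range is taken from the first half of the span reported by getRange rather than from known coordinates.

```matlab
% Sketch only: assumes Bioinformatics Toolbox and its sample file are installed.
annotObj = GTFAnnotation('hum37_2_1M.gtf');

% Derive a range from the data instead of hard-coding positions.
covered  = double(getRange(annotObj));            % [minimum maximum] annotated positions
startPos = covered(1);
endPos   = covered(1) + floor((covered(2) - covered(1))/2);

% Syntax 1: select by position range (applied to each reference sequence).
rangeObj = getSubset(annotObj,startPos,endPos);

% Syntax 2: select by element index.
idxObj = getSubset(annotObj,[1 5 8]);

% Syntax 3: add a name-value pair to either of the previous syntaxes.
cdsInRange = getSubset(annotObj,startPos,endPos,'Feature','CDS');

fprintf('Range: %d, index: %d, range + CDS: %d entries\n', ...
    rangeObj.NumEntries,idxObj.NumEntries,cdsInRange.NumEntries);
```

Deriving the range from getRange keeps the sketch independent of the coordinates in any particular file.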

Input Arguments

AnnotObj

Object of the GTFAnnotation class.

StartPos

Nonnegative integer specifying the start of a range in each reference sequence in AnnotObj. The integer StartPos must be less than or equal to EndPos.

EndPos

Nonnegative integer specifying the end of a range in each reference sequence in AnnotObj. The integer EndPos must be greater than or equal to StartPos.

Subset

Vector of positive integers less than or equal to the number of entries in the object. Use the vector Subset to retrieve any element or subset of the object.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'Reference'

Character vector, string, string vector, or cell array of character vectors specifying one or more reference sequences in AnnotObj. Only annotations whose reference field matches one of the character vectors or strings are included in NewObj.

'Feature'

Character vector, string, string vector, or cell array of character vectors specifying one or more features in AnnotObj. Only annotations whose feature field matches one of the character vectors or strings are included in NewObj.

'Gene'

Character vector, string, string vector, or cell array of character vectors specifying one or more genes in AnnotObj. Only annotations whose gene field matches one of the character vectors or strings are included in NewObj.

'Transcript'

Character vector, string, string vector, or cell array of character vectors specifying one or more transcripts in AnnotObj. Only annotations whose transcript field matches one of the character vectors or strings are included in NewObj.
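The following sketch shows one way to discover valid values for these arguments and pass them to getSubset. It assumes the toolbox sample file hum37_2_1M.gtf is available. The variable names are illustrative, and the sketch picks whatever names the file happens to contain rather than specific genes or features.

```matlab
% Sketch only: assumes Bioinformatics Toolbox and its sample file are installed.
annotObj = GTFAnnotation('hum37_2_1M.gtf');

% List the values present in the object so that the filters use valid names.
refNames     = getReferenceNames(annotObj);
featureNames = getFeatureNames(annotObj);
geneNames    = getGeneNames(annotObj);

% A single value: annotations on the first listed reference sequence.
oneRef = getSubset(annotObj,'Reference',refNames{1});

% Several values in a cell array: annotations matching any listed feature.
someFeatures = getSubset(annotObj,'Feature', ...
    featureNames(1:min(2,numel(featureNames))));

% Two filters in one call; compare the counts to see how they combine.
oneGene        = getSubset(annotObj,'Gene',geneNames{1});
oneGeneFeature = getSubset(annotObj,'Gene',geneNames{1}, ...
    'Feature',featureNames{1});

fprintf('Gene only: %d entries, gene + feature: %d entries\n', ...
    oneGene.NumEntries,oneGeneFeature.NumEntries);
```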

'Overlap'

Minimum number of base positions by which an annotation must overlap the range to be included in NewObj. This value can be any of the following:

  • Positive integer

  • 'full' — An annotation must be fully contained in the range to be included.

  • 'start' — An annotation’s start position must lie within the range to be included.

Default: 1
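As a rough illustration of how the 'Overlap' settings differ, the sketch below applies each one to the same range and prints the resulting counts. It assumes the toolbox sample file hum37_2_1M.gtf is available. The range (the first half of the span reported by getRange) and the threshold of 100 bases are arbitrary choices made for this example.

```matlab
% Sketch only: assumes Bioinformatics Toolbox and its sample file are installed.
annotObj = GTFAnnotation('hum37_2_1M.gtf');

covered  = double(getRange(annotObj));            % [minimum maximum] annotated positions
startPos = covered(1);
endPos   = covered(1) + floor((covered(2) - covered(1))/2);

% Default behavior: at least one base position overlaps the range.
anyOverlap = getSubset(annotObj,startPos,endPos);

% Require at least 100 overlapping base positions (arbitrary threshold).
atLeast100 = getSubset(annotObj,startPos,endPos,'Overlap',100);

% Require the annotation start position to lie within the range.
startInside = getSubset(annotObj,startPos,endPos,'Overlap','start');

% Require the whole annotation to lie within the range.
fullyInside = getSubset(annotObj,startPos,endPos,'Overlap','full');

fprintf('default: %d, >=100 bases: %d, start: %d, full: %d\n', ...
    anyOverlap.NumEntries,atLeast100.NumEntries, ...
    startInside.NumEntries,fullyInside.NumEntries);
```

Because an annotation that is fully contained in the range also starts inside it, the 'full' count should not exceed the 'start' count, and neither should exceed the default count.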

Output Arguments

NewObj

Object of the GTFAnnotation class.

Examples

Example 34. Create a Subset of Data Containing Only CDS Features from a GTF-formatted File

Construct a GTFAnnotation object using a GTF-formatted file that is provided with Bioinformatics Toolbox™.

GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

Create a subset of the data containing only CDS features.

subsetGTF = getSubset(GTFAnnotObj,'Feature','CDS')

subsetGTF = 

  GTFAnnotation with properties:

    FieldNames: {1x11 cell}
    NumEntries: 92

Example 35. Retrieve Subsets of Data from a GTFAnnotation Object

Construct a GTFAnnotation object using a GTF-formatted file that is provided with Bioinformatics Toolbox.

GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

Retrieve a subset of data from the first to fifth elements of GTFAnnotObj.

subsetGTF1 = getSubset(GTFAnnotObj,1:5)

subsetGTF1 = 

  GTFAnnotation with properties:

    FieldNames: {1x11 cell}
    NumEntries: 5

Retrieve only the first, fifth and eighth elements of GTFAnnotObj.

subsetGTF2 = getSubset(GTFAnnotObj,[1 5 8])

subsetGTF2 = 

  GTFAnnotation with properties:

    FieldNames: {1x11 cell}
    NumEntries: 3

Tips

  • The getSubset method selects annotations from the range specified by StartPos and EndPos for each reference sequence in AnnotObj unless you use the 'Reference' name-value pair argument to limit the reference sequences.

  • After creating a subsetted object, you can access the number of entries, range of reference sequences covered by annotations, field names, and reference names. To access the values of all fields, create a structure of the data using the getData method.
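The second tip can be sketched as follows, assuming the toolbox sample file hum37_2_1M.gtf is available. The variable names are illustrative.

```matlab
% Sketch only: assumes Bioinformatics Toolbox and its sample file are installed.
annotObj = GTFAnnotation('hum37_2_1M.gtf');
cdsObj   = getSubset(annotObj,'Feature','CDS');

% Summary information available directly from the subsetted object.
numEntries = cdsObj.NumEntries;          % number of annotations in the subset
fieldNames = cdsObj.FieldNames;          % names of the annotation fields
covered    = getRange(cdsObj);           % positions covered by the subset
refNames   = getReferenceNames(cdsObj);  % reference sequences in the subset

% Field values are not exposed as properties, so convert to a structure array.
cdsData = getData(cdsObj);

% Inspect the first annotation in the subset.
disp(cdsData(1))
```

The structure array returned by getData has one element per annotation, so it can be indexed or looped over like any other MATLAB structure array.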