getData

Class: GTFAnnotation

Create structure containing subset of data from GTFAnnotation object

Syntax

AnnotStruct = getData(AnnotObj)
AnnotStruct = getData(AnnotObj,StartPos,EndPos)
AnnotStruct = getData(AnnotObj,Subset)
AnnotStruct = getData(___,Name,Value)

Description

AnnotStruct = getData(AnnotObj) returns AnnotStruct, an array of structures containing data from all elements in AnnotObj. The fields in the returned structures are the same as the elements in the FieldNames property of AnnotObj.

AnnotStruct = getData(AnnotObj,StartPos,EndPos) returns AnnotStruct, an array of structures containing data from the elements in AnnotObj that fall within the range specified by StartPos and EndPos in each reference sequence.

AnnotStruct = getData(AnnotObj,Subset) returns AnnotStruct, an array of structures containing the subset of data from AnnotObj specified by Subset, a vector of integers.

AnnotStruct = getData(___,Name,Value) returns AnnotStruct, an array of structures, using any of the input arguments from the previous syntaxes and additional options specified by one or more Name,Value pair arguments.

Input Arguments

AnnotObj

Object of the GTFAnnotation class.

StartPos

Nonnegative integer specifying the start of a range in each reference sequence in AnnotObj. The integer StartPos must be less than or equal to EndPos.

EndPos

Nonnegative integer specifying the end of a range in each reference sequence in AnnotObj. The integer EndPos must be greater than or equal to StartPos.

Subset

Vector of positive integers, each less than or equal to the number of entries in the object. Use the vector Subset to retrieve any element or subset of data from the object.
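The Subset argument does not have to be a contiguous range. The following is a minimal sketch that selects scattered entries and the last five entries. It assumes the object exposes its entry count through a NumEntries property and that the file holds at least five annotations; the variable names are illustrative only.

```matlab
% Sketch: index-based selection with the Subset argument.
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

% Assumption: NumEntries holds the number of annotations in the object.
n = GTFAnnotObj.NumEntries;

% Noncontiguous indices are allowed.
picked = getData(GTFAnnotObj,[1 3 5]);

% Last five annotations (assumes n >= 5).
lastFive = getData(GTFAnnotObj,n-4:n);
```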

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'Reference'

Character vector, string, string vector, or cell array of character vectors specifying one or more reference sequences in AnnotObj. Only annotations whose reference field matches one of the character vectors or strings are included in AnnotStruct.

'Feature'

Character vector, string, string vector, or cell array of character vectors specifying one or more features in AnnotObj. Only annotations whose feature field matches one of the character vectors or strings are included in AnnotStruct.

'Gene'

Character vector, string, string vector, or cell array of character vectors specifying one or more genes in AnnotObj. Only annotations whose gene field matches one of the character vectors or strings are included in AnnotStruct.

'Transcript'

Character vector, string, string vector, or cell array of character vectors specifying one or more transcripts in AnnotObj. Only annotations whose transcript field matches one of the character vectors or strings are included in AnnotStruct.
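As an illustration of these filters, the sketch below reads the reference and feature names from the first annotation rather than assuming specific names exist in the file, then uses them as filter values. The variable names are hypothetical.

```matlab
% Sketch: filter annotations by reference and by feature.
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

% Take filter values from the first annotation so that they are
% guaranteed to occur in the object.
first = getData(GTFAnnotObj,1);
refName = first.Reference;
featName = first.Feature;

% All annotations on that reference sequence.
onRef = getData(GTFAnnotObj,'Reference',refName);

% Annotations of that feature type within a position range.
featInRange = getData(GTFAnnotObj,668000,680000,'Feature',featName);
```

Name-value filters can follow any of the positional syntaxes, so the same pattern applies with a Subset vector in place of the position range.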

'Overlap'

Minimum number of base positions by which an annotation must overlap the range to be included in AnnotStruct. This value can be any of the following:

  • Positive integer

  • 'full' — An annotation must be fully contained in the range to be included.

  • 'start' — An annotation’s start position must lie within the range to be included.

Default: 1
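To see how the 'Overlap' setting changes the result, the following sketch queries the same range three ways and compares the counts. The counts depend on the file, so none are shown here.

```matlab
% Sketch: compare 'Overlap' settings on the same range.
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

% Default: at least one base position overlaps the range.
anyOverlap = getData(GTFAnnotObj,668000,680000);

% Only annotations fully contained in the range.
fullyInside = getData(GTFAnnotObj,668000,680000,'Overlap','full');

% Only annotations whose start position lies in the range.
startInside = getData(GTFAnnotObj,668000,680000,'Overlap','start');

fprintf('any: %d, full: %d, start: %d\n', ...
    numel(anyOverlap),numel(fullyInside),numel(startInside));
```

Because both 'full' and 'start' are stricter than the default, each should return no more annotations than the default query does.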

Output Arguments

AnnotStruct

Array of structures containing data from elements in AnnotObj. The fields in the returned structures are the same as the elements in the FieldNames property of AnnotObj, and are specified by GTF2.2: A Gene Annotation Format. Specifically, these fields are:

  • Reference

  • Start

  • Stop

  • Feature

  • Gene

  • Transcript

  • Source

  • Score

  • Strand

  • Frame

  • Attributes

Examples

Retrieve Subsets of Data from a GTFAnnotation Object

Construct a GTFAnnotation object using a GTF-formatted file that is provided with Bioinformatics Toolbox™.

GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

Extract the annotation data for positions 668,000 through 680,000 from the reference sequence.

AnnotStruct1 = getData(GTFAnnotObj,668000,680000)
AnnotStruct1 = 

18x1 struct array with fields:
    Reference
    Start
    Stop
    Feature
    Gene
    Transcript
    Source
    Score
    Strand
    Frame
    Attributes

Extract the first five annotations from the object.

AnnotStruct2 = getData(GTFAnnotObj,1:5)

AnnotStruct2 = 

5x1 struct array with fields:
    Reference
    Start
    Stop
    Feature
    Gene
    Transcript
    Source
    Score
    Strand
    Frame
    Attributes

Tips

Using getData creates a structure, which provides better access to the annotation data than an object.

  • You can access all field values in a structure.

  • You can not only extract field values, but also assign and delete values.

  • You can use linear indexing to access field values of specific annotations. For example, you can access the start value of only the fifth annotation.
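The points above can be sketched with ordinary structure array operations. The variable names are illustrative, and the edits change only the returned structure, not the GTFAnnotation object.

```matlab
% Sketch: work with the structure array returned by getData.
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');
S = getData(GTFAnnotObj,1:5);    % 5x1 struct array

% Linear indexing: start value of only the fifth annotation.
fifthStart = S(5).Start;

% Collect one field across all annotations into a cell array.
allFeatures = {S.Feature};

% Assign a new value to a field of one annotation.
S(5).Attributes = '';

% Delete the second annotation; S now has four elements.
S(2) = [];
```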