cuffgffread

Filter and convert GFF and GTF files

Syntax

cuffgffread(input,output)

cuffgffread(input,output,opt)

cuffgffread(input,output,Name,Value)

Description

cuffgffread(input,output) reads the input GFF or GTF file and writes the mandatory columns to the output GFF file [1]. To write a GTF-format file instead, set the 'GTFOutput' option to true.

cuffgffread requires the Cufflinks Support Package for the Bioinformatics Toolbox™. If the support package is not installed, then the function provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.

Note

cuffgffread is supported on the Mac and UNIX® platforms only.

cuffgffread(input,output,opt) uses the additional options specified by opt.

cuffgffread(input,output,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, cuffgffread('gyrAB.gtf','gyrAB.gff','PreserveAttributes',true) retains all attributes in the output file.

Examples

Convert GTF to GFF Format

Convert a GTF file to a GFF file while retaining all attributes.

cuffgffread('gyrAB.gtf','gyrABOut.gff','PreserveAttributes',true)

You can also set the options using an object. For instance, specify the output to be in the GTF format.

opt = CuffGFFReadOptions;
opt.GTFOutput = true;
opt.PreserveAttributes = true;
cuffgffread('gyrAB.gtf','gyrABOut.gtf',opt);

Once you have the options object, you can retrieve the equivalent original options for all object properties using getOptionsTable.

getOptionsTable(opt)

ans =

  33×3 table

                                        PropertyName                FlagName        FlagShortName
                                 ___________________________    ________________    _____________

    AppendDescription            'AppendDescription'            '-A'                    ''       
    CheckOppositeStrand          'CheckOppositeStrand'          '-B'                    ''       
    CheckPhase                   'CheckPhase'                   '-H'                    ''       
    Cluster                      'Cluster'                      '--cluster-only'        ''       
    CodingOnly                   'CodingOnly'                   '-C'                    ''       
    CollapseContainer            'CollapseContainer'            '-K'                    ''       
    CollapseFull                 'CollapseFull'                 '-Q'                    ''       
    CoordinateRange              'CoordinateRange'              '-r'                    ''       
    DiscardInvalidCDS            'DiscardInvalidCDS'            '-J'                    ''       
    DiscardNonCanonicalSplice    'DiscardNonCanonicalSplice'    '-N'                    ''       
    DiscardSingleExon            'DiscardSingleExon'            '-U'                    ''       
    DiscardTerminatedCDS         'DiscardTerminatedCDS'         '-V'                    ''       
    FastaCDSFile                 'FastaCDSFile'                 '-x'                    ''       
    FastaExonsFile               'FastaExonsFile'               '-w'                    ''       
    FastaProteinFile             'FastaProteinFile'             '-y'                    ''       
    FirstExonOnly                'FirstExonOnly'                '-G'                    ''       
    ForceExons                   'ForceExons'                   '--force-exons'         ''       
    FullyContained               'FullyContained'               '-R'                    ''       
    GTFOutput                    'GTFOutput'                    '-T'                    ''       
    MaxIntronLength              'MaxIntronLength'              '-i'                    ''       
    Merge                        'Merge'                        '--merge'               '-M'     
    MergeCloseExons              'MergeCloseExons'              '-Z'                    ''       
    MergeInfoFile                'MergeInfoFile'                '-d'                    ''       
    PreserveAttributes           'PreserveAttributes'           '-F'                    ''       
    Pseudo                       'Pseudo'                       '--no-pseudo'           ''       
    ReplacementTable             'ReplacementTable'             '-m'                    ''       
    SequenceFile                 'SequenceFile'                 '-g'                    ''       
    SequenceInfo                 'SequenceInfo'                 '-s'                    ''       
    UrlDecode                    'UrlDecode'                    '-D'                    ''       
    UseEnsemblConversion         'UseEnsemblConversion'         '-L'                    ''       
    UseNonTranscript             'UseNonTranscript'             '-O'                    ''       
    UseTrackName                 'UseTrackName'                 '-t'                    ''       
    WriteCoordinates             'WriteCoordinates'             '-W'                    ''

Input Arguments

`input` — Input file name
string | character vector

Input file name, specified as a string or character vector. The file can be a GTF or GFF file.

Example: 'gyrAB.gtf'

Data Types: char | string

`output` — Output file name
string | character vector

Output file name, specified as a string or character vector. By default, the output is a GFF file. Set 'GTFOutput' to true to get a GTF output file.

Example: 'gyrAB.gff'

Data Types: char | string

`opt` — `cuffgffread` options
`CuffGFFReadOptions` object | string | character vector

cuffgffread options, specified as a CuffGFFReadOptions object, string, or character vector. The string or character vector must be in the original gffread option syntax (prefixed by one or two dashes) [1].
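
As a minimal sketch of the string form of opt, the call below passes native flags directly. It relies on the flag mapping listed by getOptionsTable in the example above ('-T' for GTFOutput, '-F' for PreserveAttributes). It assumes that several flags can be combined in one space-separated character vector and that gyrAB.gtf is available on the path. The output file name is illustrative.

```matlab
% Native gffread flags: -T requests GTF output, -F keeps all attributes.
nativeFlags = '-T -F';

% Assumption: gyrAB.gtf is on the MATLAB path or in the current folder.
cuffgffread('gyrAB.gtf','gyrABNative.gtf',nativeFlags);
```

Under these assumptions, the call is intended to match setting GTFOutput and PreserveAttributes to true on a CuffGFFReadOptions object.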

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: cuffgffread('gyrAB.gtf','gyrAB.gff','CoordinateRange','+NC_000912.1:4821..7340')

`'AppendDescription'` — Flag to add file descriptions to `descr` attribute
`false` (default) | `true`

Flag to add file descriptions from sequence files to the descr attribute of the output GFF record, specified as true or false. Specify the sequence files using the SequenceInfo option.

Example: 'AppendDescription',true

Data Types: logical

`'CheckOppositeStrand'` — Flag to check opposite strand when checking for in-frame stop codons
`false` (default) | `true`

Flag to check opposite strand when checking for in-frame stop codons, specified as true or false.

Example: 'CheckOppositeStrand',true

Data Types: logical

`'CheckPhase'` — Flag to adjust coding sequence phase
`false` (default) | `true`

Flag to adjust coding sequence phase when checking for in-frame stop codons, specified as true or false.

Example: 'CheckPhase',true

Data Types: logical

`'Cluster'` — Flag to cluster input transcripts into loci
`true` (default) | `false`

Flag to cluster the input transcripts into loci, specified as true or false. This option is the same as the Merge property, except that it does not collapse fully contained transcripts with identical introns.

Example: 'Cluster',false

Data Types: logical

`'CodingOnly'` — Flag to discard transcripts with no coding sequence
`false` (default) | `true`

Flag to discard transcripts with no coding sequence feature (CDS), specified as true or false.

Example: 'CodingOnly',true

Data Types: logical

`'CollapseContainer'` — Flag to collapse fully contained transcripts
`false` (default) | `true`

Flag to collapse fully contained transcripts that are shorter with fewer introns than the container, specified as true or false. This property applies only when you set Merge to true.

Example: 'CollapseContainer',true

Data Types: logical

`'CollapseFull'` — Flag to collapse shorter transcripts overlapping at least 80% with another exon
`false` (default) | `true`

Flag to collapse shorter transcripts overlapping at least 80% with another single exon transcript, specified as true or false. This property applies only when you set Merge to true.

Example: 'CollapseFull',true

Data Types: logical

`'CoordinateRange'` — Genomic range to filter transcripts
string | character vector

Genomic range to filter transcripts, specified as a string or character vector. The format must be "[[<strand>]<chr>:]<start>..<end>", where <start> and <end> are genomic positions, <chr> is an optional chromosome or contig name, and <strand> is an optional strand ('+' or '-').

Example: 'CoordinateRange',"+NC_000912.1:4821..7340"

Data Types: char | string

`'DiscardInvalidCDS'` — Flag to ignore mRNA transcripts either lacking start or stop codon or having in-frame stop codon
`false` (default) | `true`

Flag to ignore mRNA transcripts either lacking a start or stop codon or having an in-frame stop codon, specified as true or false.

Example: 'DiscardInvalidCDS',true

Data Types: logical

`'DiscardNonCanonicalSplice'` — Flag to ignore multiexon mRNA transcripts that have intron with noncanonical splice sequence
`false` (default) | `true`

Flag to ignore multiexon mRNA transcripts that have an intron with a noncanonical splice sequence, specified as true or false. A noncanonical splice sequence is any splice sequence other than "GT-AG", "GC-AG", or "AT-AC".

Example: 'DiscardNonCanonicalSplice',true

Data Types: logical

`'DiscardSingleExon'` — Flag to ignore transcripts spanning single exon
`false` (default) | `true`

Flag to ignore transcripts spanning a single exon, specified as true or false.

Example: 'DiscardSingleExon',true

Data Types: logical

`'DiscardTerminatedCDS'` — Flag to ignore transcripts with in-frame stop codon
`false` (default) | `true`

Flag to ignore transcripts with an in-frame stop codon, specified as true or false.

Example: 'DiscardTerminatedCDS',true

Data Types: logical

`'ExtraCommand'` — Additional commands
`""` (default) | character vector | string

Additional commands, specified as a character vector or string. The commands must be in the native syntax (prefixed by one or two dashes). Use this option to apply undocumented flags and flags without corresponding MATLAB properties.

Example: 'ExtraCommand',"-E"

Data Types: char | string

`'FastaCDSFile'` — Name of file to save spliced coding sequences
string | character vector

Name of a file to save the spliced coding sequences in the FASTA format, specified as a string or character vector.

Example: 'FastaCDSFile',"splicedCoding.FASTA"

Data Types: char | string

`'FastaExonsFile'` — Name of file to save spliced exons
string | character vector

Name of a file to save the spliced exons in the FASTA format, specified as a string or character vector.

Example: 'FastaExonsFile',"splicedExon.FASTA"

Data Types: char | string

`'FastaProteinFile'` — Name of file to save protein translation of coding sequences
string | character vector

Name of a file to save the protein translation of coding sequences in the FASTA format, specified as a string or character vector.

Example: 'FastaProteinFile',"translated.FASTA"

Data Types: char | string

`'FirstExonOnly'` — Flag to parse additional attributes only from first exon
`false` (default) | `true`

Flag to parse additional attributes only from the first exon, specified as true or false.

Example: 'FirstExonOnly',true

Data Types: logical

`'ForceExons'` — Flag to list lowest-level GFF features as exon features
`false` (default) | `true`

Flag to list the lowest-level GFF features as exon features in the output file, specified as true or false.

Example: 'ForceExons',true

Data Types: logical

`'FullyContained'` — Flag to discard transcripts not contained fully
`false` (default) | `true`

Flag to discard transcripts not contained fully within the range, specified as true or false. Specify the range using the CoordinateRange option.

Example: 'FullyContained',true

Data Types: logical
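
The sketch below shows one way to assemble a CoordinateRange value and pair it with FullyContained. The range values come from the example on this page. The variable names and the output file name are illustrative, and gyrAB.gtf is assumed to be available.

```matlab
% Build the range in the "[[<strand>]<chr>:]<start>..<end>" format.
strand   = '+';
chr      = 'NC_000912.1';
startPos = 4821;
endPos   = 7340;
range = sprintf('%s%s:%d..%d',strand,chr,startPos,endPos);

% Keep only transcripts that lie entirely within the range.
cuffgffread('gyrAB.gtf','gyrABRange.gff', ...
    'CoordinateRange',range,'FullyContained',true);
```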

`'GTFOutput'` — Flag to output GTF-format transcript files
`false` (default) | `true`

Flag to output GTF-format transcript files, specified as true or false.

Example: 'GTFOutput',true

Data Types: logical

`'IncludeAll'` — Flag to apply all available options
`false` (default) | `true`

Flag to apply all available options, specified as true or false. By default, the function converts only the specified options to the original (native) syntax, which is prefixed by one or two dashes. If the value is true, the software converts all available options, using default values for unspecified options.

Note

When IncludeAll is true, the software does not translate a property whose default value is NaN, Inf, [], '', or "".

Example: 'IncludeAll',true

Data Types: logical
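
To see the effect of IncludeAll without running a conversion, you can inspect the translated command for an options object. This sketch assumes that CuffGFFReadOptions provides a getCommand object function, as other Bioinformatics Toolbox options objects do. That function is not described on this page, so check that it is available in your release.

```matlab
opt = CuffGFFReadOptions;
opt.GTFOutput = true;

% By default, only the explicitly specified options are translated.
cmdSpecified = getCommand(opt)

% With IncludeAll set, options left at their default values are translated
% too, except those whose default is NaN, Inf, [], '', or "".
opt.IncludeAll = true;
cmdAll = getCommand(opt)
```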

`'MaxIntronLength'` — Maximum intron length for transcript to include in output
`Inf` (default) | positive integer

Maximum intron length for a transcript to include in the output file, specified as a positive integer. Inf, the default value, sets no limit on the intron length.

Example: 'MaxIntronLength',500

Data Types: double

`'Merge'` — Flag to merge transcripts to loci
`false` (default) | `true`

Flag to merge transcripts into loci by collapsing transcripts with identical introns, specified as true or false.

Example: 'Merge',true

Data Types: logical

`'MergeCloseExons'` — Flag to merge exons into single exon
`false` (default) | `true`

Flag to merge exons into a single exon when separated by fewer than 4 base-pair introns, specified as true or false.

Example: 'MergeCloseExons',true

Data Types: logical

`'MergeInfoFile'` — Name of file to save information on duplicates when merging
string | character vector

Name of a file to save information on duplicates when merging, specified as a string or character vector. This property applies only when you set Merge to true.

Example: 'MergeInfoFile',"duplicates.txt"

Data Types: char | string
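
The merge-related options are typically used together. The following sketch combines Merge, CollapseContainer, and MergeInfoFile in one call. The file names are illustrative, and gyrAB.gtf is assumed to be available.

```matlab
infoFile = "duplicates.txt";   % illustrative name for the duplicates report

cuffgffread('gyrAB.gtf','gyrABMerged.gff', ...
    'Merge',true, ...               % collapse transcripts with identical introns
    'CollapseContainer',true, ...   % also collapse shorter, fully contained transcripts
    'MergeInfoFile',infoFile);      % save information on duplicates
```

CollapseContainer and MergeInfoFile apply only because Merge is set to true in the same call.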

`'PreserveAttributes'` — Flag to retain all attributes in output
`false` (default) | `true`

Flag to retain all attributes in the output file, specified as true or false.

Example: 'PreserveAttributes',true

Data Types: logical

`'Pseudo'` — Flag to filter out records containing "pseudo"
`true` (default) | `false`

Flag to filter out records containing the word "pseudo," specified as true or false.

Example: 'Pseudo',false

Data Types: logical

`'ReplacementTable'` — Name of file containing replacement table
string | character vector

Name of a file containing a replacement table, specified as a string or character vector. The table must have two columns, where the first column contains the original transcript IDs and the second column contains the new transcript IDs. An example table follows.

origTranscript1	newTranscript1
origTranscript2	newTranscript2
origTranscript3	newTranscript3

If you provide a replacement table, the function replaces the transcript IDs found in the first column with the new transcript IDs from the second column and filters out those transcripts not found.

Example: 'ReplacementTable',"replaceTbl.txt"

Data Types: char | string
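
As a sketch, the replacement table can be written from MATLAB before the conversion. The transcript IDs are the placeholders from the example table above, not IDs from a real annotation. The tab delimiter and the output file name are assumptions.

```matlab
% Placeholder ID pairs: original ID in column 1, new ID in column 2.
ids = {'origTranscript1','newTranscript1'; ...
       'origTranscript2','newTranscript2'};

% Write one tab-delimited line per ID pair (delimiter is an assumption).
tblFile = 'replaceTbl.txt';
fid = fopen(tblFile,'w');
for k = 1:size(ids,1)
    fprintf(fid,'%s\t%s\n',ids{k,1},ids{k,2});
end
fclose(fid);

% Rename the listed transcripts; transcripts not in the table are filtered out.
cuffgffread('gyrAB.gtf','gyrABRenamed.gff','ReplacementTable',tblFile);
```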

`'SequenceFile'` — Name of FASTA-format file containing genomic sequences
string | character vector

Name of a FASTA-format file containing genomic sequences for all input mappings, specified as a string or character vector.

Example: 'SequenceFile',"seqs.fasta"

Data Types: char | string

`'SequenceInfo'` — Name of tab-delimited file with additional information on input sequence
string | character vector

Name of a tab-delimited file with additional information on each input sequence, specified as a string or character vector. This file must have three columns: a sequence name column, a sequence length column, and a sequence description column. If AppendDescription is true, the sequence description is included as an attribute in the output GFF file.

Example: 'SequenceInfo',"seqinfo.txt"

Data Types: char | string
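
The sketch below writes a three-column sequence information file and uses it with AppendDescription. The sequence name is taken from the CoordinateRange example on this page. The length and description are placeholders, not verified values, and the output file name is illustrative.

```matlab
% Columns: sequence name, sequence length, sequence description.
% The length and description are placeholders for illustration only.
infoFile = 'seqinfo.txt';
fid = fopen(infoFile,'w');
fprintf(fid,'%s\t%d\t%s\n','NC_000912.1',816394,'placeholder description');
fclose(fid);

% Add the description to the descr attribute of the output records.
cuffgffread('gyrAB.gtf','gyrABDescr.gff', ...
    'SequenceInfo',infoFile,'AppendDescription',true);
```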

`'UrlDecode'` — Flag to decode URL-encoded characters in attribute names
`false` (default) | `true`

Flag to decode URL-encoded characters in attribute names, specified as true or false. For instance, "transcript%20description" is decoded to "transcript description".

Example: 'UrlDecode',true

Data Types: logical

`'UseEnsemblConversion'` — Flag to use GTF-to-GFF3 conversion method from Ensembl
`false` (default) | `true`

Flag to use the GTF-to-GFF3 conversion method from Ensembl, specified as true or false.

Example: 'UseEnsemblConversion',true

Data Types: logical

`'UseNonTranscript'` — Flag to include nontranscript GFF records in output file
`false` (default) | `true`

Flag to include nontranscript GFF records in the output file, specified as true or false.

Example: 'UseNonTranscript',true

Data Types: logical

`'UseTrackName'` — Flag to use track name in second column of GFF output line
`false` (default) | `true`

Flag to use the track name in the second column of the GFF output line, specified as true or false.

Example: 'UseTrackName',true

Data Types: logical

`'WriteCoordinates'` — Flag to write exon coordinates projected onto spliced sequence
`false` (default) | `true`

Flag to write the exon coordinates projected onto the spliced sequence, specified as true or false. This property applies only when FastaExonsFile or FastaCDSFile is specified.

Example: 'WriteCoordinates',true

Data Types: logical
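
The FASTA-related options can be combined as in the following sketch, which saves spliced exon sequences along with projected exon coordinates. It assumes that a genome FASTA file matching the annotation is supplied through SequenceFile, because the underlying gffread tool needs the genomic sequences to extract spliced sequences. The name seqs.fasta is hypothetical.

```matlab
genomeFile = "seqs.fasta";          % hypothetical genome FASTA for the annotation
exonFile   = "splicedExon.FASTA";   % FASTA file to receive the spliced exons

cuffgffread('gyrAB.gtf','gyrABOut.gff', ...
    'SequenceFile',genomeFile, ...
    'FastaExonsFile',exonFile, ...
    'WriteCoordinates',true);       % applies because FastaExonsFile is specified
```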

References

[1] Trapnell, Cole, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold, and Lior Pachter. “Transcript Assembly and Quantification by RNA-Seq Reveals Unannotated Transcripts and Isoform Switching during Cell Differentiation.” Nature Biotechnology 28, no. 5 (May 2010): 511–15.
