cufflinks

Assemble transcriptome from aligned reads

collapse all in page

Syntax

cufflinks(alignmentFiles)

cufflinks(alignmentFiles,cufflinksOptions)

cufflinks(alignmentFiles,Name,Value)

[transcripts,isoforms,genes,skippedTranscripts] = cufflinks(___)

Description

example

cufflinks(alignmentFiles) assembles a transcriptome from aligned reads in alignmentFile and quantifies the level of expression for each transcript [1]. By default, the function writes the results to a GTF file named transcripts.gtf in the current directory.

cufflinks requires the Cufflinks Support Package for the Bioinformatics Toolbox™. If the support package is not installed, then the function provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.

Note

cufflinks is supported on the Mac and UNIX^® platforms only.

cufflinks(alignmentFiles,cufflinksOptions) uses additional options specified by cufflinksOptions.

cufflinks(alignmentFiles,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, cufflinks(alignmentFile,'TrimCoverageThreshold',5) specifies the minimum average coverage for 3' end trimming.

[transcripts,isoforms,genes,skippedTranscripts] = cufflinks(___) returns the file names of the assembled transcriptome using any of the input argument combinations from the previous syntaxes. By default, the function saves all files to the current directory.

Examples

collapse all

Assemble Transcriptome and Perform Differential Expression Testing

Create a CufflinksOptions object to define cufflinks options, such as the number of parallel threads and the output directory to store the results.

cflOpt = CufflinksOptions;
cflOpt.NumThreads = 8;
cflOpt.OutputDirectory = "./cufflinksOut";

The SAM files provided for this example contain aligned reads for Mycoplasma pneumoniae from two samples with three replicates each. The reads are simulated 100bp-reads for two genes (gyrA and gyrB) located next to each other on the genome. All the reads are sorted by reference position, as required by cufflinks.

sams = ["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam",...
        "Myco_2_1.sam", "Myco_2_2.sam", "Myco_2_3.sam"];

Assemble the transcriptome from the aligned reads.

[gtfs,isofpkm,genes,skipped] = cufflinks(sams,cflOpt);

gtfs is a list of GTF files that contain assembled isoforms.

Compare the assembled isoforms using cuffcompare.

stats = cuffcompare(gtfs);

Merge the assembled transcripts using cuffmerge.

mergedGTF = cuffmerge(gtfs,'OutputDirectory','./cuffMergeOutput');

mergedGTF reports only one transcript. This is because the two genes of interest are located next to each other, and cuffmerge cannot distinguish two distinct genes. To guide cuffmerge, use a reference GTF (gyrAB.gtf) containing information about these two genes. If the file is not located in the same directory that you run cuffmerge from, you must also specify the file path.

gyrAB = which('gyrAB.gtf');
mergedGTF2 = cuffmerge(gtfs,'OutputDirectory','./cuffMergeOutput2',...
			'ReferenceGTF',gyrAB);

Calculate abundances (expression levels) from aligned reads for each sample.

abundances1 = cuffquant(mergedGTF2,["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam"],...
                        'OutputDirectory','./cuffquantOutput1');
abundances2 = cuffquant(mergedGTF2,["Myco_2_1.sam", "Myco_2_2.sam", "Myco_2_3.sam"],...
                        'OutputDirectory','./cuffquantOutput2');

Assess the significance of changes in expression for genes and transcripts between conditions by performing the differential testing using cuffdiff. The cuffdiff function operates in two distinct steps: the function first estimates abundances from aligned reads, and then performs the statistical analysis. In some cases (for example, distributing computing load across multiple workers), performing the two steps separately is desirable. After performing the first step with cuffquant, you can then use the binary CXB output file as an input to cuffdiff to perform statistical analysis. Because cuffdiff returns several files, specify the output directory is recommended.

isoformDiff = cuffdiff(mergedGTF2,[abundances1,abundances2],...
                      'OutputDirectory','./cuffdiffOutput');

Display a table containing the differential expression test results for the two genes gyrB and gyrA.

readtable(isoformDiff,'FileType','text')

ans =

  2×14 table

        test_id            gene_id        gene              locus             sample_1    sample_2    status     value_1       value_2      log2_fold_change_    test_stat    p_value    q_value    significant
    ________________    _____________    ______    _______________________    ________    ________    ______    __________    __________    _________________    _________    _______    _______    ___________

    'TCONS_00000001'    'XLOC_000001'    'gyrB'    'NC_000912.1:2868-7340'      'q1'        'q2'       'OK'     1.0913e+05    4.2228e+05          1.9522           7.8886      5e-05      5e-05        'yes'   
    'TCONS_00000002'    'XLOC_000001'    'gyrA'    'NC_000912.1:2868-7340'      'q1'        'q2'       'OK'     3.5158e+05    1.1546e+05         -1.6064          -7.3811      5e-05      5e-05        'yes'

You can use cuffnorm to generate normalized expression tables for further analyses. cuffnorm results are useful when you have many samples and you want to cluster them or plot expression levels for genes that are important in your study. Note that you cannot perform differential expression analysis using cuffnorm.

Specify a cell array, where each element is a string vector containing file names for a single sample with replicates.

alignmentFiles = {["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam"],...
                  ["Myco_2_1.sam", "Myco_2_2.sam", "Myco_2_3.sam"]}
isoformNorm = cuffnorm(mergedGTF2, alignmentFiles,...
                      'OutputDirectory', './cuffnormOutput');

Display a table containing the normalized expression levels for each transcript.

readtable(isoformNorm,'FileType','text')

ans =

  2×7 table

      tracking_id          q1_0          q1_2          q1_1          q2_1          q2_0          q2_2   
    ________________    __________    __________    __________    __________    __________    __________

    'TCONS_00000001'    1.0913e+05         78628    1.2132e+05    4.3639e+05    4.2228e+05    4.2814e+05
    'TCONS_00000002'    3.5158e+05    3.7458e+05    3.4238e+05    1.0483e+05    1.1546e+05    1.1105e+05

Column names starting with q have the format: conditionX_N, indicating that the column contains values for replicate N of conditionX.

Input Arguments

collapse all

`alignmentFiles` — Names of SAM or BAM files
string | string vector | character vector | cell array of character vectors

Names of SAM or BAM files, specified as a string, string vector, character vector, or cell array of character vectors. The input files must be sorted by reference position.

Example: 'Myco_1_1.sam'

Data Types: char | string

`cufflinksOptions` — Cufflinks options
`CufflinksOptions` object | character vector | string

Cufflinks options, specified as a CufflinksOptions object, character vector, or string. The character vector or string must be in the cufflinks option syntax (prefixed by one or two dashes) [1].

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: cufflinks(alignmentFile,'TrimCoverageThreshold',5,'FragmentLengthMean',180)

`'EffectiveLengthCorrection'` — Flag to normalize fragment counts
`true` (default) | `false`

Flag to normalize fragment counts to fragments per kilobase per million mapped reads (FPKM), specified as true or false.

Example: 'EffectiveLengthCorrection',false

Data Types: logical

`'ExtraCommand'` — Additional commands
`""` (default) | string | character vector

Additional commands, specified as a string or character vector.

The commands must be in the native syntax (prefixed by one or two dashes). Use this option to apply undocumented flags and flags without corresponding MATLAB properties.

Example: 'ExtraCommand','--library-type fr-secondstrand'

Data Types: char | string

`'FauxReadTiling'` — Flag to include reference transcripts in assembled output
`true` (default) | `false`

Flag to include reference transcripts in the assembled output as faux-reads during RABT (advanced reference annotation based transcript) assembly, specified as true or false.

Note

The function only performs the RABT assembly if you specify GTFGuide. Otherwise, FauxReadTiling, regardless of being true or false, has no effect on the assembled transcript.

Example: 'FauxReadTiling',false

Data Types: logical

`'FragmentBiasCorrection'` — Name of FASTA file with reference transcripts to detect bias
string | character vector

Name of the FASTA file with reference transcripts to detect bias in fragment counts, specified as a string or character vector. Library preparation can introduce sequence-specific bias into RNA-Seq experiments. Providing reference transcripts improves the accuracy of the transcript abundance estimates.

Example: 'FragmentBiasCorrection','ref.fasta'

Data Types: char | string

`'FragmentLengthMean'` — Expected mean fragment length
`200` (default) | positive integer

Expected mean fragment length, specified as a positive integer. The default value is 200 base pairs. The function can learn the fragment length mean for each SAM file. Using this option is not recommended for paired-end reads.

Example: 'FragmentLengthMean',100

Data Types: double

`'FragmentLengthSD'` — Expected standard deviation for fragment length distribution
`80` (default) | positive scalar

Expected standard deviation for the fragment length distribution, specified as a positive scalar. The default value is 80 base pairs. The function can learn the fragment length standard deviation for each SAM file. Using this option is not recommended for paired-end reads.

Example: 'FragmentLengthSTD',70

Data Types: double

`'GTFGuide'` — Name of GTF file to guide RABT assembly
string | character vector

Name of a GTF file to guide the RABT assembly, specified as a string or character vector.

Example: 'GTFGuide','tr.gtf'

Data Types: char | string

`'IncludeAll'` — Flag to apply all available options
`false` (default) | true

Flag to include all available options with the corresponding default values when converting to the original options syntax, specified as true or false.

The original (native) syntax is prefixed by one or two dashes. By default, the function converts only the specified options. If the value is true, the software converts all available options, with default values for unspecified options, to the original syntax.

Note

If you set IncludeAll to true, the software translates all available properties, with default values for unspecified properties. The only exception is that when the default value of a property is NaN, Inf, [], '', or "", then the software does not translate the corresponding property.

Example: 'IncludeAll',true

Data Types: logical

`'RABTOverhangTolerance'` — Number of base pairs allowed to overlap with transcript intron
`8` (default) | positive integer

Number of base pairs from a read allowed to overlap with a transcript intron when determining if a read is mappable to another transcript during the RABT assembly, specified as a positive integer. The default value is 8.

Note

The function only performs the RABT assembly if you specify GTFGuide. Otherwise, RABTOverhangTolerance has no effect on the assembled transcript.

Example: 'IntronOverhangTolerance',10

Data Types: double

`'JunctionAlpha'` — Alpha value in binomial test to filter false-positive alignments
`0.001` (default) | scalar between `0` and `1`

Alpha value in the binomial test to filter false-positive alignments, specified as a scalar between 0 and 1.

Example: 'JunctionAlpha',0.005

Data Types: double

`'LengthCorrection'` — Flag to correct by transcript length
`true` (default) | `false`

Flag to correct by the transcript length, specified as true or false. Set this value to false only when the fragment count is independent of the feature size, such as for small RNA libraries with no fragmentation and for 3' end sequencing, where all fragments have the same length.

Example: 'LengthCorrection',false

Data Types: logical

`'MaskFile'` — Name of GTF or GFF file containing transcripts to ignore
string | character vector

Name of the GTF or GFF file containing transcripts to ignore during analysis, specified as a string or character vector. Some examples of transcripts to ignore include annotated rRNA transcripts, mitochondrial transcripts, and other abundant transcripts. Ignoring these transcripts improves the robustness of the abundance estimates.

Example: 'MaskFile','excludes.gtf'

Data Types: char | string

`'MaxBundleFrags'` — Maximum number of fragments to include for each locus before skipping
`500000` (default) | positive integer

Maximum number of fragments to include for each locus before skipping new fragments, specified as a positive integer. Skipped fragments are marked with the status HIDATA in the file skipped.gtf.

Example: 'MaxBundleFrags',400000

Data Types: double

`'MaxBundleLength'` — Maximum genomic length in base pairs for bundle
`3500000` (default) | positive integer

Maximum genomic length in base pairs for a bundle, specified as a positive integer.

Example: 'MaxBundleLength',3400000

Data Types: double

`'MaxFragAlignments'` — Maximum number of aligned reads to include for each fragment
`Inf` (default) | positive integer

Maximum number of aligned reads to include for each fragment before skipping new reads, specified as a positive integer. Inf, the default value, sets no limit on the maximum number of aligned reads.

Example: 'MaxFragAlignments',1000

Data Types: double

`'MaxIntronLength'` — Maximum number of bases in intron
`300000` (default) | positive integer

Maximum number of bases in an intron to report, specified as a positive integer. cufflinks also ignores SAM alignments with REF_SKIP CIGAR operations longer than this property value.

Example: 'MaxIntronLength',350000

Data Types: double

`'MaxMLEIterations'` — Maximum number of iterations for maximum likelihood estimation
`5000` (default) | positive integer

Maximum number of iterations for the maximum likelihood estimation of abundances, specified as a positive integer.

Example: 'MaxMLEIterations',4000

Data Types: double

`'MinFragsPerTransfrag'` — Minimum number of aligned RNA-Seq fragments to report
`10` (default) | positive integer

Minimum number of aligned RNA-Seq fragments to report on an assembled transfrag, specified as a positive integer.

Example: 'MinFragsPerTransfrag',15

Data Types: double

`'MinIntronLength'` — Minimum number of base pairs for intron in genome
`50` (default) | positive integer

Minimum number of base pairs for an intron in the genome, specified as a positive integer.

Example: 'MinIntronLength',50

Data Types: double

`'MinIsoformFraction'` — Cuffoff value to report abundance of isoform
`0.1` (default) | scalar between `0` and `1`

Cuffoff value to report the abundance of a particular isoform as a fraction of the most abundant isoform (major isoform), specified as a scalar between 0 and 1. The function filters out transcripts with abundances below the specified value because isoforms expressed at low levels often cannot be assembled reliably. The default value is 0.1, or 10%, of the major isoform of the gene.

Example: 'MinIsoformFraction',0.20

Data Types: double

`'MultiReadCorrection'` — Flag to improve abundance estimation using rescue method
`false` (default) | `true`

Flag to improve abundance estimation for reads mapped to multiple genomic positions using the rescue method, specified as true or false. If the value is false, the function divides multimapped reads uniformly to all mapped positions. If the value is true, the function uses additional information, including gene abundance estimation, inferred fragment length, and fragment bias, to improve transcript abundance estimation.

The rescue method is described in [2].

Example: true

Data Types: logical

`'NormalizeCompatibleHits'` — Flag to use only fragments compatible with reference transcript to calculate FPKM values
`false` (default) | `true`

Flag to use only fragments compatible with a reference transcript to calculate FPKM values, specified as true or false.

Example: 'NormalizeCompatibleHits',false

Data Types: logical

`'NormalizeTotalHits'` — Flag to include all fragments to calculate FPKM values
`false` (default) | `true`

Flag to include all fragments to calculate FPKM values, specified as true or false. If the value is true, the function includes all fragments, including fragments without a compatible reference.

Example: 'NormalizeTotalHits',true

Data Types: logical

`'NumFragAssignmentDraws'` — Number of fragment assignments to perform on each transcript
`50` (default) | positive integer

Number of fragment assignments to perform on each transcript, specified as a positive integer. For each fragment drawn from a transcript, the function performs the specified number of assignments probabilistically to determine the transcript assignment uncertainty and to estimate the variance-covariance matrix for the assigned fragment counts.

Example: 'NumFragAssignmentSamples',40

Data Types: double

`'NumFragDraws'` — Number of draws from negative binomial random number generator
`100` (default) | positive integer

Number of draws from the negative binomial random number generator for each transcript, specified as a positive integer. Each draw is a number of fragments that the function probabilistically assigns to transcripts in the transcriptome to determine the assignment uncertainty and to estimate the variance-covariance matrix for assigned fragment counts.

Example: 'NumFragSamples',90

Data Types: double

`'NumThreads'` — Number of parallel threads to use
`1` (default) | positive integer

Number of parallel threads to use, specified as a positive integer. Threads are run on separate processors or cores. Increasing the number of threads generally improves the runtime significantly, but increases the memory footprint.

Example: 'NumThreads',4

Data Types: double

`'OutputDirectory'` — Directory to store analysis results
current directory (`"./"`) (default) | string | character vector

Directory to store analysis results, specified as a string or character vector.

Example: 'OutputDirectory',"./AnalysisResults/"

Data Types: char | string

`'OverhangTolerance'` — Number of base pairs of overlap with intron
`8` (default) | positive integer

Number of base pairs of overlap with an intron that the function allows when determining if the read is compatible with another transcript, specified as a positive integer.

Example: 'OverhangTolerance',5

Data Types: double

`'RABTOverhangTolerance3'` — Number of base pairs allowed to overhang 3' end of reference transcript
`600` (default) | positive integer

Number of base pairs allowed to overhang the 3' end of each reference transcript during the RABT assembly, specified as a positive integer. The function uses this property when deciding if an assembled transcript is novel or should be merged with the reference.

Note

The function only performs the RABT assembly if you specify GTFGuide. Otherwise, RABTOverhangTolerance3 has no effect on the assembled transcript.

Example: 'OverhangTolerance3',500

Data Types: double

`'OverlapRadius'` — Distance between transfrags
`50` (default) | positive integer

Distance between transfrags, specified as a positive integer. If the distance is below the specified value, the function merges the transfrags. The default value is 50 base pairs.

Example: 'OverlapRadius',40

Data Types: double

`'PreMRNAFraction'` — Threshold to include alignments in intronic intervals
`0.15` (default) | scalar between `0` and `1`

Threshold to include alignments in intronic intervals in the assembly, specified as a scalar between 0 and 1. The function ignores the intronic alignments if the minimum depth of coverage divided by the number of spliced reads is below the specified value. Use this property to filter reads originating from incompletely spliced transcripts.

Example: 'PreMRNAFraction',0.10

Data Types: double

`'ReferenceGTF'` — Name of GTF or GFF file containing reference annotation
string | character vector

Name of a GTF or GFF file containing reference annotation used to estimate isoform expression, specified as a string or character vector. If you provide a ReferenceGTF file, the function does not assemble any novel transcripts and ignores any alignments incompatible with the reference transcripts.

Example: 'ReferenceGTF',"isoest.gtf"

Data Types: char | string

`'Seed'` — Seed for random number generator
`0` (default) | nonnegative integer

Seed for the random number generator, specified as a nonnegative integer. Setting a seed value ensures the reproducibility of the analysis results.

Example: 'Seed',10

Data Types: double

`'SmallAnchorFraction'` — Minimum percentage of alignment on each side of splice junction
`0.09` (default) | scalar between `0` and `1`

Minimum percentage of alignment on each side of a splice junction, specified as a scalar between 0 and 1. The function filters alignments with a percentage smaller than this property value prior to assembly.

Example: 'SmallAnchorFraction',0.1

Data Types: double

`'TranscriptPrefix'` — Prefix for reported transfrags in output GTF file
`"CUFF"` (default) | string | character vector

Prefix for the reported transfrags in the output GTF file, specified as a string or character vector. This option must be a string or character vector with a non-zero length.

Example: 'TranscriptPrefix',"tfrags"

Data Types: char | string

`'TrimCoverageThreshold'` — Minimum average coverage needed for 3' trimming
`10` (default) | positive integer

Minimum average coverage for 3' trimming, specified as a positive integer.

Example: 'TrimCoverageThreshold',8

Data Types: double

`'TrimDropoffFraction'` — Minimum percentage of average coverage
`0.1` (default) | scalar between `0` and `1`

Minimum percentage of average coverage for trimming the 3' end of assembled transcripts, specified as a scalar between 0 and 1.

Example: 'TrimDropoffFraction',0.15

Data Types: double

Output Arguments

collapse all

`transcripts` — Transcript file name
`"./transcripts.gtf"` (default)

Transcript file name, returned as a string. The name of the file is "transcripts.gtf". The file contains the assembled isoforms, along with attributes describing the abundance of reads originating from each transcript.

The output string also includes the directory information defined by OutputDirectory. The default is the current directory. If you set OutputDirectory to "/local/tmp/", the output becomes "/local/tmp/transcripts.gtf".

`isoforms` — Estimated isoform-level expression file name
`"./isoforms.fpkm_tracking"` (default)

Estimated isoform-level expression file name, returned as a string. By default, the name of the file is "isoforms.fpkm_tracking". The file contains estimates for isoform-level expression in cufflinks FPKM tracking format.

`genes` — Estimated gene-level expression file name
`"./genes.fpkm_tracking"` (default)

Estimated gene-level expression file name, returned as a string. By default, the name of the file is "genes.fpkm_tracking". The file contains estimates for gene-level expression in cufflinks FPKM tracking format.

`skippedTranscripts` — Name of file containing skipped transcripts
`"./skipped.gtf"` (default)

Name of the file containing skipped transcripts when processing a locus, returned as a string. By default, the name of the file is "skipped.gtf". The 'MaxBundleFrags' option specifies the maximum number of transcripts (fragments) to include for each locus. After reaching the threshold, the function puts the skipped fragments in this file.

References

[1] Trapnell, Cole, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold, and Lior Pachter. “Transcript Assembly and Quantification by RNA-Seq Reveals Unannotated Transcripts and Isoform Switching during Cell Differentiation.” Nature Biotechnology 28, no. 5 (May 2010): 511–15.

[2] Mortazavi, Ali, Brian A Williams, Kenneth McCue, Lorian Schaeffer, and Barbara Wold. “Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq.” Nature Methods 5, no. 7 (July 2008): 621–28. https://doi.org/10.1038/nmeth.1226.

Documentation

cufflinks

Syntax

Description

Examples

Assemble Transcriptome and Perform Differential Expression Testing

Input Arguments

alignmentFiles — Names of SAM or BAM files string | string vector | character vector | cell array of character vectors

cufflinksOptions — Cufflinks options CufflinksOptions object | character vector | string

Name-Value Pair Arguments

'EffectiveLengthCorrection' — Flag to normalize fragment counts true (default) | false

'ExtraCommand' — Additional commands "" (default) | string | character vector

'FauxReadTiling' — Flag to include reference transcripts in assembled output true (default) | false

'FragmentBiasCorrection' — Name of FASTA file with reference transcripts to detect bias string | character vector

'FragmentLengthMean' — Expected mean fragment length 200 (default) | positive integer

'FragmentLengthSD' — Expected standard deviation for fragment length distribution 80 (default) | positive scalar

'GTFGuide' — Name of GTF file to guide RABT assembly string | character vector

'IncludeAll' — Flag to apply all available options false (default) | true

'RABTOverhangTolerance' — Number of base pairs allowed to overlap with transcript intron 8 (default) | positive integer

'JunctionAlpha' — Alpha value in binomial test to filter false-positive alignments 0.001 (default) | scalar between 0 and 1

'LengthCorrection' — Flag to correct by transcript length true (default) | false

'MaskFile' — Name of GTF or GFF file containing transcripts to ignore string | character vector

'MaxBundleFrags' — Maximum number of fragments to include for each locus before skipping 500000 (default) | positive integer

'MaxBundleLength' — Maximum genomic length in base pairs for bundle 3500000 (default) | positive integer

'MaxFragAlignments' — Maximum number of aligned reads to include for each fragment Inf (default) | positive integer

'MaxIntronLength' — Maximum number of bases in intron 300000 (default) | positive integer

'MaxMLEIterations' — Maximum number of iterations for maximum likelihood estimation 5000 (default) | positive integer

'MinFragsPerTransfrag' — Minimum number of aligned RNA-Seq fragments to report 10 (default) | positive integer

'MinIntronLength' — Minimum number of base pairs for intron in genome 50 (default) | positive integer

'MinIsoformFraction' — Cuffoff value to report abundance of isoform 0.1 (default) | scalar between 0 and 1

'MultiReadCorrection' — Flag to improve abundance estimation using rescue method false (default) | true

'NormalizeCompatibleHits' — Flag to use only fragments compatible with reference transcript to calculate FPKM values false (default) | true

'NormalizeTotalHits' — Flag to include all fragments to calculate FPKM values false (default) | true

'NumFragAssignmentDraws' — Number of fragment assignments to perform on each transcript 50 (default) | positive integer

'NumFragDraws' — Number of draws from negative binomial random number generator 100 (default) | positive integer

'NumThreads' — Number of parallel threads to use 1 (default) | positive integer

'OutputDirectory' — Directory to store analysis results current directory ("./") (default) | string | character vector

'OverhangTolerance' — Number of base pairs of overlap with intron 8 (default) | positive integer

'RABTOverhangTolerance3' — Number of base pairs allowed to overhang 3' end of reference transcript 600 (default) | positive integer

'OverlapRadius' — Distance between transfrags 50 (default) | positive integer

'PreMRNAFraction' — Threshold to include alignments in intronic intervals 0.15 (default) | scalar between 0 and 1

'ReferenceGTF' — Name of GTF or GFF file containing reference annotation string | character vector

'Seed' — Seed for random number generator 0 (default) | nonnegative integer

'SmallAnchorFraction' — Minimum percentage of alignment on each side of splice junction 0.09 (default) | scalar between 0 and 1

'TranscriptPrefix' — Prefix for reported transfrags in output GTF file "CUFF" (default) | string | character vector

'TrimCoverageThreshold' — Minimum average coverage needed for 3' trimming 10 (default) | positive integer

'TrimDropoffFraction' — Minimum percentage of average coverage 0.1 (default) | scalar between 0 and 1

Output Arguments

transcripts — Transcript file name "./transcripts.gtf" (default)

isoforms — Estimated isoform-level expression file name "./isoforms.fpkm_tracking" (default)

genes — Estimated gene-level expression file name "./genes.fpkm_tracking" (default)

skippedTranscripts — Name of file containing skipped transcripts "./skipped.gtf" (default)

References

See Also

Topics

External Websites

Bioinformatics Toolbox Documentation

Support

`alignmentFiles` — Names of SAM or BAM files
string | string vector | character vector | cell array of character vectors

`cufflinksOptions` — Cufflinks options
`CufflinksOptions` object | character vector | string

`'EffectiveLengthCorrection'` — Flag to normalize fragment counts
`true` (default) | `false`

`'ExtraCommand'` — Additional commands
`""` (default) | string | character vector

`'FauxReadTiling'` — Flag to include reference transcripts in assembled output
`true` (default) | `false`

`'FragmentBiasCorrection'` — Name of FASTA file with reference transcripts to detect bias
string | character vector

`'FragmentLengthMean'` — Expected mean fragment length
`200` (default) | positive integer

`'FragmentLengthSD'` — Expected standard deviation for fragment length distribution
`80` (default) | positive scalar

`'GTFGuide'` — Name of GTF file to guide RABT assembly
string | character vector

`'IncludeAll'` — Flag to apply all available options
`false` (default) | true

`'RABTOverhangTolerance'` — Number of base pairs allowed to overlap with transcript intron
`8` (default) | positive integer

`'JunctionAlpha'` — Alpha value in binomial test to filter false-positive alignments
`0.001` (default) | scalar between `0` and `1`

`'LengthCorrection'` — Flag to correct by transcript length
`true` (default) | `false`

`'MaskFile'` — Name of GTF or GFF file containing transcripts to ignore
string | character vector

`'MaxBundleFrags'` — Maximum number of fragments to include for each locus before skipping
`500000` (default) | positive integer

`'MaxBundleLength'` — Maximum genomic length in base pairs for bundle
`3500000` (default) | positive integer

`'MaxFragAlignments'` — Maximum number of aligned reads to include for each fragment
`Inf` (default) | positive integer

`'MaxIntronLength'` — Maximum number of bases in intron
`300000` (default) | positive integer

`'MaxMLEIterations'` — Maximum number of iterations for maximum likelihood estimation
`5000` (default) | positive integer

`'MinFragsPerTransfrag'` — Minimum number of aligned RNA-Seq fragments to report
`10` (default) | positive integer

`'MinIntronLength'` — Minimum number of base pairs for intron in genome
`50` (default) | positive integer

`'MinIsoformFraction'` — Cuffoff value to report abundance of isoform
`0.1` (default) | scalar between `0` and `1`

`'MultiReadCorrection'` — Flag to improve abundance estimation using rescue method
`false` (default) | `true`

`'NormalizeCompatibleHits'` — Flag to use only fragments compatible with reference transcript to calculate FPKM values
`false` (default) | `true`

`'NormalizeTotalHits'` — Flag to include all fragments to calculate FPKM values
`false` (default) | `true`

`'NumFragAssignmentDraws'` — Number of fragment assignments to perform on each transcript
`50` (default) | positive integer

`'NumFragDraws'` — Number of draws from negative binomial random number generator
`100` (default) | positive integer

`'NumThreads'` — Number of parallel threads to use
`1` (default) | positive integer

`'OutputDirectory'` — Directory to store analysis results
current directory (`"./"`) (default) | string | character vector

`'OverhangTolerance'` — Number of base pairs of overlap with intron
`8` (default) | positive integer

`'RABTOverhangTolerance3'` — Number of base pairs allowed to overhang 3' end of reference transcript
`600` (default) | positive integer

`'OverlapRadius'` — Distance between transfrags
`50` (default) | positive integer

`'PreMRNAFraction'` — Threshold to include alignments in intronic intervals
`0.15` (default) | scalar between `0` and `1`

`'ReferenceGTF'` — Name of GTF or GFF file containing reference annotation
string | character vector

`'Seed'` — Seed for random number generator
`0` (default) | nonnegative integer

`'SmallAnchorFraction'` — Minimum percentage of alignment on each side of splice junction
`0.09` (default) | scalar between `0` and `1`

`'TranscriptPrefix'` — Prefix for reported transfrags in output GTF file
`"CUFF"` (default) | string | character vector

`'TrimCoverageThreshold'` — Minimum average coverage needed for 3' trimming
`10` (default) | positive integer

`'TrimDropoffFraction'` — Minimum percentage of average coverage
`0.1` (default) | scalar between `0` and `1`

`transcripts` — Transcript file name
`"./transcripts.gtf"` (default)

`isoforms` — Estimated isoform-level expression file name
`"./isoforms.fpkm_tracking"` (default)

`genes` — Estimated gene-level expression file name
`"./genes.fpkm_tracking"` (default)

`skippedTranscripts` — Name of file containing skipped transcripts
`"./skipped.gtf"` (default)