blastncbi

Create remote NCBI BLAST report request ID or link to NCBI BLAST report

Syntax

blastncbi(Seq,Program)

RID = blastncbi(Seq,Program)

[RID,RTOE] = blastncbi(Seq,Program)

___ = blastncbi(___,Name,Value)

Description

blastncbi(Seq,Program) sends a BLAST request to NCBI for Seq, a nucleotide or amino acid sequence, using Program, the specified BLAST program, and returns a link to the NCBI BLAST report. For help in selecting an appropriate BLAST program, visit https://blast.ncbi.nlm.nih.gov/producttable.shtml.

RID = blastncbi(Seq,Program) returns RID, the Request ID for the report.

[RID,RTOE] = blastncbi(Seq,Program) returns both RID, the Request ID for the NCBI BLAST report, and RTOE, the Request Time Of Execution, which is the estimated time in minutes until the search finishes.

___ = blastncbi(___,Name,Value) uses additional options specified by one or more name-value pair arguments, and any of the arguments in the previous syntaxes.

Examples

Perform BLAST search

Perform a BLAST search on a protein sequence and save the results to an XML file.

Get a sequence from the Protein Data Bank and create a MATLAB structure.

S = getpdb('1CIV');

Use the structure as input for the BLAST search with a significance threshold of 1e-10. The first output is the request ID, and the second output is the estimated time (in minutes) until the search is completed.

[RID1,RTOE] = blastncbi(S,'blastp','expect',1e-10);

Get the search results from the report. You can save the XML-formatted report to a file for offline access. Use RTOE as the wait time to retrieve the results.

report1 = getblast(RID1,'WaitTime',RTOE,'ToFile','1CIV_report.xml')

Blast results are not available yet. Please wait ...

report1 = 

  struct with fields:

                RID: 'R49TJMCF014'
          Algorithm: 'BLASTP 2.6.1+'
           Database: 'nr'
            QueryID: 'Query_224139'
    QueryDefinition: 'unnamed protein product'
               Hits: [1×100 struct]
         Parameters: [1×1 struct]
         Statistics: [1×1 struct]

Use blastread to read BLAST data from the XML-formatted BLAST report file.

blastdata = blastread('1CIV_report.xml')

blastdata = 

  struct with fields:

                RID: ''
          Algorithm: 'BLASTP 2.6.1+'
           Database: 'nr'
            QueryID: 'Query_224139'
    QueryDefinition: 'unnamed protein product'
               Hits: [1×100 struct]
         Parameters: [1×1 struct]
         Statistics: [1×1 struct]

Alternatively, run the BLAST search with an NCBI accession number.

RID2 = blastncbi('AAA59174','blastp','expect',1e-10)

RID2 =

    'R49WAPMH014'

Get the search results from the report.

report2 = getblast(RID2)

Blast results are not available yet. Please wait ...

report2 = 

  struct with fields:

                RID: 'R49WAPMH014'
          Algorithm: 'BLASTP 2.6.1+'
           Database: 'nr'
            QueryID: 'AAA59174.1'
    QueryDefinition: 'insulin receptor precursor [Homo sapiens]'
               Hits: [1×100 struct]
         Parameters: [1×1 struct]
         Statistics: [1×1 struct]

Input Arguments

`Seq` — Nucleotide or amino acid sequence
character vector | string | MATLAB® structure

Nucleotide or amino acid sequence, specified as a character vector, string, or MATLAB structure containing a Sequence field.

If Seq is a character vector or string, the available options are:

GenBank®, GenPept, or RefSeq accession number
Name of a FASTA file
URL pointing to a sequence file

`Program` — BLAST program
character vector | string

BLAST program, specified as one of the following:

'blastn' — Search nucleotide query versus nucleotide database.
'blastp' — Search protein query versus protein database.
'blastx' — Search (translated) nucleotide query versus protein database.
'megablast' — Search for highly similar nucleotide sequences.
'tblastn' — Search protein query versus translated nucleotide database.
'tblastx' — Search (translated) nucleotide query versus (translated) nucleotide database.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Matrix','PAM70','Expect',1e-10 uses the PAM70 substitution matrix with the significance threshold for matches set to 1e-10.
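
When the same options are reused across several searches, one common MATLAB idiom is to keep the name-value pairs in a cell array and expand it in the call. The following is a minimal sketch of that idiom, not part of the blastncbi interface itself. The variable names opts and runSearch are illustrative, and the request is only submitted if you set runSearch to true.

```matlab
% Collect name-value pairs once so they can be reused across searches.
% The variable names (opts, runSearch) are illustrative assumptions.
opts = {'Matrix', 'PAM70', 'Expect', 1e-10};

% Name-value arguments come in pairs, so the element count must be even.
assert(mod(numel(opts), 2) == 0, 'Options must be name-value pairs.');

% Set runSearch to true to submit the request to NCBI.
% This requires Bioinformatics Toolbox and an internet connection.
runSearch = false;
if runSearch
    % opts{:} expands to 'Matrix','PAM70','Expect',1e-10
    RID = blastncbi('AAA59174', 'blastp', opts{:});
end
```

Because the order of name-value pairs does not matter, further pairs can be appended to opts without changing the call.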

`'Database'` — Database to search
`'nr'` (default) | character vector | string

Database to search, specified as the comma-separated pair consisting of 'Database' and a character vector or string.

For nucleotide databases, valid choices are:

'nr' (default)
'refseq_rna'
'refseq_genomic'
'est'
'est_human'
'est_mouse'
'est_others'
'gss'
'htgs'
'pat'
'pdb'
'alu'
'dbsts'
'chromosome'

For protein databases, valid choices are:

'nr' (default)
'refseq_protein'
'swissprot'
'pat'
'pdb'
'env_nr'

Note

Available databases may change. Check the NCBI website for more information.

For help in selecting an appropriate database, visit

https://blast.ncbi.nlm.nih.gov/producttable.shtml

`'MaxNumberSequences'` — Maximum number of hits to return
100 (default) | positive integer

Maximum number of hits to return, specified as the comma-separated pair consisting of 'MaxNumberSequences' and a positive integer. The actual search results may have fewer hits than what you specify, depending on the query, database, expectation value, and other parameters. The default value is 100.

`'Filter'` — Filter applied to query sequence
character vector | string

Filter applied to the query sequence, specified as the comma-separated pair consisting of 'Filter' and one of the following:

'L' — Mask regions of low compositional complexity.
'R' — Mask human repeat elements (valid for blastn and megablast only).
'm' — Mask the query while producing blast seeds, but not during extension.
'none' — No mask is applied.
'l' — Mask any letter that is lowercase in the query.

You can combine multiple valid letters in a single character vector or string to apply several filters at once. For example, 'Lm' applies both the low compositional complexity filter ('L') and the seed-only mask ('m').

Choices vary depending on the selected Program. For more information, see the table Choices for Optional Properties by BLAST Program.

`'Expect'` — Statistical significance threshold for matches
`10` (default) | positive real number

Statistical significance threshold for matches against database sequences, specified as the comma-separated pair consisting of 'Expect' and a positive real number. The default is 10.

You can learn more about the statistics of local sequence comparison at https://blast.ncbi.nlm.nih.gov/tutorial/Altschul-1.html#head2.

`'Word'` — Word length for query sequence
positive integer

Word length for the query sequence, specified as the comma-separated pair consisting of 'Word' and a positive integer.

Choices for a protein query search are:

2
3 (default)

Choices for a nucleotide query search are:

7
11 (default)
15

Choices when Program is set to 'megablast' are:

16
20
24
28 (default)
32
48
64
128

`'Matrix'` — Substitution matrix for amino acid sequences
`'BLOSUM62'` (default) | character vector | string

Substitution matrix for amino acid sequences, specified as the comma-separated pair consisting of 'Matrix' and a character vector or string. The matrix assigns the score for a possible alignment of any two amino acid residues. Choices are:

'PAM30'
'PAM70'
'BLOSUM45'
'BLOSUM62' (default)
'BLOSUM80'

`'MatchScores'` — Matching and mismatching scores in nucleotide alignment
two-element numeric vector

Matching and mismatching scores in a nucleotide alignment, specified as the comma-separated pair consisting of 'MatchScores' and a two-element numeric vector [R Q]. The first element R is the match score, and the second element Q is the mismatch score. This option is for blastn and megablast only.

To ensure accurate evaluation of the alignment significance, only a limited set of combinations is supported. See the table BLAST Optional Properties for all the supported values. The default value for megablast is [1 -2], and the default value for blastn is [1 -3].

`'GapCosts'` — Penalties for opening and extending gap
two-element numeric vector

Penalties for opening and extending a gap, specified as the comma-separated pair consisting of 'GapCosts' and a two-element numeric vector. The vector contains two integers: the first is the penalty for opening a gap, and the second is the penalty for extending a gap.

Valid gap costs for blastp, blastx, tblastn, and tblastx vary according to the protein substitution matrix. For details, see GapCosts for blastp, blastx, tblastn, and tblastx.

Valid gap costs for blastn and megablast vary according to MatchScores ([R Q]). For details, see GapCosts for blastn and megablast.

`'CompositionAdjustment'` — Compositional adjustment type to compensate for amino acid compositions
`'none'` (default) | `'cbs'` | `'ccsm'` | `'ucsm'`

Compositional adjustment type to compensate for the amino acid compositions of the sequences being compared, specified as the comma-separated pair consisting of 'CompositionAdjustment' and one of the following values:

'none' — No adjustment is applied (default).
'cbs' — Composition-based statistics approach is used for score adjustments.
'ccsm' — Conditional compositional score matrix is used for score adjustments.
'ucsm' — Universal compositional score matrix is used for score adjustments.

This option is for blastp, blastx, and tblastn only. The resulting scaled scores yield more accurate E-values than the standard, unscaled scores. For details, see Compositional adjustments.

`'Entrez'` — Entrez query syntax to search a subset of selected database
character vector | string

Entrez query syntax to search a subset of the selected database, specified as the comma-separated pair consisting of 'Entrez' and a character vector or string. Use this option to limit searches based on molecule types, sequence lengths, organisms, and so on. For more information on limiting searches, see https://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml#entrez_query.
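
As an illustration of limiting a search by organism and sequence length, the following sketch assembles an Entrez query as a character vector. The field tags [Organism] and [SLEN] are taken from general Entrez search syntax and are an assumption here, so confirm them against the NCBI help page linked above. The variable names are illustrative.

```matlab
% Build an Entrez query that limits hits by organism and sequence length.
% The [Organism] and [SLEN] field tags are assumed from general Entrez
% search syntax; verify them on the NCBI help page before relying on them.
organism  = 'Homo sapiens';
minLength = 100;
maxLength = 500;

entrezQuery = sprintf('%s[Organism] AND %d:%d[SLEN]', ...
    organism, minLength, maxLength);

% The query can then be passed as the 'Entrez' value, for example:
%   RID = blastncbi(Seq, Program, 'Entrez', entrezQuery);
```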

`'Adv'` — Advanced options
character vector | string

Advanced options, specified as the comma-separated pair consisting of 'Adv' and a character vector or string. For instance, to specify the reward and penalty values for nucleotide matches and mismatches, use '-r 1 -q -3'. For more information, see https://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html.

Output Arguments

`RID` — Request ID for NCBI BLAST report
character vector

Request ID for the NCBI BLAST report, returned as a character vector.

`RTOE` — Request time of execution
integer

Request time of execution, returned as an integer. This is an estimated time in minutes until the search is completed.

Tip

If you use the getblast function to retrieve the BLAST report, use this time estimate as the 'WaitTime' option.

More About

BLAST Optional Properties

Choices for Optional Properties by BLAST Program

When BLAST program is...	Then choices for the following options are...
When BLAST program is...	Database	Filter	Word	Matrix	MatchScores `[R Q]`	GapCosts
`'blastn'`	`'nr'` (default) `'refseq_rna'` `'refseq_genomic'` `'est'` `'est_human'` `'est_mouse'` `'est_others'` `'gss'` `'htgs'` `'pat'` `'pdb'` `'alu'` `'dbsts'` `'chromosome'`	`'Lm'` (default) `'R'` `'m'` `'l'` `'none'`	`7` `11` (default) `15`	—	`[1 -3]` (default) `[1 -4]` `[1 -2]` `[1 -1]` `[2 -3]` `[4 -5]`	See GapCosts for blastn and megablast.
`'megablast'`		`'Lm'` (default) `'R'` `'m'` `'l'` `'none'`	`16` `20` `24` `28` (default) `32` `48` `64` `128`	—	`[1 -3]` `[1 -4]` `[1 -2]` (default) `[1 -1]` `[2 -3]` `[4 -5]`	See GapCosts for blastn and megablast.
`'tblastn'`		`'L'` (default) `'m'` `'l'` `'none'`	`2` `3` (default)	`'PAM30'` `'PAM70'` `'BLOSUM45'` `'BLOSUM62'` (default) `'BLOSUM80'`	–	See GapCosts for blastp, blastx, tblastn, and tblastx.
`'tblastx'`		`'L'` (default) `'m'` `'l'` `'none'`
`'blastp'`	`'nr'` (default) `'refseq_protein'` `'swissprot'` `'pat'` `'pdb'` `'env_nr'`	`'L'` `'m'` `'l'` `'none'` (default)
`'blastx'`		`'L'` (default) `'m'` `'l'` `'none'`

GapCosts for blastp, blastx, tblastn, and tblastx

Substitution Matrix	Valid `'GapCosts'` Values
`'PAM30'`	`[7 2]` `[6 2]` `[5 2]` `[10 1]` `[9 1]` (default) `[8 1]`
`'PAM70'`	`[8 2]` `[7 2]` `[6 2]` `[11 1]` `[10 1]` (default) `[9 1]`
`'BLOSUM80'`	`[8 2]` `[7 2]` `[6 2]` `[11 1]` `[10 1]` (default) `[9 1]`
`'BLOSUM45'`	`[13 3]` `[12 3]` `[11 3]` `[10 3]` `[15 2]` (default) `[14 2]` `[13 2]` `[12 2]` `[19 1]` `[18 1]` `[17 1]` `[16 1]`
`'BLOSUM62'`	`[9 2]` `[8 2]` `[7 2]` `[12 1]` `[11 1]` (default) `[10 1]`
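
The table above can be encoded as a lookup so that a 'Matrix' and 'GapCosts' pair is checked before a request is submitted. This is a minimal sketch under that assumption. The names validGapCosts and isValidGapCosts are hypothetical, and the values are transcribed from the table, which may change as NCBI updates its service.

```matlab
% Valid [open extend] gap costs per substitution matrix, one pair per row,
% transcribed from the table above. Names are illustrative assumptions.
validGapCosts = containers.Map();
validGapCosts('PAM30')    = [7 2; 6 2; 5 2; 10 1; 9 1; 8 1];
validGapCosts('PAM70')    = [8 2; 7 2; 6 2; 11 1; 10 1; 9 1];
validGapCosts('BLOSUM80') = [8 2; 7 2; 6 2; 11 1; 10 1; 9 1];
validGapCosts('BLOSUM45') = [13 3; 12 3; 11 3; 10 3; 15 2; 14 2; ...
                             13 2; 12 2; 19 1; 18 1; 17 1; 16 1];
validGapCosts('BLOSUM62') = [9 2; 8 2; 7 2; 12 1; 11 1; 10 1];

% True if gapCosts (a 1-by-2 vector) is listed for the given matrix name.
isValidGapCosts = @(matrixName, gapCosts) ...
    isKey(validGapCosts, matrixName) && ...
    ismember(gapCosts, validGapCosts(matrixName), 'rows');

okDefault = isValidGapCosts('BLOSUM62', [11 1]);  % true: listed default
okWrong   = isValidGapCosts('BLOSUM62', [15 2]);  % false: listed only for BLOSUM45
```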

GapCosts for blastn and megablast

MatchScores [R Q]	Valid `'GapCosts'` Values
`[1 -4]`	`[5 2]` (default) `[1 2]` `[0 2]` `[2 1]` `[1 1]`
`[1 -3]`	`[5 2]` (default) `[2 2]` `[1 2]` `[0 2]` `[2 1]` `[1 1]`
`[1 -2]`	`[5 2]` (default) `[2 2]` `[1 2]` `[0 2]` `[3 1]` `[2 1]` `[1 1]`
`[1 -1]`	`[5 2]` (default) `[3 2]` `[2 2]` `[1 2]` `[0 2]` `[4 1]` `[3 1]` `[2 1]`
`[2 -3]`	`[5 2]` (default) `[4 4]` `[2 4]` `[0 4]` `[3 3]` `[6 2]` `[4 2]` `[2 2]`
`[4 -5]`	`[5 2]` (default) `[6 5]` `[5 5]` `[4 5]` `[3 5]`
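
For blastn and megablast, the valid 'GapCosts' values depend on 'MatchScores', so both vectors need to be checked together. The following sketch keys a lookup on the text form of [R Q] produced by mat2str. The names nucGapCosts and isValidCombo are hypothetical, and the values are transcribed from the table above.

```matlab
% Valid [open extend] gap costs for each MatchScores vector [R Q],
% transcribed from the table above. Keys are the text form of [R Q].
nucGapCosts = containers.Map();
nucGapCosts(mat2str([1 -4])) = [5 2; 1 2; 0 2; 2 1; 1 1];
nucGapCosts(mat2str([1 -3])) = [5 2; 2 2; 1 2; 0 2; 2 1; 1 1];
nucGapCosts(mat2str([1 -2])) = [5 2; 2 2; 1 2; 0 2; 3 1; 2 1; 1 1];
nucGapCosts(mat2str([1 -1])) = [5 2; 3 2; 2 2; 1 2; 0 2; 4 1; 3 1; 2 1];
nucGapCosts(mat2str([2 -3])) = [5 2; 4 4; 2 4; 0 4; 3 3; 6 2; 4 2; 2 2];
nucGapCosts(mat2str([4 -5])) = [5 2; 6 5; 5 5; 4 5; 3 5];

% True if the MatchScores vector is supported and gapCosts is listed for it.
isValidCombo = @(matchScores, gapCosts) ...
    isKey(nucGapCosts, mat2str(matchScores)) && ...
    ismember(gapCosts, nucGapCosts(mat2str(matchScores)), 'rows');

okBlastnDefault = isValidCombo([1 -3], [5 2]);  % true: listed default pair
okUnlisted      = isValidCombo([1 -4], [2 2]);  % false: [2 2] is not listed for [1 -4]
```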

Compatibility Considerations

`'psiblast'` BLAST program has been removed

Errors starting in R2017b

The BLAST program 'psiblast' has been removed and is no longer a supported program.

`'Inclusion'` option has been removed

Errors starting in R2017b

The 'Inclusion' name-value pair has been removed because it applies only to the psiblast program, which has also been removed.

`'Descriptions'` option has been removed

Errors starting in R2017b

The 'Descriptions' name-value pair has been removed. Use 'MaxNumberSequences' instead to specify the maximum number of hits to return.

`'Alignments'` option has been removed

Errors starting in R2017b

The 'Alignments' name-value pair has been removed. Use 'MaxNumberSequences' instead to specify the maximum number of hits to return.

`'GapOpen'` option has been removed

Errors starting in R2017b

The 'GapOpen' name-value pair has been removed. Use 'GapCosts' instead.

`'ExtendGap'` option has been removed

Errors starting in R2017b

The 'ExtendGap' name-value pair has been removed. Use 'GapCosts' instead.

`'Pct'` option has been removed

Errors starting in R2017b

The 'Pct' name-value pair has been removed.
