Create remote NCBI BLAST report request ID or link to NCBI BLAST report
blastncbi(Seq,Program) sends a BLAST request to NCBI against Seq, a nucleotide or amino acid sequence, using Program, a specified BLAST program. It then returns a link to the NCBI BLAST report. For help in selecting an appropriate BLAST program, visit https://blast.ncbi.nlm.nih.gov/producttable.shtml.
___ = blastncbi(___,Name,Value) uses additional options specified by one or more name-value pair arguments, and any of the arguments in the previous syntaxes.
Perform a BLAST search on a protein sequence and save the results to an XML file.
Get a sequence from the Protein Data Bank and create a MATLAB structure.
S = getpdb('1CIV');
Use the structure as input for the BLAST search with a significance threshold of 1e-10. The first output is the request ID, and the second output is the estimated time (in minutes) until the search is completed.
[RID1,RTOE] = blastncbi(S,'blastp','expect',1e-10);
Get the search results from the report. You can save the XML-formatted report to a file for offline access. Use RTOE as the wait time to retrieve the results.
report1 = getblast(RID1,'WaitTime',RTOE,'ToFile','1CIV_report.xml')
Blast results are not available yet. Please wait ...
report1 = struct with fields:
                RID: 'R49TJMCF014'
          Algorithm: 'BLASTP 2.6.1+'
           Database: 'nr'
            QueryID: 'Query_224139'
    QueryDefinition: 'unnamed protein product'
               Hits: [1×100 struct]
         Parameters: [1×1 struct]
         Statistics: [1×1 struct]
Use blastread to read BLAST data from the XML-formatted BLAST report file.
blastdata = blastread('1CIV_report.xml')
blastdata = struct with fields:
                RID: ''
          Algorithm: 'BLASTP 2.6.1+'
           Database: 'nr'
            QueryID: 'Query_224139'
    QueryDefinition: 'unnamed protein product'
               Hits: [1×100 struct]
         Parameters: [1×1 struct]
         Statistics: [1×1 struct]
Alternatively, run the BLAST search with an NCBI accession number.
RID2 = blastncbi('AAA59174','blastp','expect',1e-10)
RID2 = 'R49WAPMH014'
Get the search results from the report.
report2 = getblast(RID2)
Blast results are not available yet. Please wait ...
report2 = struct with fields:
                RID: 'R49WAPMH014'
          Algorithm: 'BLASTP 2.6.1+'
           Database: 'nr'
            QueryID: 'AAA59174.1'
    QueryDefinition: 'insulin receptor precursor [Homo sapiens]'
               Hits: [1×100 struct]
         Parameters: [1×1 struct]
         Statistics: [1×1 struct]
Seq — Nucleotide or amino acid sequence
Nucleotide or amino acid sequence, specified as a character vector, string, or MATLAB structure containing a Sequence field.
If Seq is a character vector or string, the available options are:
GenBank®, GenPept, or RefSeq accession number
Name of a FASTA file
URL pointing to a sequence file
Program — BLAST program
BLAST program, specified as one of the following:
'blastn' — Search nucleotide query versus nucleotide database.
'blastp' — Search protein query versus protein database.
'blastx' — Search (translated) nucleotide query versus protein database.
'megablast' — Search for highly similar nucleotide sequences.
'tblastn' — Search protein query versus translated nucleotide database.
'tblastx' — Search (translated) nucleotide query versus (translated) nucleotide database.
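As an illustration of how the program choice follows from the query and database types listed above, the following sketch maps the two types to a program name. The lookup table `programFor` and its key format are hypothetical conveniences, not part of the toolbox. 'megablast' and 'tblastx' are left out because they are alternatives for the nucleotide-versus-nucleotide case.

```matlab
% Hypothetical lookup: '<query type>><database type>' -> BLAST program.
% The key format and variable names are illustrative assumptions.
programFor = containers.Map( ...
    {'nucleotide>nucleotide', 'protein>protein', ...
     'nucleotide>protein',    'protein>nucleotide'}, ...
    {'blastn', 'blastp', 'blastx', 'tblastn'});

queryType = 'protein';      % type of the query sequence
dbType    = 'nucleotide';   % type of the database to search
program   = programFor([queryType '>' dbType]);
```

The resulting `program` value can then be passed as the second input to blastncbi.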
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Matrix','PAM70','Expect',1e-10 uses the PAM70 substitution matrix with the significance threshold for matches set to 1e-10.
'Database' — Database to search
'nr' (default) | character vector | string
Database to search, specified as the comma-separated pair consisting of 'Database' and a character vector or string.
For nucleotide databases, valid choices are:
'nr' (default)
'refseq_rna'
'refseq_genomic'
'est'
'est_human'
'est_mouse'
'est_others'
'gss'
'htgs'
'pat'
'pdb'
'alu'
'dbsts'
'chromosome'
For protein databases, valid choices are:
'nr' (default)
'refseq_protein'
'swissprot'
'pat'
'pdb'
'env_nr'
Note
Available databases may change. Check the NCBI website for more information and for help in selecting an appropriate database.
'MaxNumberSequences' — Maximum number of hits to return
Maximum number of hits to return, specified as the comma-separated pair consisting of 'MaxNumberSequences' and a positive integer. The actual search results may have fewer hits than what you specify, depending on the query, database, expectation value, and other parameters. The default value is 100.
'Filter' — Filter applied to query sequence
Filter applied to the query sequence, specified as the comma-separated pair consisting of 'Filter' and one of the following:
'L' — Mask regions of low compositional complexity.
'R' — Mask human repeat elements (valid for blastn and megablast only).
'm' — Mask the query while producing BLAST seeds, but not during extension.
'none' — No mask is applied.
'l' — Mask any letter that is lowercase in the query.
You can specify multiple valid letters in a single character vector or string to apply multiple filters at once. For example, 'Lm' applies both the low compositional complexity filter ('L') and the seed-only mask ('m').
Choices vary depending on the selected Program. For more information, see the table Choices for Optional Properties by BLAST Program.
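The filter rules above can be checked before a request is sent. The following is a minimal sketch that only encodes what this page states (the letter set and the restriction on 'R'). The variable names are illustrative assumptions, and the check does not cover every per-program choice in the table.

```matlab
program    = 'blastp';   % BLAST program for the search
filterSpec = 'Lm';       % requested filter letters, or 'none'

% Every character must be one of the documented letters unless 'none' is used.
lettersKnown = strcmp(filterSpec, 'none') || all(ismember(filterSpec, 'LRml'));

% 'R' (human repeats) is documented as valid for blastn and megablast only.
needsRepeatMask = ~strcmp(filterSpec, 'none') && any(filterSpec == 'R');
repeatMaskOk    = ~needsRepeatMask || any(strcmp(program, {'blastn', 'megablast'}));

isValidFilter = lettersKnown && repeatMaskOk;
```

With these inputs the check passes, and the value could be supplied as 'Filter',filterSpec in a blastncbi call.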
'Expect' — Statistical significance threshold for matches
10 (default) | positive real number
Statistical significance threshold for matches against database sequences, specified as the comma-separated pair consisting of 'Expect' and a positive real number. The default is 10.
You can learn more about the statistics of local sequence comparison at https://blast.ncbi.nlm.nih.gov/tutorial/Altschul-1.html#head2.
'Word' — Word length for query sequence
Word length for the query sequence, specified as the comma-separated pair consisting of 'Word' and a positive integer.
Choices for a protein query search are:
2
3 (default)
Choices for a nucleotide query search are:
7
11 (default)
15
Choices when Program is set to 'megablast' are:
16
20
24
28 (default)
32
48
64
128
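Because the valid word lengths depend on the program, a small guard can catch an unsupported value early. This sketch takes its groupings from the lists above and from the table Choices for Optional Properties by BLAST Program. The variable names and the error identifier are hypothetical.

```matlab
program  = 'megablast';  % BLAST program for the search
wordSize = 28;           % requested 'Word' value

switch program
    case {'blastp', 'blastx', 'tblastn', 'tblastx'}
        validWords = [2 3];
    case 'blastn'
        validWords = [7 11 15];
    case 'megablast'
        validWords = [16 20 24 28 32 48 64 128];
    otherwise
        error('example:unknownProgram', 'Unknown BLAST program: %s', program);
end

% True when the requested word length is listed for the program.
isValidWord = ismember(wordSize, validWords);
```

If the check passes, the value can be passed on as 'Word',wordSize.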
'Matrix' — Substitution matrix for amino acid sequences
'BLOSUM62' (default) | character vector | string
Substitution matrix for amino acid sequences, specified as the comma-separated pair consisting of 'Matrix' and a character vector or string. The matrix assigns the score for a possible alignment of any two amino acid residues. Choices are:
'PAM30'
'PAM70'
'BLOSUM45'
'BLOSUM62' (default)
'BLOSUM80'
'MatchScores' — Matching and mismatching scores in nucleotide alignment
Matching and mismatching scores in a nucleotide alignment, specified as the comma-separated pair consisting of 'MatchScores' and a two-element numeric vector [R Q]. The first element R is the match score, and the second element Q is the mismatch score. This option is for blastn and megablast only.
To ensure accurate evaluation of the alignment significance, only a limited set of combinations is supported. See the table Choices for Optional Properties by BLAST Program for all the supported values. The default value for megablast is [1 -2], and the default value for blastn is [1 -3].
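The defaults and the limited set of supported [R Q] combinations can be expressed as a short check. The sketch below is one possible way to do it: the fallback to the program default is an illustrative design choice, not toolbox behavior, and all variable names are assumptions.

```matlab
program     = 'megablast';  % 'blastn' or 'megablast'
matchScores = [2 -3];       % requested [R Q] pair

% Supported [R Q] combinations, one pair per row, as listed on this page.
supportedScores = [1 -3; 1 -4; 1 -2; 1 -1; 2 -3; 4 -5];

switch program
    case 'blastn'
        defaultScores = [1 -3];
    case 'megablast'
        defaultScores = [1 -2];
    otherwise
        error('example:notNucleotide', ...
              'MatchScores applies to blastn and megablast only.');
end

% Fall back to the program default when the pair is not supported.
if ~ismember(matchScores, supportedScores, 'rows')
    matchScores = defaultScores;
end
```

Here [2 -3] is a supported pair, so it is kept unchanged.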
'GapCosts' — Penalties for opening and extending gap
Penalties for opening and extending a gap, specified as the comma-separated pair consisting of 'GapCosts' and a two-element numeric vector. The vector contains two integers: the first is the penalty for opening a gap, and the second is the penalty for extending a gap.
Valid gap costs for blastp, blastx, tblastn, and tblastx vary according to the protein substitution matrix. For details, see GapCosts for blastp, blastx, tblastn, and tblastx.
Valid gap costs for blastn and megablast vary according to MatchScores ([R Q]). For details, see GapCosts for blastn and megablast.
'CompositionAdjustment' — Compositional adjustment type to compensate for amino acid compositions
'none' (default) | 'cbs' | 'ccsm' | 'ucsm'
Compositional adjustment type to compensate for the amino acid compositions of the sequences being compared, specified as the comma-separated pair consisting of 'CompositionAdjustment' and one of the following values:
'none' — No adjustment is applied (default).
'cbs' — Composition-based statistics approach is used for score adjustments.
'ccsm' — Conditional compositional score matrix is used for score adjustments.
'ucsm' — Universal compositional score matrix is used for score adjustments.
This option is for blastp, blastx, and tblastn only. The resulting scaled scores yield more accurate E-values than the standard, unscaled scores. For details, see Compositional adjustments.
'Entrez' — Entrez query syntax to search a subset of selected database
Entrez query syntax to search a subset of the selected database, specified as the comma-separated pair consisting of 'Entrez' and a character vector or string. Use this option to limit searches based on molecule types, sequence lengths, organisms, and so on. For more information on limiting searches, see https://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml#entrez_query.
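When several options are combined, it can help to collect the name-value pairs in a cell array first. The sketch below shows one way to do that. The specific Entrez query (an organism restriction) and the hit limit of 50 are illustrative assumptions, not values taken from this page.

```matlab
% Collect name-value pairs in a cell array so they can be reused.
% The Entrez query and the numeric values are illustrative assumptions.
opts = { ...
    'Database',           'refseq_protein', ...
    'Expect',             1e-10, ...
    'MaxNumberSequences', 50, ...
    'Entrez',             'Homo sapiens[Organism]'};

% Sanity check: entries must come in pairs, and every name must be text.
names        = opts(1:2:end);
isWellFormed = mod(numel(opts), 2) == 0 && iscellstr(names);
```

The list can then be expanded into the call, for example `RID = blastncbi('AAA59174','blastp',opts{:})`, which submits the request to NCBI.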
'Adv' — Advanced options
Advanced options, specified as the comma-separated pair consisting of 'Adv' and a character vector or string. For instance, to specify the reward and penalty values for nucleotide matches and mismatches, use '-r 1 -q -3'. For more information, see https://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html.
RID — Request ID for NCBI BLAST report
Request ID for the NCBI BLAST report, returned as a character vector.
RTOE — Request time of execution
Request time of execution, returned as an integer. This is an estimated time in minutes until the search is completed.
Tip
If you use the getblast function to retrieve the BLAST report, use this time estimate as the 'WaitTime' option.
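Choices for Optional Properties by BLAST Program
For each BLAST program, the choices for the options are as follows.
BLAST program | Database | Filter | Word | Matrix | MatchScores [R Q] | GapCosts |
---|---|---|---|---|---|---|
'blastn' | 'nr' (default), 'refseq_rna', 'refseq_genomic', 'est', 'est_human', 'est_mouse', 'est_others', 'gss', 'htgs', 'pat', 'pdb', 'alu', 'dbsts', 'chromosome' | 'Lm' (default), 'R', 'm', 'l', 'none' | 7, 11 (default), 15 | N/A | [1 -3] (default), [1 -4], [1 -2], [1 -1], [2 -3], [4 -5] | See GapCosts for blastn and megablast. |
'megablast' | Same as 'blastn' | 'Lm' (default), 'R', 'm', 'l', 'none' | 16, 20, 24, 28 (default), 32, 48, 64, 128 | N/A | [1 -3], [1 -4], [1 -2] (default), [1 -1], [2 -3], [4 -5] | See GapCosts for blastn and megablast. |
'tblastn' | Same as 'blastn' | 'L' (default), 'm', 'l', 'none' | 2, 3 (default) | 'PAM30', 'PAM70', 'BLOSUM45', 'BLOSUM62' (default), 'BLOSUM80' | N/A | See GapCosts for blastp, blastx, tblastn, and tblastx. |
'tblastx' | Same as 'blastn' | 'L' (default), 'm', 'l', 'none' | 2, 3 (default) | 'PAM30', 'PAM70', 'BLOSUM45', 'BLOSUM62' (default), 'BLOSUM80' | N/A | See GapCosts for blastp, blastx, tblastn, and tblastx. |
'blastp' | 'nr' (default), 'refseq_protein', 'swissprot', 'pat', 'pdb', 'env_nr' | 'L', 'm', 'l', 'none' (default) | 2, 3 (default) | 'PAM30', 'PAM70', 'BLOSUM45', 'BLOSUM62' (default), 'BLOSUM80' | N/A | See GapCosts for blastp, blastx, tblastn, and tblastx. |
'blastx' | Same as 'blastp' | 'L' (default), 'm', 'l', 'none' | 2, 3 (default) | 'PAM30', 'PAM70', 'BLOSUM45', 'BLOSUM62' (default), 'BLOSUM80' | N/A | See GapCosts for blastp, blastx, tblastn, and tblastx. |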
GapCosts for blastp, blastx, tblastn, and tblastx
Substitution Matrix | Valid 'GapCosts' Values |
---|---|
'PAM30' | [7 2], [6 2], [5 2], [10 1], [9 1] (default), [8 1] |
'PAM70' | [8 2], [7 2], [6 2], [11 1], [10 1] (default), [9 1] |
'BLOSUM80' | Same as 'PAM70' |
'BLOSUM45' | [13 3], [12 3], [11 3], [10 3], [15 2] (default), [14 2], [13 2], [12 2], [19 1], [18 1], [17 1], [16 1] |
'BLOSUM62' | [9 2], [8 2], [7 2], [12 1], [11 1] (default), [10 1] |
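The table above can be encoded as a lookup to validate a 'GapCosts' value against the chosen substitution matrix. This is a minimal sketch. The map name and the variables are hypothetical, and the 'BLOSUM80' entry assumes that it shares the 'PAM70' values, which is how the table reads.

```matlab
% Valid [open extend] pairs per substitution matrix, one pair per row.
validGapCosts = containers.Map();
validGapCosts('PAM30')    = [7 2; 6 2; 5 2; 10 1; 9 1; 8 1];
validGapCosts('PAM70')    = [8 2; 7 2; 6 2; 11 1; 10 1; 9 1];
validGapCosts('BLOSUM80') = validGapCosts('PAM70');  % assumed shared row
validGapCosts('BLOSUM45') = [13 3; 12 3; 11 3; 10 3; 15 2; 14 2; ...
                             13 2; 12 2; 19 1; 18 1; 17 1; 16 1];
validGapCosts('BLOSUM62') = [9 2; 8 2; 7 2; 12 1; 11 1; 10 1];

matrixName = 'BLOSUM62';  % value intended for 'Matrix'
gapCosts   = [11 1];      % value intended for 'GapCosts'

% True when the pair appears as a row for the chosen matrix.
isValidGap = ismember(gapCosts, validGapCosts(matrixName), 'rows');
```

A pair that passes for one matrix may fail for another. For example, [11 1] is listed for 'BLOSUM62' but not for 'PAM30'.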
GapCosts for blastn and megablast
MatchScores [R Q] | Valid 'GapCosts' Values |
---|---|
[1 -4] | [5 2] (default), [1 2], [0 2], [2 1], [1 1] |
[1 -3] | [5 2] (default), [2 2], [1 2], [0 2], [2 1], [1 1] |
[1 -2] | [5 2] (default), [2 2], [1 2], [0 2], [3 1], [2 1], [1 1] |
[1 -1] | [5 2] (default), [3 2], [2 2], [1 2], [0 2], [4 1], [3 1], [2 1] |
[2 -3] | [5 2] (default), [4 4], [2 4], [0 4], [3 3], [6 2], [4 2], [2 2] |
[4 -5] | [5 2] (default), [6 5], [5 5], [4 5], [3 5] |
'psiblast' BLAST program has been removed
Errors starting in R2017b
The BLAST program 'psiblast' is no longer a supported program.
'Inclusion' option has been removed
Errors starting in R2017b
The 'Inclusion' name-value pair has been removed because it applied only to the psiblast program, which has also been removed.
'Descriptions' option has been removed
Errors starting in R2017b
The 'Descriptions' name-value pair has been removed. Use 'MaxNumberSequences' instead to specify the maximum number of hits to return.
'Alignments' option has been removed
Errors starting in R2017b
The 'Alignments' name-value pair has been removed. Use 'MaxNumberSequences' instead to specify the maximum number of hits to return.
'GapOpen' option has been removed
Errors starting in R2017b
The 'GapOpen' name-value pair has been removed. Use 'GapCosts' instead.
'ExtendGap' option has been removed
Errors starting in R2017b
The 'ExtendGap' name-value pair has been removed. Use 'GapCosts' instead.
'Pct' option has been removed
Errors starting in R2017b
The 'Pct' name-value pair has been removed.
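For code written before R2017b, the removed options map onto the current ones as described above. The following sketch shows one possible translation. The numeric values and variable names are illustrative assumptions.

```matlab
% Values previously passed through the removed options (illustrative numbers).
oldGapOpen      = 11;   % formerly 'GapOpen'
oldExtendGap    = 1;    % formerly 'ExtendGap'
oldDescriptions = 50;   % formerly 'Descriptions' (or 'Alignments')

% Equivalent name-value pairs using the current option names:
% 'GapCosts' takes [open extend], 'MaxNumberSequences' limits the hits.
newOpts = {'GapCosts', [oldGapOpen oldExtendGap], ...
           'MaxNumberSequences', oldDescriptions};
```

The updated list can be expanded into a call such as `blastncbi(Seq,Program,newOpts{:})`.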