Retrieve sequence information from GenBank database
Data = getgenbank(AccessionNumber)
getgenbank(AccessionNumber)
Data = getgenbank(..., 'PartialSeq', PartialSeqValue, ...)
Data = getgenbank(..., 'ToFile', ToFileValue, ...)
Data = getgenbank(..., 'FileFormat', FileFormatValue, ...)
Data = getgenbank(..., 'SequenceOnly', SequenceOnlyValue, ...)
AccessionNumber | Character vector or string specifying a unique alphanumeric identifier for a sequence record. |
PartialSeqValue | Two-element array of integers, [StartBP, EndBP], containing the start and end positions of the subsequence to retrieve. StartBP is an integer between 1 and EndBP. EndBP is an integer between StartBP and the length of the sequence. |
ToFileValue | Character vector or string specifying either a file name or a path and file name for saving the GenBank® data. If you specify only a file name, the file is saved to the MATLAB® Current Folder. |
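FileFormatValue | Character vector or string specifying the format for the sequence information. Choices are 'GenBank' or 'FASTA'. When 'FASTA', Data contains only two fields, Header and Sequence. 'GenBank' is the default when SequenceOnlyValue is false. 'FASTA' is the default when SequenceOnlyValue is true. |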
SequenceOnlyValue | Controls the return of only the sequence as a character array. Choices are true or false (default). |
getgenbank retrieves nucleotide information from the GenBank database. This database is maintained by the National Center for Biotechnology Information (NCBI). For more details about the GenBank database, see the NCBI website.
Data = getgenbank(AccessionNumber) searches for the accession number in the GenBank database and returns Data, a MATLAB structure containing information for the sequence.
Tip
If an error occurs while retrieving the GenBank-formatted information, try rerunning the query. Errors can occur due to Internet connectivity issues that are unrelated to the GenBank record.
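
The retry advice in the tip above can be sketched as a small loop. This is only an illustration: the variable names, the attempt limit, and the pause length are assumptions chosen for the example, not toolbox settings.

```matlab
% Sketch: retry a GenBank query a few times before giving up.
% maxAttempts and pauseSeconds are illustrative values (assumptions).
accession    = 'M10051';
maxAttempts  = 3;
pauseSeconds = 2;

Data = [];
for attempt = 1:maxAttempts
    try
        Data = getgenbank(accession);
        break                      % success: stop retrying
    catch err
        fprintf('Attempt %d failed: %s\n', attempt, err.message);
        if attempt == maxAttempts
            rethrow(err);          % surface the last error to the caller
        end
        pause(pauseSeconds);       % brief wait before the next attempt
    end
end
```

A short pause between attempts gives a transient connectivity problem time to clear. If every attempt fails, the last error is rethrown so that it is not silently ignored.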
getgenbank(AccessionNumber) displays information in the MATLAB Command Window without returning data to a variable. The displayed information is only hyperlinks to the URLs used to search for and retrieve the data.
getgenbank(..., 'PropertyName', PropertyValue, ...) calls getgenbank with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
Data = getgenbank(..., 'PartialSeq', PartialSeqValue, ...) returns the specified subsequence in the Sequence field of the MATLAB structure. PartialSeqValue is a two-element array of integers containing the start and end positions of the subsequence, [StartBP, EndBP]. StartBP is an integer between 1 and EndBP. EndBP is an integer between StartBP and the length of the sequence.
Data = getgenbank(..., 'ToFile', ToFileValue, ...) saves the data returned from the GenBank database to a file. ToFileValue is a character vector or string specifying either a file name or a path and file name for saving the GenBank data. If you specify only a file name, the file is saved to the MATLAB Current Folder. The function does not append data to an existing file. Instead, it overwrites the contents of the existing file without warning.
Tip
You can read a GenBank-formatted file back into MATLAB using the genbankread function.
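
As a minimal sketch of the save-and-reload workflow described above, the following saves a record to a file and reads it back with genbankread. The file name and its location in the temporary folder are hypothetical choices made for this example.

```matlab
% Hypothetical output file in the system temporary folder (assumption).
outFile = fullfile(tempdir, 'M10051.gbk');

% Download the record and save the GenBank-formatted text to outFile.
% An existing file with this name is overwritten without warning.
S = getgenbank('M10051', 'ToFile', outFile);

% Read the saved file back into a MATLAB structure.
S2 = genbankread(outFile);

% Both structures should describe the same record.
sameRecord = strcmp(S.Accession, S2.Accession);
disp(sameRecord)
```

Because 'ToFile' overwrites silently, writing to a dedicated folder or checking for the file first (for example with isfile) is a reasonable precaution.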
Data = getgenbank(..., 'FileFormat', FileFormatValue, ...) returns the sequence in the specified format. Choices are 'GenBank' or 'FASTA'. When 'FASTA', then Data contains only two fields, Header and Sequence. 'GenBank' is the default when SequenceOnlyValue is false. 'FASTA' is the default when SequenceOnlyValue is true.
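
To illustrate the 'FASTA' option described above, this short sketch requests the record in FASTA format and lists the fields of the result. The variable names are illustrative.

```matlab
% Request the record in FASTA format instead of the default GenBank format.
F = getgenbank('M10051', 'FileFormat', 'FASTA');

% Per the description above, F should contain only Header and Sequence.
names = fieldnames(F);
disp(names)

% Inspect the header line and the sequence length.
disp(F.Header)
disp(length(F.Sequence))
```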
Data = getgenbank(..., 'SequenceOnly', SequenceOnlyValue, ...) returns only the sequence in Data, a character array. Choices are true or false (default).
Note
If you use the 'SequenceOnly' and 'ToFile' properties together, the output is always a FASTA-formatted file.
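
The following sketch shows 'SequenceOnly' on its own and combined with 'ToFile'. The output file name is a hypothetical choice, and reading the file back with fastaread is an assumption based on the note above that the saved file is FASTA-formatted.

```matlab
% Return only the nucleotide sequence as a character array.
seq = getgenbank('M10051', 'SequenceOnly', true);
disp(class(seq))
disp(numel(seq))

% Combine with 'ToFile': per the note above, the saved file is FASTA-formatted.
fastaFile = fullfile(tempdir, 'M10051.fasta');   % hypothetical file name
getgenbank('M10051', 'SequenceOnly', true, 'ToFile', fastaFile);

% Read the FASTA file back to check its contents.
[header, seqFromFile] = fastaread(fastaFile);
disp(header)
```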
To retrieve the sequence from chromosome 19 that codes for the human insulin receptor and store it in a structure, S, in the MATLAB Command Window, type:
S = getgenbank('M10051')
S =
    LocusName: 'HUMINSR'
    LocusSequenceLength: '4723'
    LocusNumberofStrands: ''
    LocusTopology: 'linear'
    LocusMoleculeType: 'mRNA'
    LocusGenBankDivision: 'PRI'
    LocusModificationDate: '06-JAN-1995'
    Definition: 'Human insulin receptor mRNA, complete cds.'
    Accession: 'M10051'
    Version: 'M10051.1'
    GI: '186439'
    Project: []
    DBLink: []
    Keywords: 'insulin receptor; tyrosine kinase.'
    Segment: []
    Source: 'Homo sapiens (human)'
    SourceOrganism: [4x65 char]
    Reference: {[1x1 struct]}
    Comment: [14x67 char]
    Features: [51x74 char]
    CDS: [1x1 struct]
    Sequence: [1x4723 char]
    SearchURL: [1x67 char]
    RetrieveURL: [1x101 char]
By looking at the Features field of the structure returned, you can determine that the coding sequence is positions 139 through 4287. To retrieve only the coding sequence from chromosome 19 that codes for the human insulin receptor and store it in a structure, CDS, in the MATLAB Command Window, type:
CDS = getgenbank('M10051','PARTIALSEQ',[139,4287]);
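
As a hedged cross-check of the partial retrieval above, the same range can be extracted locally from the full record by indexing into its Sequence field. The variable names are illustrative, and the comparison assumes that both queries return the same version of the record.

```matlab
% Retrieve the full record and slice the coding region locally.
S = getgenbank('M10051');
cdsLocal = S.Sequence(139:4287);

% Retrieve only the coding region from GenBank.
CDS = getgenbank('M10051', 'PartialSeq', [139, 4287]);

% Positions 139 through 4287 span 4287 - 139 + 1 = 4149 bases.
disp(numel(cdsLocal))

% The two sequences are expected to match if both queries hit the same record.
disp(isequal(cdsLocal, CDS.Sequence))
```

Slicing locally avoids a second download when the full record is already in the workspace. 'PartialSeq' is the better fit when only the subsequence is needed.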
genbankread | getembl | getgenpept | getpdb | seqviewer