Datastore for data in database
MATLAB® has various datastores that let you import large data sets into
MATLAB for analysis. A DatabaseDatastore
object is a type of
datastore that contains data from a database table or the results from executing an SQL
query in a relational database. For details about other datastores, see Getting Started with Datastore (MATLAB).
With a DatabaseDatastore
object, you can preview and read records or
chunks in a data set and reset the DatabaseDatastore
to its initial
state. Also, you can analyze a large data set in a database using tall arrays or
MapReduce.
Reading data from DatabaseDatastore
objects is the same as executing
the fetch
function on the data set. Using
DatabaseDatastore
objects provides advantages that enable you to:
Work with databases containing large amounts of data.
Analyze large amounts of data using tall arrays with common MATLAB functions, such as mean
and
histogram
. Create a tall array using the tall
function. For details,
see Tall Arrays for Out-of-Memory Data (MATLAB).
Write MapReduce algorithms that define the chunking and reduction of large
amounts of data by using the mapreduce
function. For
details, see Getting Started with MapReduce (MATLAB). For an example, see Analyze Large Data in Database Using MapReduce. For more
MapReduce examples, see Building Effective
Algorithms with MapReduce (MATLAB).
specifies additional options using one or more name-value pair arguments.
For example, dbds
= databaseDatastore(conn
,source
,Name,Value
)'ReadSize',100
retrieves 100 rows of data
from the DatabaseDatastore
object.
customizes the options for importing a large data set from a database using
the dbds
= databaseDatastore(conn
,source
,opts
)SQLImportOptions
object.
specifies additional options using one or more name-value pair arguments.
For example, dbds
= databaseDatastore(conn
,source
,opts
,Name,Value
)'Catalog','toy_store'
retrieves data from
the toy_store
database catalog.
conn
— Database connectionconnection
objectDatabase connection, specified as a connection
object created with the
database
function.
source
— SourceSource, specified as a character vector or string scalar. The source
indicates whether the DatabaseDatastore
object stores
data from a database table or the results from an executed SQL
query.
Example: 'inventorytable'
Example: "SELECT productnumber,productname FROM
producttable"
Data Types: char
| string
opts
— Database import optionsSQLImportOptions
objectDatabase import options, specified as an SQLImportOptions
object.
Specify optional
comma-separated pairs of Name,Value
arguments. Name
is
the argument name and Value
is the corresponding value.
Name
must appear inside quotes. You can specify several name and value
pair arguments in any order as
Name1,Value1,...,NameN,ValueN
.
databaseDatastore(conn,source,'ReadSize',100,'Catalog','toy_store')
creates a DatabaseDatastore
object and stores 100 rows of
data from a table or SQL query using the toy_store
database
catalog.'ReadSize'
— Number of rows to returnNumber of rows to return, specified as the comma-separated pair
consisting of 'ReadSize'
and a positive numeric
scalar. Use this name-value pair argument to limit the number of
rows for retrieval from the DatabaseDatastore
object.
Example: 1000
Data Types: double
'Catalog'
— Database catalog nameDatabase catalog name, specified as the comma-separated pair consisting of 'Catalog'
and a character vector or string scalar. A catalog serves as the container for the schemas in a database and contains related metadata information. A database can have numerous catalogs.
Use the 'Catalog'
name-value pair argument only when source
is a database table.
Example: 'Catalog','toy_store'
Data Types: char
| string
'Schema'
— Database schema nameDatabase schema name, specified as the comma-separated pair consisting of 'Schema'
and a character vector or string scalar. A schema defines the database tables, views, relationships among tables, and other elements. A database catalog can have numerous schemas.
Use the 'Schema'
name-value pair argument only when source
is a database table.
Example: 'Schema','dbo'
Data Types: char
| string
The DatabaseDatastore
object supports only Microsoft®
SQL Server® 2012 and later versions.
The DatabaseDatastore
object does not support using a
parallel pool with Parallel
Computing Toolbox™ installed. To analyze data using tall arrays or to run
MapReduce algorithms, set the global execution environment to be the local
MATLAB session by using mapreducer
. Enter this
code:
mapreducer(0)
Connection
— Database connectionconnection
objectThis property is read-only.
Database connection, specified as a connection
object
created using database
.
Query
— SQL queryThis property is read-only.
SQL query, specified as a character vector that specifies the SQL query to execute in the database.
Data Types: char
VariableNames
— Column names of retrieved data tableThis property is read-only.
Column names of the retrieved data table, specified as a cell array of one or more character vectors.
Data Types: char
ReadSize
— Number of rows to readNumber of rows to read from the retrieved data table, specified as a
nonnegative numeric scalar. To specify the number of rows to read, set the
ReadSize
property.
Example: dbds.ReadSize = 5000;
Data Types: double
hasdata | Determine if data in DatabaseDatastore is
available to read |
preview | Return subset of data from DatabaseDatastore |
read | Read data in DatabaseDatastore |
readall | Read all data in DatabaseDatastore |
reset | Reset DatabaseDatastore to initial
state |
close | Close and invalidate database and driver resource utilizer |
isPartitionable | Determine whether datastore is partitionable |
isShuffleable | Determine whether datastore is shuffleable |
DatabaseDatastore
Object Using SQL Query ResultsCreate a database connection using a JDBC driver. To create this connection, you must configure a JDBC data source. For more information, see the
function. Then, create a configureJDBCDataSource
DatabaseDatastore
object using the results from an SQL query and preview a large data set.
Create a database connection to the JDBC data source MSSQLServerJDBCAuth
. This data source configures a JDBC driver to a Microsoft® SQL Server® database with Windows® authentication. Specify a blank user name and password.
datasource = "MSSQLServerJDBCAuth"; username = ""; password = ""; conn = database(datasource,username,password);
Create a DatabaseDatastore
object using a database connection and an SQL query. This SQL query retrieves all flight data from the airlinesmall
table. databaseDatastore
executes the SQL query.
sqlquery = 'select * from airlinesmall';
dbds = databaseDatastore(conn,sqlquery)
dbds = DatabaseDatastore with properties: Connection: [1×1 database.jdbc.connection] Query: 'select * from airlinesmall' VariableNames: {1×29 cell} ReadSize: 10000
dbds
is a DatabaseDatastore
object with these properties:
Connection
-- Database connection object
Query
-- Executed SQL query
VariableNames
-- List of column names from the executed SQL query
ReadSize
-- Maximum number of records to read from the executed SQL query
Display the database connection property.
dbds.Connection
ans = connection with properties: DataSource: 'MSSQLServerJDBCAuth' UserName: '' Driver: 'com.microsoft.sqlserver.j ...' URL: 'jdbc:sqlserver://dbtb04:5 ...' Message: '' Type: 'JDBC Connection Object' Database Properties: AutoCommit: 'on' ReadOnly: 'off' LoginTimeout: 0 MaxDatabaseConnections: 0 Catalog and Schema Information: DefaultCatalog: 'toy_store' Catalogs: {'master', 'model', 'msdb' ... and 2 more} Schemas: {'db_accessadmin', 'db_backupoperator', 'db_datareader' ... and 15 more} Database and Driver Information: DatabaseProductName: 'Microsoft SQL Server' DatabaseProductVersion: '11.00.2100' DriverName: 'Microsoft JDBC Driver 4.0 ...' DriverVersion: '4.0.2206.100'
The Message
property is blank when the database connection is successful.
Preview the first eight records in the large data set returned by executing the SQL query in the DatabaseDatastore
object.
preview(dbds)
ans = 8×29 table Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay ____ _____ __________ _________ _______ __________ _______ __________ _____________ _________ ________ _________________ ______________ _______ ________ ________ ______ _____ ________ ______ _______ _________ ________________ ________ ____________ ____________ ________ _____________ _________________ 1990 9 11 2 1810 1812 1939 1930 'AA' 1426 'NA' 89 78 'NA' 9 -2 'RST' 'ORD' 268 'NA' 'NA' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 1990 10 27 6 1353 1355 1634 1640 'US' 112 'NA' 161 165 'NA' -6 -2 'TPA' 'SYR' 1104 'NA' 'NA' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 1990 10 23 2 1057 1055 1205 1155 'US' 1621 'NA' 68 60 'NA' 10 2 'ROC' 'EWR' 246 'NA' 'NA' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 1990 10 8 1 1515 1440 1609 1535 'NW' 749 'NA' 54 55 'NA' 34 35 'MSP' 'FSD' 197 'NA' 'NA' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 1990 10 19 5 1130 1120 1203 1154 'UA' 369 'NA' 93 94 'NA' 9 10 'BUF' 'ORD' 473 'NA' 'NA' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 1990 10 12 5 1755 1733 1858 1820 'DL' 590 'NA' 63 47 'NA' 38 22 'BOS' 'BGR' 201 'NA' 'NA' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 2001 11 22 4 1345 1355 1530 1549 'MQ' 4982 '#NAME?' 105 114 '90' -19 -10 'JAX' 'MIA' 334 '8' '7' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 2001 11 26 1 2105 2110 2209 2237 'AA' 1947 'N3BäA1' 64 87 '47' -28 -5 'SFO' 'LAX' 337 '6' '11' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA'
Close the DatabaseDatastore
object and the database connection.
close(dbds)
DatabaseDatastore
Object Using Database TableRetrieve a large data set from a database table by creating a DatabaseDatastore
object.
This example uses a Microsoft® SQL Server® Version 11.00.2100 database and the Microsoft SQL Server Driver 11.00.5058.
Create a database connection to a Microsoft SQL Server database with Windows® authentication. Specify a blank user name and password.
datasource = 'MS SQL Server Auth'; conn = database(datasource,'','');
Load flight information in the MATLAB® workspace.
flights = readtable('airlinesmall_subset.xlsx');
Create the flights
database table using the flight information.
tablename = 'flights';
sqlwrite(conn,tablename,flights)
Create a DatabaseDatastore
object using a database connection and the flights
database table.
dbds = databaseDatastore(conn,tablename)
dbds = DatabaseDatastore with properties: Connection: [1×1 database.odbc.connection] Query: 'SELECT * from flights' VariableNames: {1×29 cell} ReadSize: 10000
dbds
is a DatabaseDatastore
object with these properties:
Connection
— Database connection object
Query
— Executed SQL query
VariableNames
— List of column names from the executed SQL query
ReadSize
— Maximum number of records to read from the executed SQL query
Display the database connection property.
dbds.Connection
ans = connection with properties: DataSource: 'MS SQL Server Auth' UserName: '' Message: '' Type: 'ODBC Connection Object' Database Properties: AutoCommit: 'on' ReadOnly: 'off' LoginTimeout: 15 MaxDatabaseConnections: 0 Catalog and Schema Information: DefaultCatalog: 'toy_store' Catalogs: {'master', 'msdb', 'tempdb' ... and 1 more} Schemas: {'dbo', 'guest', 'INFORMATION_SCHEMA' ... and 2 more} Database and Driver Information: DatabaseProductName: 'Microsoft SQL Server' DatabaseProductVersion: '11.00.2100' DriverName: 'sqlncli11.dll' DriverVersion: '11.00.6518'
The Message
property is blank when the database connection is successful.
Preview the first eight records in the data set returned by executing the SQL query in the DatabaseDatastore
object.
preview(dbds)
ans=8×29 table
Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay SDelay SecurityDelay LateAircraftDelay
____ _____ __________ _________ _______ __________ _______ __________ _____________ _________ ________ _________________ ______________ _______ ________ ________ ______ _____ ________ ______ _______ _________ ________________ ________ ____________ ____________ ______ _____________ _________________
1996 1 18 4 2117 2120 2305 2259 'HP' 415 'N637AW' 108 99 85 6 -3 'COS' 'PHX' 551 5 18 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 12 5 1252 1245 1511 1500 'HP' 610 'N905AW' 79 75 58 11 7 'LAX' 'PHX' 370 3 18 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 16 2 1441 1445 1708 1721 'HP' 211 'N165AW' 87 96 74 -13 -4 'RNO' 'PHX' 601 4 9 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 1 1 2258 2300 2336 2335 'HP' 1245 'N183AW' 38 35 20 1 -2 'TUS' 'PHX' 110 6 12 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 4 4 1814 1814 1901 1910 'US' 683 'N963VJ' 47 56 34 -9 0 'DTW' 'PIT' 201 6 7 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 31 3 1822 1820 1934 1925 'US' 757 'N912VJ' 72 65 52 9 2 'PHL' 'PIT' 267 6 14 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 18 4 729 730 841 843 'US' 1564 'N941VJ' 72 73 58 -2 -1 'DCA' 'PVD' 357 3 11 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 26 5 1704 1705 1829 1839 'NW' 1538 'N960N' 85 94 69 -10 -1 'DTW' 'RIC' 456 3 13 0 NaN 0 NaN NaN NaN NaN NaN
Close the DatabaseDatastore
object and the database connection.
close(dbds)
DatabaseDatastore
Object with Specific Record CountCreate a database connection using a JDBC driver. To create this connection, you must configure a JDBC data source. For more information, see the
function. Then, create a configureJDBCDataSource
DatabaseDatastore
object by setting the ReadSize
property, and preview a large data set.
Create a database connection to the JDBC data source MSSQLServerJDBCAuth
. This data source configures a JDBC driver to a Microsoft® SQL Server® database with Windows® authentication. Specify a blank user name and password.
datasource = "MSSQLServerJDBCAuth"; username = ""; password = ""; conn = database(datasource,username,password);
Create a DatabaseDatastore
object using a database connection and an SQL query. This SQL query retrieves all flight data from the airlinesmall
table. Specify reading a maximum of 1000 records from the executed SQL query. databaseDatastore
executes the SQL query.
sqlquery = 'select * from airlinesmall'; dbds = databaseDatastore(conn,sqlquery,'ReadSize',1000)
dbds = DatabaseDatastore with properties: Connection: [1×1 database.jdbc.connection] Query: 'select * from airlinesmall' VariableNames: {1×29 cell} ReadSize: 1000
dbds
is a DatabaseDatastore
object with these properties:
Connection
-- Database connection object
Query
-- Executed SQL query
VariableNames
-- List of column names from the executed SQL query
ReadSize
-- Maximum number of records to read from the executed SQL query
Display the database connection property.
dbds.Connection
ans = connection with properties: DataSource: 'MSSQLServerJDBCAuth' UserName: '' Driver: 'com.microsoft.sqlserver.j ...' URL: 'jdbc:sqlserver://dbtb04:5 ...' Message: '' Type: 'JDBC Connection Object' Database Properties: AutoCommit: 'on' ReadOnly: 'off' LoginTimeout: 0 MaxDatabaseConnections: 0 Catalog and Schema Information: DefaultCatalog: 'toy_store' Catalogs: {'master', 'model', 'msdb' ... and 2 more} Schemas: {'db_accessadmin', 'db_backupoperator', 'db_datareader' ... and 15 more} Database and Driver Information: DatabaseProductName: 'Microsoft SQL Server' DatabaseProductVersion: '11.00.2100' DriverName: 'Microsoft JDBC Driver 4.0 ...' DriverVersion: '4.0.2206.100'
The Message
property is blank when the database connection is successful.
Preview the first eight records in the large data set returned by executing the SQL query in the DatabaseDatastore
object.
preview(dbds)
ans = 8×29 table Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay ____ _____ __________ _________ _______ __________ _______ __________ _____________ _________ ________ _________________ ______________ _______ ________ ________ ______ _____ ________ ______ _______ _________ ________________ ________ ____________ ____________ ________ _____________ _________________ 1990 9 11 2 1810 1812 1939 1930 'AA' 1426 'NA' 89 78 'NA' 9 -2 'RST' 'ORD' 268 'NA' 'NA' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 1990 10 27 6 1353 1355 1634 1640 'US' 112 'NA' 161 165 'NA' -6 -2 'TPA' 'SYR' 1104 'NA' 'NA' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 1990 10 23 2 1057 1055 1205 1155 'US' 1621 'NA' 68 60 'NA' 10 2 'ROC' 'EWR' 246 'NA' 'NA' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 1990 10 8 1 1515 1440 1609 1535 'NW' 749 'NA' 54 55 'NA' 34 35 'MSP' 'FSD' 197 'NA' 'NA' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 1990 10 19 5 1130 1120 1203 1154 'UA' 369 'NA' 93 94 'NA' 9 10 'BUF' 'ORD' 473 'NA' 'NA' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 1990 10 12 5 1755 1733 1858 1820 'DL' 590 'NA' 63 47 'NA' 38 22 'BOS' 'BGR' 201 'NA' 'NA' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 2001 11 22 4 1345 1355 1530 1549 'MQ' 4982 '#NAME?' 105 114 '90' -19 -10 'JAX' 'MIA' 334 '8' '7' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA' 2001 11 26 1 2105 2110 2209 2237 'AA' 1947 'N3BäA1' 64 87 '47' -28 -5 'SFO' 'LAX' 337 '6' '11' 0 'NA' 0 'NA' 'NA' 'NA' 'NA' 'NA'
Close the DatabaseDatastore
object and the database connection.
close(dbds)
DatabaseDatastore
Object Using Custom Import OptionsCustomize import options when importing a large data set from a database table. Control the import options by creating an SQLImportOptions
object. Then, customize the import options for database columns that contain logical data. Import and preview the data by creating a DatabaseDatastore
object and using the preview
function.
This example uses the airlinesmall_subset.xls
spreadsheet, which contains the column Cancelled
. Also, the example uses a Microsoft® SQL Server® Version 11.00.2100 database and the Microsoft SQL Server Driver 11.00.5058.
Create a database connection to a Microsoft SQL Server database with Windows® authentication. Specify a blank user name and password.
datasource = 'MS SQL Server Auth'; conn = database(datasource,'','');
Load flight information into the MATLAB® workspace.
flights = readtable('airlinesmall_subset.xlsx');
Create the flights
database table using the flight information.
tablename = 'flights';
sqlwrite(conn,tablename,flights)
Create an SQLImportOptions
object using the flights
database table with the databaseImportOptions
function.
opts = databaseImportOptions(conn,tablename);
Retrieve the default import options for the Cancelled
variable.
varnames = 'Cancelled';
varOpts = getoptions(opts,varnames)
varOpts = SQLVariableImportOptions with properties: Variable Properties : Name: 'Cancelled' Type: 'double' FillValue: NaN
Set the import options for the data type of the specified variable to logical
. Also, set the import options to replace missing data in the specified variable with the fill value true
.
opts = setoptions(opts,varnames,'Type','logical', ... 'FillValue',true);
Create the DatabaseDatastore
object to import a large data set using the import options.
dbds = databaseDatastore(conn,tablename,opts);
Import the logical data in the selected variable and display a preview of the data. The imported data shows that the variable has the logical
data type.
opts.SelectedVariableNames = varnames; data = preview(dbds); cancelled = data.Cancelled
cancelled = 8×1 logical array
0
0
0
0
0
0
0
0
Delete the flights
database table using the execute
function.
sqlquery = ['DROP TABLE ' tablename];
execute(conn,sqlquery)
Close the database connection.
close(conn)
DatabaseDatastore
Object Using Custom Import Options and Database Catalog and SchemaCustomize import options when importing a large data set from a database table. Control the import options by creating an SQLImportOptions
object. Then, customize the import options for database columns that contain logical data. Create a DatabaseDatastore
object using the specified database catalog and schema. Import the database data and preview it by using the preview
function with the DatabaseDatastore
object.
This example uses the airlinesmall_subset.xls
spreadsheet, which contains the column Cancelled
. Also, the example uses a Microsoft® SQL Server® Version 11.00.2100 database and the Microsoft SQL Server Driver 11.00.5058.
Create a database connection to a Microsoft SQL Server database with Windows® authentication. Specify a blank user name and password.
datasource = 'MS SQL Server Auth'; conn = database(datasource,'','');
Load flight information into the MATLAB® workspace.
flights = readtable('airlinesmall_subset.xlsx');
Create the flights
database table using the flight information and the toy_store
database catalog and dbo
database schema.
tablename = 'flights'; sqlwrite(conn,tablename,flights, ... 'Catalog','toy_store','Schema','dbo')
Create an SQLImportOptions
object using the flights
database table and the databaseImportOptions
function. Specify the toy_store
database catalog and dbo
database schema.
opts = databaseImportOptions(conn,tablename, ... 'Catalog','toy_store','Schema','dbo');
Retrieve the default import options for the Cancelled
variable.
varnames = 'Cancelled';
varOpts = getoptions(opts,varnames)
varOpts = SQLVariableImportOptions with properties: Variable Properties : Name: 'Cancelled' Type: 'double' FillValue: NaN
Set the import options for the data type of the specified variable to logical
. Also, set the import options to replace missing data in the specified variable with the fill value true
.
opts = setoptions(opts,varnames,'Type','logical', ... 'FillValue',true);
Create the DatabaseDatastore
object to import a large data set using import options, the toy_store
database catalog, and the dbo
database schema.
dbds = databaseDatastore(conn,tablename,opts, ... 'Catalog','toy_store','Schema','dbo');
Import the logical data in the selected variable and display a preview of the data. The imported data shows that the variable has the logical
data type.
opts.SelectedVariableNames = varnames; data = preview(dbds); cancelled = data.Cancelled
cancelled = 8×1 logical array
0
0
0
0
0
0
0
0
Delete the flights
database table from the toy_store
database catalog and the dbo
database schema by using the execute
function.
sqlquery = ['DROP TABLE toy_store.dbo.' tablename];
execute(conn,sqlquery)
Close the database connection.
close(conn)
database
| databaseImportOptions
| execute
| fetch
| getoptions
| preview
| reset
| setoptions
| sqlread
You have a modified version of this example. Do you want to open this example with your edits?