This example shows how to explore the structure of an Apache Cassandra® database and import data from a Cassandra database table into MATLAB® using a Cassandra database connection. The Cassandra database stores database tables according to the partition key. The partition key affects how the data is filtered in the database.
In this example, the Cassandra database contains the employees_by_job
database table with employee data and the job_id
partition key.
To run this example, you must first install the Database Toolbox™ Interface for Apache Cassandra® Database. For details, see Database Toolbox Interface for Apache Cassandra Database Installation.
Create a Cassandra database connection using the local host address. conn
is a cassandra
object.
contactPoints = "localhost";
conn = cassandra(contactPoints);
View the keyspaces in the Cassandra database using the Cassandra database connection. The returned string array contains the keyspaces. A keyspace contains one or more database tables and defines how the database replicates the data in the tables.
conn.Keyspaces
ans = 1×6 string array
"employeedata" "system" "system_auth" "system_distributed" "system_schema" "system_traces"
Return the names of the Cassandra database tables in the employeedata
keyspace. t
is a string array that contains the names of the database tables in the employeedata
keyspace.
keyspace = "employeedata";
t = tablenames(conn,keyspace)
t = 3×1 string array
"employees_by_job"
"employees_by_id"
"employees_by_name"
Return the names of the Cassandra database columns in the employees_by_job
database table.
tablename = "employees_by_job";
cols = columninfo(conn,keyspace,tablename);
Display the first few names of the Cassandra database columns in the employees_by_job
database table.
head(cols)
ans=8×4 table
Name DataType PartitionKey ClusteringColumn
________________ ________ ____________ ________________
"job_id" "text" true ""
"hire_date" "date" false "DESC"
"employee_id" "int" false "ASC"
"commission_pct" "double" false ""
"department_id" "int" false ""
"email" "text" false ""
"first_name" "text" false ""
"last_name" "text" false ""
cols
is a table with these variables:
Name
— Cassandra database column name
DataType
— Cassandra Query Language (CQL) data type of the Cassandra database column
PartitionKey
— Partition key indicator
ClusteringColumn
— Clustering column indicator
The value in the PartitionKey
variable indicates whether the database column is a partition key. The column job_id
(job identifier) is a partition key in this database table.
Import data from the employees_by_job
database table into MATLAB using the partition key value for shop clerk, SH_CLERK
.
keyValue = "SH_CLERK"; results = partitionRead(conn,keyspace,tablename, ... keyValue);
Display the first few rows of the returned employee data.
head(results)
ans=8×13 table
job_id hire_date employee_id commission_pct department_id email first_name last_name manager_id office performance_ratings phone_number salary
building room
__________ ___________ ___________ ______________ _____________ __________ __________ __________ __________ ________________ ___________________ ______________ ______
"SH_CLERK" 03-Feb-2008 183 NaN 50 "GGEONI" "Girard" "Geoni" 120 "West" 121 [ 2] "650.507.9879" 2800
"SH_CLERK" 13-Jan-2008 199 NaN 50 "DGRANT" "Douglas" "Grant" 124 "West" 139 [3×1 int32] "650.507.9844" 2600
"SH_CLERK" 19-Dec-2007 191 NaN 50 "RPERKINS" "Randall" "Perkins" 122 "West" 255 [ 2] "650.505.4876" 2500
"SH_CLERK" 21-Jun-2007 182 NaN 50 "MSULLIVA" "Martha" "Sullivan" 120 "East" 112 [2×1 int32] "650.507.9878" 2500
"SH_CLERK" 21-Jun-2007 198 NaN 50 "DOCONNEL" "Donald" "OConnell" 124 "East" 125 [3×1 int32] "650.507.9833" 2600
"SH_CLERK" 17-Mar-2007 195 NaN 50 "VJONES" "Vance" "Jones" 123 "West" 236 [5×1 int32] "650.501.4876" 2800
"SH_CLERK" 07-Feb-2007 187 NaN 50 "ACABRIO" "Anthony" "Cabrio" 121 "West" 345 [3×1 int32] "650.509.4876" 3000
"SH_CLERK" 11-Jul-2006 190 NaN 50 "TGATES" "Timothy" "Gates" 122 "East" 297 [3×1 int32] "650.505.3876" 2900
results
is a table that contains these variables:
job_id
— Job identifier
hire_date
— Hire date
employee_id
— Employee identifier
commission_pct
— Commission percentage
department_id
— Department identifier
email
— Email address
first_name
— First name
last_name
— Last name
manager_id
— Manager identifier
office
— Office location (table that contains two variables for the building and room)
performance_ratings
— Performance ratings
phone_number
— Phone number
salary
— Salary
close(conn)
cassandra
| close
| columninfo
| partitionRead
| tablenames