Process big data with mapreduce on Spark® and Hadoop® clusters, and parallel pools

You can use Parallel Computing Toolbox™ to evaluate tall-array expressions in parallel using a parallel pool on your desktop. Using tall arrays allows you to run big data applications that do not fit in memory on your machine. You can also use Parallel Computing Toolbox to scale up tall-array processing by connecting to a parallel pool running on a MATLAB Parallel Server™ cluster. Alternatively, you can use a Spark enabled Hadoop cluster running MATLAB Parallel Server. For more information, see Big Data Workflow Using Tall Arrays and Datastores.
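As a minimal sketch of this workflow, the following code evaluates a tall-array expression on a local parallel pool. It uses the airlinesmall.csv sample data set shipped with MATLAB as example data; substitute your own datastore and variables.

    parpool;                                   % start a parallel pool on your desktop
    ds = datastore('airlinesmall.csv');        % example data set shipped with MATLAB
    ds.SelectedVariableNames = 'ArrDelay';
    ds.TreatAsMissing = 'NA';
    tt = tall(ds);                             % tall table backed by the datastore
    meanDelay = mean(tt.ArrDelay, 'omitnan');  % deferred (lazy) computation
    meanDelay = gather(meanDelay);             % evaluate in parallel on the pool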
Big Data Workflow Using Tall Arrays and Datastores
Learn about typical workflows using tall arrays to analyze big data sets.
Use Tall Arrays on a Parallel Pool
Discover tall arrays in Parallel Computing Toolbox and MATLAB Parallel Server.
This example shows how to access a large data set in the cloud and process it in a cloud cluster using MATLAB capabilities for big data.
Use Tall Arrays on a Spark Enabled Hadoop Cluster
Create and use tall tables on Spark clusters without changing your MATLAB code.
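A minimal sketch of running the same tall-array code against a Spark enabled Hadoop cluster; the install folder and the HDFS path are hypothetical placeholders for your own cluster and data.

    setenv('HADOOP_HOME', '/path/to/hadoop/install');   % placeholder Hadoop install location
    cluster = parallel.cluster.Hadoop;                   % describe the Spark enabled Hadoop cluster
    mr = mapreducer(cluster);                            % route tall-array evaluation to the cluster
    ds = datastore('hdfs:///data/airlinesmall.csv');     % placeholder data location in HDFS
    tt = tall(ds);                                       % same tall-array code as on the desktop
    firstRows = gather(head(tt));                        % evaluate on the Spark cluster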
Run mapreduce on a Parallel Pool
Try mapreduce for advanced analysis of big data using Parallel Computing Toolbox, as sketched below.
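The sketch below runs mapreduce on the current parallel pool. The mapper and reducer are hypothetical example functions that compute the maximum arrival delay in the airlinesmall.csv sample data set.

    p = gcp;                                  % get (or start) the current parallel pool
    mr = mapreducer(p);                       % run mapreduce on the pool workers
    ds = datastore('airlinesmall.csv', ...
        'SelectedVariableNames', 'ArrDelay', 'TreatAsMissing', 'NA');
    result = mapreduce(ds, @maxDelayMapper, @maxDelayReducer, mr);
    readall(result)

    function maxDelayMapper(data, ~, intermKVStore)
        add(intermKVStore, 'MaxArrDelay', max(data.ArrDelay, [], 'omitnan'));
    end

    function maxDelayReducer(~, intermValsIter, outKVStore)
        maxVal = -inf;
        while hasnext(intermValsIter)
            maxVal = max(maxVal, getnext(intermValsIter));
        end
        add(outKVStore, 'MaxArrDelay', maxVal);
    end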
Run mapreduce on a Hadoop Cluster
Learn about mapreduce for advanced big data analysis on a Hadoop cluster; see the sketch after this item.
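A minimal sketch of mapreduce against a Hadoop cluster, assuming MATLAB Parallel Server on the cluster; the HDFS paths and the row-counting mapper and reducer are hypothetical placeholders.

    cluster = parallel.cluster.Hadoop;        % requires HADOOP_HOME (or HADOOP_PREFIX) to be set
    mr = mapreducer(cluster);
    ds = datastore('hdfs:///data/airlinesmall.csv', ...
        'SelectedVariableNames', 'ArrDelay', 'TreatAsMissing', 'NA');
    result = mapreduce(ds, @countMapper, @countReducer, mr, ...
        'OutputFolder', 'hdfs:///results');   % on Hadoop, results are written back to HDFS
    readall(result)

    function countMapper(data, ~, intermKVStore)
        add(intermKVStore, 'count', size(data, 1));   % rows in this block of the datastore
    end

    function countReducer(~, intermValsIter, outKVStore)
        total = 0;
        while hasnext(intermValsIter)
            total = total + getnext(intermValsIter);
        end
        add(outKVStore, 'count', total);
    end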
Partition a Datastore in Parallel
Use partition to split your datastore into smaller parts, as shown in the sketch below.
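A minimal sketch: split a datastore into one partition per worker and read the parts in a parfor loop. The file name is the airlinesmall.csv sample data set; the mean-delay computation is an illustrative example.

    ds = datastore('airlinesmall.csv', ...
        'SelectedVariableNames', 'ArrDelay', 'TreatAsMissing', 'NA');
    pool = gcp;
    n = numpartitions(ds, pool);          % choose a partition count that matches the pool
    partialSums = zeros(1, n);
    partialCounts = zeros(1, n);
    parfor ii = 1:n
        subds = partition(ds, n, ii);     % each worker reads only its own part
        t = readall(subds);
        partialSums(ii) = sum(t.ArrDelay, 'omitnan');
        partialCounts(ii) = sum(~isnan(t.ArrDelay));
    end
    meanDelay = sum(partialSums) / sum(partialCounts);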
Learn about starting and stopping parallel pools, pool size, and cluster selection.
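For example, you can start a pool of a specific size on the default cluster profile and shut it down when you are done; the pool size of 4 is an arbitrary illustration.

    p = parpool(4);              % start a pool with 4 workers on the default profile
    disp(p.NumWorkers)           % confirm the pool size
    delete(gcp('nocreate'))      % stop the current pool, if one is open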
Specify Your Parallel Preferences
Specify your preferences, and automatically create a parallel pool.
Discover Clusters and Use Cluster Profiles
Find out how to work with cluster profiles and discover cloud clusters running on Amazon EC2.