Programming technique for analyzing data sets that do not fit in memory
outds = mapreduce(ds,mapfun,reducefun,mr) optionally specifies the run-time configuration settings for mapreduce.
The mr input is the result of a call to the mapreducer function. Typically, this argument is used with Parallel Computing Toolbox™, MATLAB® Parallel Server™, or MATLAB Compiler™. For more information, see Speed Up and Deploy MapReduce Using Other Products.
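As a minimal sketch of the mr argument, the script below passes mapreducer(0), which selects the local MATLAB session, as the fourth input. The CSV file, the Delay variable, and the maxDelayMapper and maxDelayReducer functions are made up for illustration and are not part of the mapreduce API. The script assumes a release that allows local functions in scripts (R2016b or later).

```matlab
% Build a tiny CSV file so the example is self-contained (made-up data).
csvFile = [tempname '.csv'];
writetable(table([3; 9; 4; 7], 'VariableNames', {'Delay'}), csvFile);
ds = tabularTextDatastore(csvFile);

% mapreducer(0) configures mapreduce to run in the local MATLAB session.
mr = mapreducer(0);
outds = mapreduce(ds, @maxDelayMapper, @maxDelayReducer, mr);
result = readall(outds);   % table with the variables Key and Value

function maxDelayMapper(data, ~, intermKVStore)
    % data is a table holding one block read from the datastore.
    add(intermKVStore, 'MaxDelay', max(data.Delay));
end

function maxDelayReducer(key, intermValIter, outKVStore)
    % Combine the per-block maxima into one overall maximum.
    best = -Inf;
    while hasnext(intermValIter)
        best = max(best, getnext(intermValIter));
    end
    add(outKVStore, key, best);
end
```

With other products installed, the same call should accept a different mapreducer object in place of mapreducer(0), leaving the map and reduce functions unchanged.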
outds = mapreduce(___,Name,Value) specifies additional options with one or more Name,Value pair arguments using any of the previous syntaxes. For example, you can specify 'OutputFolder' followed by a character vector specifying a path to the output folder.
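The following sketch shows one way to combine Name,Value pairs with the positional inputs. It sends the output files to a temporary folder with 'OutputFolder' and suppresses progress messages with 'Display'. The data, the folder location, and the sumMapper and sumReducer functions are hypothetical. The folder is created up front as a precaution, not because mapreduce is known to require it.

```matlab
% Made-up input data written to a temporary CSV file.
csvFile = [tempname '.csv'];
writetable(table([1; 2; 3; 4; 5], 'VariableNames', {'X'}), csvFile);
ds = tabularTextDatastore(csvFile);

% Hypothetical output location, created before the call as a precaution.
outFolder = tempname;
mkdir(outFolder);

% Name,Value pairs follow the positional arguments.
outds = mapreduce(ds, @sumMapper, @sumReducer, mapreducer(0), ...
    'OutputFolder', outFolder, 'Display', 'off');
result = readall(outds);

function sumMapper(data, ~, intermKVStore)
    % Emit the partial sum of column X for this block.
    add(intermKVStore, 'Total', sum(data.X));
end

function sumReducer(key, intermValIter, outKVStore)
    % Add up the partial sums from all blocks.
    total = 0;
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, key, total);
end
```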
When you debug mapreduce algorithms, it is often useful to examine how key-value pairs move through the different phases. To examine the movement of data, set breakpoints in your map and reduce functions. The breakpoints stop execution of mapreduce, allowing you to examine the current state of relevant variables, such as the KeyValueStore or ValueIterator. For more information, see Debug MapReduce Algorithms.
Some recommendations to optimize mapreduce performance on any platform are:

Minimize the number of calls to the map function. The easiest approach is to increase the value of the ReadSize property of the input datastore. The result is that mapreduce passes larger blocks of data to the map function, and the datastore depletes with fewer reads.

Decrease the amount of intermediate data sent between map and reduce functions. One approach is to use unique inside a map function to combine similar keys. See Compute Mean by Group Using MapReduce for an example of this technique.
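Both recommendations can be sketched together. The script below raises ReadSize on a tabular text datastore and uses unique inside the map function, so that each block contributes one [count, sum] pair per distinct key instead of one pair per row. The data, the ReadSize value of 50000, and the groupMapper and groupReducer functions are illustrative assumptions. This is not the code from the Compute Mean by Group Using MapReduce example.

```matlab
% Made-up data: delays for two carriers.
csvFile = [tempname '.csv'];
T = table({'AA'; 'UA'; 'AA'; 'UA'; 'AA'}, [10; 20; 30; 60; 50], ...
    'VariableNames', {'Carrier', 'Delay'});
writetable(T, csvFile);

ds = tabularTextDatastore(csvFile);
ds.ReadSize = 50000;   % larger blocks mean fewer calls to the map function

outds = mapreduce(ds, @groupMapper, @groupReducer, mapreducer(0), ...
    'Display', 'off');
result = readall(outds);

function groupMapper(data, ~, intermKVStore)
    % Collapse the block to one [count, sum] pair per distinct carrier.
    [keys, ~, idx] = unique(data.Carrier);
    counts = accumarray(idx, 1);
    sums = accumarray(idx, data.Delay);
    for k = 1:numel(keys)
        add(intermKVStore, keys{k}, [counts(k), sums(k)]);
    end
end

function groupReducer(key, intermValIter, outKVStore)
    % Merge the per-block [count, sum] pairs, then emit the group mean.
    total = [0, 0];
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, key, total(2) / total(1));
end
```

Passing [count, sum] pairs instead of raw values keeps the mean exact even when a group spans several blocks.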
datastore | gcmr | KeyValueStore | mapreducer | tall | ValueIterator