Large data sets can take the form of large files that do not fit into available memory or that take a long time to process. A large data set can also be a collection of numerous small files. There is no single approach to working with large data sets, so MATLAB® includes a number of tools for accessing and processing large data.
Begin by creating a datastore that can access small portions of the data at a
time. You can use the datastore to manage incremental import of the data. To
analyze the data using common MATLAB functions, such as mean and histogram,
create a tall array on top of the datastore.
For more complex problems, you can write a MapReduce algorithm that defines the
chunking and reduction of the data.
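
The three approaches described above can be sketched in one short script. This is a minimal illustration rather than a recommended workflow. It assumes the sample file airlinesmall.csv that ships with MATLAB and its ArrDelay variable. The helper names meanDelayMapper and meanDelayReducer are hypothetical and chosen only for this example. The call to mapreducer(0) is also an assumption: it keeps the tall array and MapReduce steps in the local session so that the local functions are visible. Local functions in a script require R2016b or later.

```matlab
% Run tall-array and MapReduce computations serially in the local session.
mapreducer(0);

% 1. Datastore: access the file in small portions.
ds = datastore('airlinesmall.csv');
ds.TreatAsMissing = 'NA';               % read 'NA' entries as NaN
ds.SelectedVariableNames = 'ArrDelay';  % import only the variable of interest

% Incremental import: accumulate a running sum and count block by block.
total = 0;
count = 0;
while hasdata(ds)
    t = read(ds);                       % one block of rows, returned as a table
    d = t.ArrDelay(~isnan(t.ArrDelay));
    total = total + sum(d);
    count = count + numel(d);
end
meanFromBlocks = total / count;
reset(ds);                              % rewind the datastore to the start

% 2. Tall array: use common functions on top of the datastore.
tt = tall(ds);
meanDelay = mean(tt.ArrDelay, 'omitnan');   % deferred until gather is called
meanFromTall = gather(meanDelay);
histogram(tt.ArrDelay)                      % histogram also accepts tall input

% 3. MapReduce: define the chunk-wise work and the reduction yourself.
outds = mapreduce(ds, @meanDelayMapper, @meanDelayReducer);
result = readall(outds);                    % table with Key and Value variables

% Mapper (hypothetical name): emit the count and sum for one chunk.
function meanDelayMapper(data, ~, intermKVStore)
    d = data.ArrDelay(~isnan(data.ArrDelay));
    add(intermKVStore, 'PartialCountSum', [numel(d), sum(d)]);
end

% Reducer (hypothetical name): combine the partial results into one mean.
function meanDelayReducer(~, intermValIter, outKVStore)
    countSum = [0 0];
    while hasnext(intermValIter)
        countSum = countSum + getnext(intermValIter);
    end
    add(outKVStore, 'MeanArrDelay', countSum(2) / countSum(1));
end
```

All three steps compute the same quantity, the mean arrival delay, so meanFromBlocks, meanFromTall, and the value stored in result should agree up to floating-point rounding. The tall array version needs the least code. The MapReduce version is the most verbose, but it gives explicit control over what is computed per chunk and how the partial results are combined.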