Working with Big Data

This example shows how Simulink models handle big data as input to and output from a simulation.

Open the Example Model

Open the example model.
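The command for this step is not shown above. A minimal sketch, assuming the top-level example model is named sldemo_mdlref_bus (an assumption; the text names only the referenced model sldemo_mdlref_counter_bus):

```matlab
% Assumed name of the top-level example model; adjust if yours differs.
topModel = 'sldemo_mdlref_bus';

% Open the model in the Simulink Editor.
open_system(topModel);
```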

Description of the Example

Big data refers to data that is too large to load into system memory all at once.

Simulink can produce big data as simulation output and consume big data as simulation input. To handle big data for both input and output, the entire data set is stored in a MAT-file on the hard disk, and only small chunks of it are loaded into system memory at any time during simulation. This approach is known as streaming. Simulink can stream data to and from a MAT-file. Streaming avoids running out of memory because the capacity of the hard disk is typically much greater than the capacity of the random access memory.

This example shows how to handle big data in Simulink simulations. The logging to file capability is used to stream big data as the output of a simulation. Streaming from file then supplies big data as input to a simulation.

Set up Logging to File

To stream output data to a MAT-file, enable logging to file.

Enable logging to file by selecting the Configuration Parameters > Data Import/Export > Log Dataset data to file check box in the Configuration Parameters dialog box. You can also specify the name of the file that will contain the result.

To enable logging to file programmatically, set the model parameter LoggingToFile to on.
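As a rough sketch of the programmatic route, the parameters can be set with set_param. The model name below is an assumption; LoggingToFile and LoggingFileName are the parameters named in this example.

```matlab
% Assumed name of the top-level example model.
topModel = 'sldemo_mdlref_bus';

% Load the model into memory without opening the editor window.
load_system(topModel);

% Enable logging to file and, optionally, choose the MAT-file name.
set_param(topModel, 'LoggingToFile', 'on');
set_param(topModel, 'LoggingFileName', 'top.mat');

% Confirm the setting.
get_param(topModel, 'LoggingToFile')
```

Setting parameters with set_param modifies the loaded model. Passing the same parameters to sim instead applies them only for that simulation.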

When logging to file is enabled on a model, simulation of that model streams logged signals directly into the MAT-file. Additionally, if logging of States or Output is enabled and SaveFormat is specified as Dataset, those values are streamed into the same MAT-file.

Simulate the Model

This example changes the directory to a temporary directory, which has write permissions. Then the example calls the sim command to simulate the model, logging to file.

Set the parameter SignalLoggingName, which specifies the name of the Dataset object to hold the result of signal logging, to topOut. Set the parameter LoggingFileName, which specifies the name of the resulting MAT-file, to top.mat. The StopTime parameter is set to 5000 seconds. For a more realistic big data example, the stop time would be a much larger value, which would result in many more data samples to log.
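The steps above can be sketched as follows. The model name and the variable names currentDir and simOut are assumptions; the parameter names and values are the ones given in the text.

```matlab
% Assumed name of the top-level example model.
topModel = 'sldemo_mdlref_bus';

% Remember the current folder, then switch to a writable temporary folder.
currentDir = pwd;
cd(tempdir);

% Simulate with logging to file enabled for this run only.
simOut = sim(topModel, ...
    'LoggingToFile',     'on', ...
    'LoggingFileName',   'top.mat', ...
    'SignalLoggingName', 'topOut', ...
    'StopTime',          '5000');
```

The later steps refer to top.mat by its relative name, so the sketch stays in the temporary folder. Call cd(currentDir) once you are done with the file.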

Create a DatasetRef Object to Reference the Logged Dataset Within the MAT-file

Use a DatasetRef object to reference the resulting Dataset in the logged MAT-file. The benefit of using DatasetRef is that the referenced MAT-file is not loaded into memory. DatasetRef is a very light wrapper object for referencing a Dataset that is stored in a file. The alternative of calling the load function on this file loads the entire file into memory, which might not be possible if this Dataset contains big data.
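A minimal sketch of creating the reference, assuming top.mat is in the current folder and holds a Dataset named topOut as configured above. The variable name DSRef is a choice made for this sketch.

```matlab
% Reference the Dataset variable topOut inside top.mat.
% Constructing the reference does not load the logged data into memory.
DSRef = Simulink.SimulationData.DatasetRef('top.mat', 'topOut');
```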

Obtain a Reference to a Logged Signal

You can use { } indexing on a DatasetRef object to reference individual signals within a Dataset, without loading these signals into memory. For example, to reference the second signal:
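A short sketch of this indexing, assuming top.mat and the Dataset name topOut from the logging step. DSRef is re-created here so that the snippet runs on its own.

```matlab
% Reference the logged Dataset without loading it.
DSRef = Simulink.SimulationData.DatasetRef('top.mat', 'topOut');

% Brace indexing returns the second logged element; its data stays on disk.
sig2 = DSRef{2};
```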

The Values field of sig2 is a SimulationDatastore object, which is a light-weight reference to the data of signal 2, stored on disk:

ans = 

  SimulationDatastore with properties:

      ReadSize: 100
    NumSamples: 50001
      FileName: '/tmp/BR2020bd_1444674_32127/publish_examples4/top.mat'

    Data Preview:

     Time       Data 
    _______    ______

    0 sec      1    5
    0.1 sec    1    5
    0.2 sec    2    6
    0.3 sec    2    6
    0.4 sec    3    7
    :          :

Obtain More References to Other Logged Signals

This example uses some of these logged signals as inputs to the simulation of the referenced model. Create a light-weight reference to each of them. These are bus signals in the model, so the resulting Values fields are structures of SimulationDatastore objects. Each structure reflects the hierarchy of the original bus signal.

Create a New Dataset Object to Use as Simulation Input

Specify the input signals to a simulation through a Dataset object. Each element in this Dataset provides input data to the inport block corresponding to the same index. Create an empty Dataset ds and then place the references to the logged signals into it as elements number one and two.

Use { } indexing on the Dataset object to assign elements into appropriate positions.
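The two steps above, referencing the logged bus signals and placing them in a new Dataset, might look like the sketch below. The positions 5 and 6 are hypothetical; the text does not say where the bus signals sit in the logged Dataset, so substitute the indices that match your log.

```matlab
% Reference the logged Dataset without loading it.
DSRef = Simulink.SimulationData.DatasetRef('top.mat', 'topOut');

% Hypothetical positions of the two logged bus signals.
idxFirstInput  = 5;
idxSecondInput = 6;
ref1 = DSRef{idxFirstInput};
ref2 = DSRef{idxSecondInput};

% Build the input Dataset; element k feeds root inport block k.
ds = Simulink.SimulationData.Dataset;
ds{1} = ref1;
ds{2} = ref2;
```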

Within each element of the Dataset, you can mix references to signal data (e.g., SimulationDatastore objects) with in-memory data (e.g., timeseries objects). For example, to change one of the upper saturation limits from 30 to 37:
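One way to make this change is sketched below. The field path limits.upper_saturation_limit, the int32 data type, and the sample times are all assumptions, as is the choice of the first Dataset element; check them against the bus definition in your model.

```matlab
% This sketch expects the input Dataset ds from the previous step.
assert(exist('ds', 'var') == 1, 'Create the input Dataset ds first.');

% Take the first element out, replace one leaf of its bus structure
% with in-memory data, and put the element back.
elem = ds{1};
elem.Values.limits.upper_saturation_limit = ...
    timeseries(int32([30; 37]), [0; 5]);   % assumed values and times
ds{1} = elem;
```

The remaining leaves of the structure stay as SimulationDatastore references, so only the replaced leaf is held in memory.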

Stream Input Data into Simulation

Now simulate the referenced model sldemo_mdlref_counter_bus, and use the Dataset ds as input. The data that is referenced by SimulationDatastore objects is streamed into the simulation without overwhelming the system.
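A hedged sketch of this simulation call, assuming ds exists in the base workspace. The stop time of 5000 and the variable names refModel and simOut2 are assumptions made here.

```matlab
% This sketch expects the input Dataset ds from the previous steps.
assert(exist('ds', 'var') == 1, 'Create the input Dataset ds first.');

refModel = 'sldemo_mdlref_counter_bus';

% Load root-level input from the variable ds. ExternalInput takes the
% variable name as text, not the object itself.
simOut2 = sim(refModel, ...
    'LoadExternalInput', 'on', ...
    'ExternalInput',     'ds', ...
    'StopTime',          '5000');   % assumed stop time
```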

The data for upper saturation limit is not streamed because that signal is specified as an in-memory timeseries. The change in saturation limit is reflected at around time 6 in the scope (the signal now saturates to a value of 37 instead of 30).

Summary

This example has demonstrated a round-trip workflow of big data from and to simulation. Logging to persistent storage was used to stream data from the first simulation into a MAT-file. A second simulation was then set up to stream the data from that file as input. A more realistic example would have a larger value for the model StopTime parameter, resulting in a larger logged MAT-file. The second simulation could also be configured for a longer StopTime. However, even with the larger data files for output and input, the memory requirements for the longer simulations remain the same.

MATLAB Workflow

SimulationDatastore allows you to analyze the logged data incrementally in MATLAB. Going back to the reference to the second logged signal, assign the datastore to a new variable to simplify access to it.

Access the Data in Chunks

SimulationDatastore allows incremental reading of the referenced data. The reading is done in chunks and is controlled by the ReadSize property. The default value for ReadSize is 100 samples (each sample for a signal is the data logged for a single time step of simulation). Change it to 1000 for this example. Each read of the datastore returns a timetable representation of the data.
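A minimal sketch of a chunked read, assuming top.mat and topOut from the logging step. The variable names dst and tt are choices made for this sketch.

```matlab
% Re-create the reference to the second logged signal.
DSRef = Simulink.SimulationData.DatasetRef('top.mat', 'topOut');
sig2  = DSRef{2};

% Assign the datastore to its own variable for easier access.
dst = sig2.Values;

% Read 1000 samples per call instead of the default 100.
dst.ReadSize = 1000;

% Each read returns the next chunk as a timetable.
tt = dst.read();
height(tt)   % number of samples in this chunk
```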

Reset the SimulationDatastore Read Counter

Each read on the datastore advances the read counter. You can reset this counter and start reading from the beginning:
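For instance, assuming the same top.mat file and the datastore variable dst used in this sketch:

```matlab
% Re-create the datastore for the second logged signal.
DSRef = Simulink.SimulationData.DatasetRef('top.mat', 'topOut');
dst   = DSRef{2}.Values;

% Advance the read counter by one chunk, then rewind to the start.
firstChunk = dst.read();
dst.reset();
```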

Iterate Through All Data in the Datastore

Use SimulationDatastore for incremental access to the logged simulation data for big data analysis in MATLAB. You can iterate over the entire data record in chunks:
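A sketch of such a loop, assuming the same top.mat file. Counting samples stands in for whatever per-chunk analysis you need, and numSamplesRead is a name introduced here.

```matlab
% Re-create the datastore for the second logged signal.
DSRef = Simulink.SimulationData.DatasetRef('top.mat', 'topOut');
dst   = DSRef{2}.Values;
dst.ReadSize = 1000;

% Start from the beginning and process one chunk at a time.
dst.reset();
numSamplesRead = 0;
while dst.hasdata()
    chunk = dst.read();                     % timetable of up to ReadSize rows
    numSamplesRead = numSamplesRead + height(chunk);
end
numSamplesRead
```

Only one chunk is held in memory at a time, so the loop's memory use depends on ReadSize and not on the length of the logged record.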

Exit

Close the model.

Related Documentation

For more information, see the documentation for the SimulationDatastore class.