Dataset Arrays in the Variables Editor

Note

The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Open Dataset Arrays in the Variables Editor

The MATLAB Variables editor provides a convenient interface for viewing, modifying, and plotting dataset arrays.

First, load the sample data set, hospital.

load hospital

The dataset array, hospital, is created in the MATLAB workspace.

The dataset array has 100 observations and 7 variables.

To open hospital in the Variables editor, click Open Variable, and select hospital.

The Variables editor opens, displaying the contents of the dataset array (only the first 10 observations are shown here).

In the Variables editor, you can see the names of the seven variables along the top row, and the observation names down the first column.
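The same information can be inspected from the command line. The following is a minimal sketch, assuming Statistics and Machine Learning Toolbox is installed (it supplies both the hospital sample data and the dataset type) and that MATLAB is running with the desktop, because openvar needs the Variables editor. The helper names nObs, nVars, varNames, and obsNames are illustrative only.

```matlab
load hospital                              % creates the dataset array hospital

[nObs, nVars] = size(hospital);            % this section reports 100 observations, 7 variables
varNames = hospital.Properties.VarNames;   % names shown along the top row of the editor
obsNames = hospital.Properties.ObsNames;   % names shown down the first column

disp(varNames)
disp(obsNames(1:10))                       % first 10 observation names

openvar('hospital')                        % open hospital in the Variables editor
```

Calling openvar is an alternative to clicking Open Variable; the editor contents are the same either way.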

Modify Variable and Observation Names

You can modify variable and observation names by double-clicking a name, and then typing new text.

All changes made in the Variables editor are also echoed as commands in the Command Window.

The sixth variable in the data set, BloodPressure, is a numeric array with two columns. The first column shows systolic blood pressure, and the second column shows diastolic blood pressure. Click the arrow that appears on the right side of the variable name cell to see the units and description of the variable. You can type directly in the units and description fields to modify the text. The variable data type and size are shown under the variable description.
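Names, units, and descriptions are stored in the Properties of the dataset array, so edits of this kind can also be made programmatically. The sketch below is an approximation of such commands, not a transcript of what the editor echoes. It assumes the second variable holds the patient's gender; the new names 'Gender' and 'Patient-001' are illustrative.

```matlab
load hospital

% Rename the second variable and the first observation by position.
% Both new names are placeholders chosen for this example.
hospital.Properties.VarNames{2} = 'Gender';
hospital.Properties.ObsNames{1} = 'Patient-001';

% Show the units and descriptions stored for all variables.
disp(hospital.Properties.Units)
disp(hospital.Properties.VarDescription)

% Data type and size of the sixth variable, BloodPressure.
bpClass = class(hospital.BloodPressure);
bpSize  = size(hospital.BloodPressure);
fprintf('BloodPressure: %s, %d-by-%d\n', bpClass, bpSize(1), bpSize(2));
```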

Reorder or Delete Variables

You can reorder variables in a dataset array using the Variables editor. Hover over the left side of a variable name cell until a four-headed arrow appears.

After the arrow appears, click and drag the variable column to a new location.

The command for the variable reordering appears in the Command Window.

You can delete a variable in the Variables editor by selecting the variable column, right-clicking, and selecting Delete Column Variable(s).

The command for the variable deletion appears in the Command Window.
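Reordering and deleting can also be written as indexing operations. This is a minimal sketch that works by column position, so it does not depend on particular variable names; which columns to move or remove is an arbitrary choice made for illustration.

```matlab
load hospital

% Reorder: place the third variable in front of the second.
hospital = hospital(:, [1 3 2 4:7]);

% Delete: remove the seventh (last) variable by assigning [] to its column.
hospital(:, 7) = [];

disp(size(hospital))    % one variable fewer than the original seven
```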

Add New Data

You can enter new data values directly into the Variables editor. For example, you can add a new patient observation to the hospital data set. To enter a new last name, add a character vector to the end of the variable LastName.

The variable Gender is a nominal array. The levels of the categorical variable appear in a drop-down list when you double-click a cell in the Gender column. You can choose one of the levels previously used, or create a new level by selecting New Item.

You can continue to add data for the remaining variables.

To change the observation name, click the observation name and type the new name.

The commands for entering the new data appear in the Command Window.

Notice the warning that appears after the first assignment. When you enter the first piece of data in the new observation row—here, the last name—default values are assigned to all other variables. Default assignments are:

  • 0 for numeric variables

  • <undefined> for categorical variables

  • [] for cell arrays

You can also copy and paste data from one dataset array to another using the Variables editor.
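Adding an observation from the command line follows the same pattern: the first assignment past the last row extends every variable with default values, and later assignments overwrite those defaults. The sketch below makes several assumptions: the second variable holds gender, 'Female' is one of its existing levels, and the values 'NEWMAN', 52, and 'NEW-101' are invented for this example.

```matlab
load hospital
hospital.Properties.VarNames{2} = 'Gender';   % assumption: column 2 is the gender variable

n = size(hospital, 1) + 1;                    % index of the new observation row

% First assignment: expect a warning, and defaults in all other variables.
hospital.LastName{n} = 'NEWMAN';
defaultAge = hospital.Age(n);                 % numeric default described in the list above

% Overwrite the defaults for the remaining variables as needed.
hospital.Gender(n) = 'Female';                % assumed to be an existing level
hospital.Age(n)    = 52;

% Replace the automatically generated observation name.
hospital.Properties.ObsNames{n} = 'NEW-101';
```

Any variable that is not assigned explicitly keeps its default value in the new row.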

Sort Observations

You can use the Variables editor to sort dataset array observations by the values of one or more variables. To sort by gender, for example, select the variable Gender. Then click Sort, and choose to sort rows by ascending or descending values of the selected variable.

When you sort by a variable that is a cell array of character vectors or a nominal array, observations are sorted alphabetically. For ordinal variables, rows are sorted by the order of the levels. For example, when the observations of hospital are sorted by the values in Gender, the females are grouped together, followed by the males.

To sort by the values of multiple variables, press Ctrl while you select multiple variables.

Sorting rows in the Variables editor is equivalent to calling sortrows. The corresponding command appears in the Command Window after the sort runs.
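A direct call to sortrows might look like the following sketch. It assumes the second variable holds gender and renames it to Gender by position so that the name matches this section; the output names byAge, byGender, and byBoth are illustrative.

```matlab
load hospital
hospital.Properties.VarNames{2} = 'Gender';   % assumption: column 2 is the gender variable

% Sort by a single variable, ascending (the default).
byAge    = sortrows(hospital, 'Age');
byGender = sortrows(hospital, 'Gender');

% Sort by several variables at once, descending.
byBoth = sortrows(hospital, {'Gender', 'Age'}, 'descend');
```

Each call returns a new dataset array and leaves hospital itself unchanged unless you assign the result back to it.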

Select a Subset of Data

You can select a subset of data from a dataset array in the Variables editor, and create a new dataset array from the selection. For example, to create a dataset array containing only the variables LastName and Age:

  1. Hold Ctrl while you click the variables LastName and Age.

  2. Right-click, and select New Workspace Variable from Selection > New Dataset Array.

The new dataset array appears in the Workspace window with the name hospital1. The Command Window shows the commands that execute the selection.
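Selecting variables corresponds to indexing the columns of the dataset array by name. A minimal sketch, reusing the name hospital1 from the text:

```matlab
load hospital

% Keep all observations, but only the variables LastName and Age.
hospital1 = hospital(:, {'LastName', 'Age'});

disp(hospital1.Properties.VarNames)
```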

You can use the same steps to select any subset of data. To select observations according to some logical condition, you can use a combination of sorting and selecting. For example, to create a new dataset array containing only males aged 45 and older:

  1. Sort the observations of hospital by the values in Gender and Age, descending.

  2. Select the male observations with age 45 and older.

  3. Right-click, and select New Workspace Variable from Selection > New Dataset Array. The new dataset array, hospital2, is created in the Workspace window.

  4. You can rename the dataset array in the Workspace window.
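From the command line, a logical condition can be applied directly, without sorting first. This sketch uses logical indexing as an alternative to the sort-and-select steps above; it is not the command the editor generates. It assumes the second variable holds gender and that 'Male' is one of its levels; the name isMale45 is illustrative.

```matlab
load hospital
hospital.Properties.VarNames{2} = 'Gender';   % assumption: column 2 is the gender variable

% Logical index: true for male patients aged 45 and older.
isMale45 = hospital.Gender == 'Male' & hospital.Age >= 45;

% Keep the matching observations and all variables.
hospital2 = hospital(isMale45, :);
```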

Create Plots

You can plot data from a dataset array using plotting options in the Variables editor. Available plot choices depend on the data types of variables to be plotted.

For example, if you select the variable Age, you can see in the Plots tab some plotting options that are appropriate for a univariate, numeric variable.

Sometimes, there are plot options for multiple variables, depending on their data types. For example, if you select both Age and Gender, you can draw box plots of age, grouped by gender.
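Comparable plots can be produced from the command line. The sketch below assumes the second variable holds gender and picks two plot types as examples: a histogram for the single numeric variable, and box plots of age grouped by gender. The Plots tab may offer other choices.

```matlab
load hospital
hospital.Properties.VarNames{2} = 'Gender';   % assumption: column 2 is the gender variable

% Univariate numeric variable: histogram of age.
figure
histogram(hospital.Age)
xlabel('Age')
ylabel('Number of patients')

% Numeric variable grouped by a categorical variable: box plots of age by gender.
figure
boxplot(hospital.Age, hospital.Gender)
ylabel('Age')
```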
