tabularTextDatastore

Datastore for tabular text files

Description

Use a TabularTextDatastore object to manage large collections of text files containing column-oriented or tabular data where the collection does not necessarily fit in memory. Tabular data is data that is arranged in a rectangular fashion with each row having the same number of entries. You can create a TabularTextDatastore object using the tabularTextDatastore function, specify its properties, and then import and process the data using object functions.

Creation

Syntax

ttds = tabularTextDatastore(location)

ttds = tabularTextDatastore(location,Name,Value)

Description

ttds = tabularTextDatastore(location) creates a datastore from the collection of data specified by location.


ttds = tabularTextDatastore(location,Name,Value) specifies additional parameters and properties for ttds using one or more name-value pair arguments. For example, tabularTextDatastore(location,'FileExtensions',{'.txt','.csv'}) creates a datastore from only the files in location with extensions .txt and .csv.

Input Arguments


`location` — Files or folders included in datastore
path | `DsFileSet` object

Files or folders included in the datastore, specified as a path or a DsFileSet object.

path — Specify the path as a character vector, cell array of character vectors, string scalar, or a string array, containing the location of files or folders that are local or remote.
- Local files or folders — Specify location as a local path to files or folders. If the files are not in the current folder, then local path must specify full or relative paths. Files within subfolders of the specified folder are not automatically included in the datastore. You can use the wildcard character (*) when specifying the local path. This character specifies that the datastore include all matching files or all files in the matching folders.
- Remote files or folders — Specify location to be the full paths of the files or folders as a uniform resource locator (URL) of the form hdfs:///path_to_file. For more information, see Work with Remote Data.
DsFileSet object — You also can specify location as a DsFileSet object. For more information, see matlab.io.datastore.DsFileSet.

When location represents a folder, the datastore includes only supported file formats and ignores any other format. To specify a custom list of file extensions to include in your datastore, see the FileExtensions property.

The tabularTextDatastore function supports these extensions: .txt, .csv, .dat, .dlm, .asc, .text, or no extension.

Example: 'file1.csv'

Example: '../dir/data/file1'

Example: {'C:\dir\data\file1.csv','C:\dir\data\file2.dat'}

Example: 'C:\dir\data\*.text'

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example:

ttds = tabularTextDatastore('C:\dir\textdata','FileExtensions',{'.csv','.txt'})

`'FileExtensions'` — Text file extensions
character vector | cell array of character vectors | string scalar | string array

Text file extensions, specified as the comma-separated pair consisting of 'FileExtensions' and a character vector, cell array of character vectors, string scalar, or string array. The specified extensions do not require a supported format. If you want to include unsupported extensions, then specify all extensions. Use empty quotes '' to represent files without extensions.

Example: 'FileExtensions','.txt'

Example: 'FileExtensions',{'.text','.csv'}

Data Types: char | cell | string
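As a rough sketch of how the extension filter behaves, the following snippet writes two small files to a scratch folder and builds a datastore from only the .csv file. The folder, file names, and table contents are hypothetical and serve only as an illustration.

```matlab
% Scratch folder holding one .csv file and one .txt file (hypothetical names).
folder = tempname;
mkdir(folder);
T = table([1;2],[3;4],'VariableNames',{'A','B'});
writetable(T,fullfile(folder,'keep.csv'));
writetable(T,fullfile(folder,'skip.txt'));

% Include only the files whose extension is .csv.
ttds = tabularTextDatastore(folder,'FileExtensions','.csv');
numFiles = numel(ttds.Files);   % number of files the datastore resolved
```

Without the 'FileExtensions' argument, both files would qualify, because .csv and .txt are both supported extensions.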

`'IncludeSubfolders'` — Subfolder inclusion flag
`true` or `false` | 0 or 1

Subfolder inclusion flag, specified as the comma-separated pair consisting of 'IncludeSubfolders' and true, false, 0, or 1. Specify true to include all files and subfolders within each folder or false to include only the files within each folder.

If you do not specify 'IncludeSubfolders', then the default value is false.

Example: 'IncludeSubfolders',true

Data Types: logical | double
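The effect of the flag can be sketched by comparing two datastores built on the same folder tree. The folder layout and file names below are assumptions made for this illustration.

```matlab
% Scratch folder with one file at the top level and one in a subfolder.
root = tempname;
mkdir(root);
mkdir(fullfile(root,'sub'));
T = table((1:3)',(4:6)','VariableNames',{'X','Y'});
writetable(T,fullfile(root,'top.csv'));
writetable(T,fullfile(root,'sub','nested.csv'));

% Default behavior: files in subfolders are not included.
flat = tabularTextDatastore(root);
% With the flag set, files in subfolders are included as well.
deep = tabularTextDatastore(root,'IncludeSubfolders',true);

nFlat = numel(flat.Files);
nDeep = numel(deep.Files);
```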

`'OutputType'` — Output datatype
`'auto'` (default) | `'table'` | `'timetable'`

Output datatype, specified as the comma-separated pair consisting of 'OutputType' and one of these values:

'auto' — Detects if the output from the datastore should be a table or a timetable based on whether you specify the 'RowTimes' name-value pair. If you specify 'RowTimes' then the output is a timetable; otherwise, the output is a table.
'table' — Return a table.
'timetable' — Return a timetable.

The value of OutputType determines the data type returned by the preview, read, and readall functions. Use this option in conjunction with the 'RowTimes' name-value pair to return timetables from TabularTextDatastore.

Example: 'OutputType','timetable'

Data Types: char | string

`'AlternateFileSystemRoots'` — Alternate file system root paths
string vector | cell array

Alternate file system root paths, specified as the comma-separated pair consisting of 'AlternateFileSystemRoots' and a string vector or a cell array. Use 'AlternateFileSystemRoots' when you create a datastore on a local machine, but need to access and process the data on another machine (possibly of a different operating system). Also, when processing data using the Parallel Computing Toolbox™ and the MATLAB® Parallel Server™, and the data is stored on your local machines with a copy of the data available on different platform cloud or cluster machines, you must use 'AlternateFileSystemRoots' to associate the root paths.

To associate a set of root paths that are equivalent to one another, specify 'AlternateFileSystemRoots' as a string vector. For example,
```
["Z:\datasets","/mynetwork/datasets"]
```
To associate multiple sets of root paths that are equivalent for the datastore, specify 'AlternateFileSystemRoots' as a cell array containing multiple rows where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string vector or a cell array of character vectors. For example:
- Specify 'AlternateFileSystemRoots' as a cell array of string vectors.
```
{["Z:\datasets", "/mynetwork/datasets"];...
 ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}
```
- Alternatively, specify 'AlternateFileSystemRoots' as a cell array of cell array of character vectors.
```
{{'Z:\datasets','/mynetwork/datasets'};...
 {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}
```

The value of 'AlternateFileSystemRoots' must satisfy these conditions:

Contains one or more rows, where each row specifies a set of equivalent root paths.
Each row specifies multiple root paths and each root path must contain at least two characters.
Root paths are unique and are not subfolders of one another.
Contains at least one root path entry that points to the location of the files.

For more information, see Set Up Datastore for Processing on Different Machines or Clusters.

Example: ["Z:\datasets","/mynetwork/datasets"]

Data Types: string | cell

`'TextType'` — Output data type of text variables
`'char'` (default) | `'string'`

Output data type of text variables, specified as the comma-separated pair consisting of 'TextType' and either 'char' or 'string'. If the output table from the read, readall, or preview functions contains text variables, then 'TextType' specifies the data type of those variables for TabularTextDatastore. If 'TextType' is 'char', then the output is a cell array of character vectors. If 'TextType' is 'string', then the output has type string.

Data Types: char | string
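A minimal sketch of the difference between the two settings, using a small hypothetical file written to a temporary location:

```matlab
% Hypothetical two-column file with one text variable.
file = [tempname '.csv'];
T = table({'alpha';'beta'},[1;2],'VariableNames',{'Name','Value'});
writetable(T,file);

dsChar = tabularTextDatastore(file);                     % 'TextType' defaults to 'char'
dsStr  = tabularTextDatastore(file,'TextType','string');

tChar = readall(dsChar);
tStr  = readall(dsStr);

isCellOfChar = iscellstr(tChar.Name);   % text stored as a cell array of character vectors
isStringArr  = isstring(tStr.Name);     % text stored as a string array
```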

`'DatetimeType'` — Type for imported date and time data
`'datetime'` (default) | `'text'`

Type for imported date and time data, specified as the comma-separated pair consisting of 'DatetimeType' and one of these values: 'datetime' or 'text'.

Value	Type for Imported Date and Time Data
`'datetime'`	MATLAB `datetime` data type. For more information, see `datetime`.
`'text'`	If `'DatetimeType'` is specified as `'text'`, then the type for imported date and time data depends on the value specified in the `'TextType'` property. If `'TextType'` is `'char'`, then `tabularTextDatastore` imports dates as a cell array of character vectors. If `'TextType'` is `'string'`, then `tabularTextDatastore` imports dates as an array of strings.

If the specified TextscanFormats property contains a %D, then tabularTextDatastore ignores the value specified in DatetimeType.

Example: 'DatetimeType','datetime'

Data Types: char | string
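The following sketch contrasts the two settings on a small hypothetical file. It assumes that the timestamp layout used here (the same layout as in the outages.csv sample data) is detected as date and time data by default.

```matlab
% Hypothetical file with one timestamp column and one numeric column.
file = [tempname '.csv'];
fid = fopen(file,'w');
fprintf(fid,'When,Reading\n2020-01-15 08:30,1.5\n2020-01-16 09:45,2.5\n');
fclose(fid);

dsDate = tabularTextDatastore(file);                        % default 'DatetimeType' is 'datetime'
dsText = tabularTextDatastore(file,'DatetimeType','text');  % keep the timestamps as text

tDate = readall(dsDate);
tText = readall(dsText);

class(tDate.When)   % class of the imported timestamps with the default setting
class(tText.When)   % class of the imported timestamps when imported as text
```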

`'DurationType'` — Output data type of duration data
`'duration'` (default) | `'text'`

Output data type of duration data, specified as the comma-separated pair consisting of 'DurationType' and either 'duration' or 'text'.

Value	Type for Imported Duration Data
`'duration'`	MATLAB `duration` data type. For more information, see `duration`.
`'text'`	If `'DurationType'` is specified as `'text'`, then the type for imported duration data depends on the value specified in the `'TextType'` parameter. If `'TextType'` is `'char'`, then the importing function returns duration data as a cell array of character vectors. If `'TextType'` is `'string'`, then the importing function returns duration data as an array of strings.

Data Types: char | string

`'DatetimeLocale'` — Locale to interpret dates
`'en_US'` (default) | character vector | string scalar

Locale to interpret dates, specified as a character vector or string scalar. The DatetimeLocale value determines how the importing function interprets text that represents dates and times.

When specifying the DatetimeLocale, use the form xx_YY, where xx is a lowercase ISO 639-1 two-letter code that specifies a language, and YY is an uppercase ISO 3166-1 alpha-2 code that specifies a country.

This table lists some common values for the locale.

Locale	Language	Country
`'de_DE'`	German	Germany
`'en_GB'`	English	United Kingdom
`'en_US'`	English	United States
`'es_ES'`	Spanish	Spain
`'fr_FR'`	French	France
`'it_IT'`	Italian	Italy
`'ja_JP'`	Japanese	Japan
`'ko_KR'`	Korean	Korea
`'nl_NL'`	Dutch	Netherlands
`'zh_CN'`	Chinese (simplified)	China

Note

The DatetimeLocale value determines how input values are interpreted. The display format and language are specified by the Locale option in the Datetime format section of the Preferences panel. To change the default datetime locale, see Set Command Window Preferences.

Data Types: char | string

In addition to these name-value pairs, you also can specify the properties on this page as name-value pairs, with the exception of the Files property.

Properties


TabularTextDatastore properties describe the files associated with a TabularTextDatastore object. Specifically, the properties describe the format of the data in the files and control how the data should be read from the datastore. When you create a TabularTextDatastore object, the datastore function uses the first file in the Files property to determine the values of the properties. With the exception of the Files property, you can specify the value of TabularTextDatastore properties using name-value pair arguments when you create the datastore object. To view or modify a property after creating the object, use the dot notation:

ds = tabularTextDatastore('airlinesmall.csv');
ds.TreatAsMissing = 'NA';
ds.MissingValue = 0;

File Properties

`Files` — Files included in datastore
cell array of character vectors | string array

Files included in the datastore, resolved as a cell array of character vectors or a string array, where each character vector or string is a full path to a file. The location argument in the tabularTextDatastore and datastore functions defines these files.

The first file specified by the Files property determines the variable names and format information for all files in the datastore.

When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore properties.

Example: {'C:\dir\data\mydata1.csv';'C:\dir\data\mydata2.csv'}

Data Types: cell | string

`FileEncoding` — File encoding
`'UTF-8'` (default) | `'US-ASCII'` | `'Macintosh'` | ...

File encoding, specified as a character vector or a string scalar like one of these values.

`'IBM866'`	`'ISO-8859-1'`	`'windows-874'`
`'KOI8-R'`	`'ISO-8859-2'`	`'windows-1250'`
`'KOI8-U'`	`'ISO-8859-3'`	`'windows-1251'`
`'Macintosh'`	`'ISO-8859-4'`	`'windows-1252'`
`'US-ASCII'`	`'ISO-8859-5'`	`'windows-1253'`
`'UTF-8'`	`'ISO-8859-6'`	`'windows-1254'`
	`'ISO-8859-7'`	`'windows-1255'`
	`'ISO-8859-8'`	`'windows-1256'`
	`'ISO-8859-9'`	`'windows-1257'`
	`'ISO-8859-11'`	`'windows-1258'`
	`'ISO-8859-13'`
	`'ISO-8859-15'`

If each file in the datastore fits into memory, then FileEncoding also can be one of these values.

`'Big5'`	`'EUC-KR'`	`'GB18030'`	`'Shift_JIS'`
`'Big5-HKSCS'`	`'EUC-JP'`	`'GB2312'`	`'windows-949'`
`'CP949'`	`'EUC-TW'`	`'GBK'`

When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore properties.

Data Types: char | string

`ReadVariableNames` — Read variable names
`true` | `false`

Read variable names, specified as a logical true or false.

If unspecified, the tabularTextDatastore function detects the presence of variable names automatically.
If true, then the first nonheader row of the first file determines the variable names for the data.
If false, then the first nonheader row of the first file contains the first row of data. The data is assigned default variable names, Var1, Var2, and so on.

When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore properties.

Data Types: logical
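As an illustration of the false case, the sketch below reads a hypothetical file that has no header row, so the first line is treated as data and default variable names are assigned.

```matlab
% Hypothetical file that contains only numeric data, with no header row.
file = [tempname '.csv'];
fid = fopen(file,'w');
fprintf(fid,'1,2\n3,4\n');
fclose(fid);

% Treat the first line as data rather than as variable names.
ds = tabularTextDatastore(file,'ReadVariableNames',false);
names = ds.VariableNames;   % default names assigned by the datastore
T = readall(ds);            % both lines of the file are returned as data
```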

`VariableNamingRule` — Flag to preserve variable names
`'modify'` (default) | `'preserve'`

Flag to preserve variable names, specified as the comma-separated pair consisting of VariableNamingRule and either 'modify' or 'preserve'.

'preserve' — Preserve variable names that are not valid MATLAB identifiers such as variable names that include spaces and non-ASCII characters.
'modify' — Convert invalid variable names (as determined by the isvarname function) to valid MATLAB identifiers.

Starting in R2019b, variable names and row names can include any characters, including spaces and non-ASCII characters. Also, they can start with any characters, not just letters. Variable and row names do not have to be valid MATLAB identifiers (as determined by the isvarname function). To preserve these variable names and row names, set the value of VariableNamingRule to 'preserve'.

Data Types: char | string
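A small sketch of the two rules, assuming a release in which VariableNamingRule is available and using a hypothetical file whose first column header contains a space:

```matlab
% Hypothetical file with a header that is not a valid MATLAB identifier.
file = [tempname '.csv'];
fid = fopen(file,'w');
fprintf(fid,'Flow Rate,Temp\n1.5,20\n2.5,21\n');
fclose(fid);

% Keep the header text exactly as it appears in the file.
dsKeep = tabularTextDatastore(file,'VariableNamingRule','preserve');
% Default rule ('modify'): invalid names are converted to valid identifiers.
dsFix = tabularTextDatastore(file);

keptName  = dsKeep.VariableNames{1};
fixedName = dsFix.VariableNames{1};
```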

`VariableNames` — Names of variables
cell array of character vectors | string array

Names of variables in the datastore, specified as a cell array of character vectors or a string array. Specify the variable names in the order in which they appear in the files. If you do not specify the variable names, they are detected from the first nonheader line in the first file of the datastore. When modifying the VariableNames property, the number of new variable names must match the number of original variable names.

To support invalid MATLAB identifiers as variable names, such as variable names containing spaces and non-ASCII characters, set the value of the VariableNamingRule parameter to 'preserve'.

If ReadVariableNames is false, then VariableNames defaults to {'Var1','Var2', ...}.

Example: {'Time','Name','Quantity'}

Data Types: cell | string

Text Format Properties

`NumHeaderLines` — Number of lines to skip
nonnegative integer

Number of lines to skip at the beginning of the file, specified as a nonnegative integer. If unspecified, the tabularTextDatastore function detects the number of lines to skip automatically.

The tabularTextDatastore function ignores the specified number of header lines before reading the variable names or data.

When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore properties.

Data Types: double
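The sketch below shows one way this might look for a file that starts with two free-form lines before the variable names. The file contents are hypothetical, and the delimiter is specified explicitly so that the example does not rely on automatic detection.

```matlab
% Hypothetical file: two descriptive lines, then variable names, then data.
file = [tempname '.csv'];
fid = fopen(file,'w');
fprintf(fid,'Exported by a data logger\nUnits: seconds and volts\nT,V\n0,1.1\n1,1.2\n');
fclose(fid);

% Skip the two descriptive lines before reading variable names and data.
ds = tabularTextDatastore(file,'NumHeaderLines',2, ...
    'ReadVariableNames',true,'Delimiter',',');
names = ds.VariableNames;
T = readall(ds);
```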

`Delimiter` — Field delimiter characters
character vector | cell array of character vectors | string scalar | string array

Field delimiter characters, specified as a character vector, cell array of character vectors, string scalar, or string array. Specify multiple delimiters in a cell array of character vectors or a string array. If unspecified, the tabularTextDatastore function detects the delimiter automatically.

Example: '|'

Example: {';','*'}

Repeated delimiter characters in a file are interpreted as separate delimiters with empty fields between them.

When you specify one of the following escape sequences as a delimiter, it is converted to the corresponding control character.

`\b`	Backspace
`\n`	Newline
`\r`	Carriage return
`\t`	Tab
`\\`	Backslash (`\`)

When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore properties.

Data Types: char | cell | string
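For instance, a pipe-delimited file could be read along these lines. The file name and contents are made up for the example.

```matlab
% Hypothetical pipe-delimited text file.
file = [tempname '.txt'];
fid = fopen(file,'w');
fprintf(fid,'ID|Name|Score\n1|ann|9.5\n2|bob|7.0\n');
fclose(fid);

% State the delimiter explicitly instead of relying on detection.
ds = tabularTextDatastore(file,'Delimiter','|');
names = ds.VariableNames;
T = readall(ds);
```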

`RowDelimiter` — Row delimiter character
`\r\n` (default) | character vector | string scalar

Row delimiter character, specified as a character vector or string scalar that must be either a single character or one of '\r', '\n', or '\r\n'.

When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore properties.

Example: ':'

Data Types: char | string

`TreatAsMissing` — Text to treat as missing values
`''` (default) | character vector | cell array of character vectors | string scalar | string array

Text to treat as missing values, specified as a single character vector, cell array of character vectors, string scalar, or string array. Values specified as TreatAsMissing are substituted with the value defined in the MissingValue property. For instance, if MissingValue is NaN and TreatAsMissing is 'NA', then all occurrences of 'NA' in the imported data are replaced by NaN.

This option only applies to numeric fields. Also, this property is equivalent to the TreatAsEmpty name-value pair argument for the textscan function.

When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore properties.

Example: 'NA'

Example: {'-',''}

Data Types: char | cell | string

`MissingValue` — Value for missing numeric fields
`NaN` (default) | scalar

Value for missing numeric fields in delimited text files, specified as a scalar. This property is equivalent to the EmptyValue name-value pair argument for the textscan function.

Data Types: double
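TreatAsMissing and MissingValue work as a pair. The following sketch, built on a hypothetical three-row file, replaces the text NA in a numeric column with 0.

```matlab
% Hypothetical file in which one numeric entry is recorded as NA.
file = [tempname '.csv'];
fid = fopen(file,'w');
fprintf(fid,'ID,Score\n1,10\n2,NA\n3,30\n');
fclose(fid);

% Treat NA as missing and substitute 0 for missing numeric fields.
ds = tabularTextDatastore(file,'TreatAsMissing','NA','MissingValue',0);
T = readall(ds);
scores = T.Score;   % the NA entry is imported as the MissingValue
```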

Advanced Text Format Properties

`TextscanFormats` — Data field format
cell array of character vectors | string array

Data field format, specified as a cell array of character vectors or a string array, where each character vector or string contains one conversion specifier.

When you specify or modify the TextscanFormats property, you can use the same conversion specifiers that the textscan function accepts for the formatSpec argument. Valid values for TextscanFormats include conversion specifiers that skip fields using an asterisk (*) character and ones that skip literal text. The number of conversion specifiers must match the number of variables in the VariableNames property.

If the value of TextscanFormats includes conversion specifiers that skip fields using asterisk characters (*), then the value of the SelectedVariableNames property automatically updates. MATLAB uses the %*q conversion specifier to skip fields omitted by the SelectedVariableNames property and treats the field contents as literal character vectors. For fixed-width files, indicate a skipped field using the appropriate conversion specifier along with the field width. For example, %*52c skips a field that contains 52 characters.
If you do not specify a value for TextscanFormats, then datastore determines the format of the data fields by scanning text from the first nonheader line in the first file of the datastore.
Starting in R2020b, datastore detects prefixed literals as hexadecimal and binary data. Previously, datastore detected prefixed literals as text data.

Example: {'%s','%s','%f'}

Data Types: cell | string
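The link between TextscanFormats and SelectedVariableNames described above can be observed with a sketch like the following. The file is hypothetical, and the formats shown by the datastore may vary with the detected data types, so no specific output is assumed here.

```matlab
% Hypothetical three-column file with a text column in the middle.
file = [tempname '.csv'];
fid = fopen(file,'w');
fprintf(fid,'A,B,C\n1,x,10\n2,y,20\n');
fclose(fid);

ds = tabularTextDatastore(file);
formatsBefore = ds.TextscanFormats;     % one conversion specifier per variable

% Omitting B from the selection marks its field as skipped.
ds.SelectedVariableNames = {'A','C'};
formatsAfter = ds.TextscanFormats;      % the specifier for B now skips the field

T = read(ds);                           % contains only the selected variables
```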

`ExponentCharacters` — Exponent characters
`'eEdD'` (default) | character vector | string scalar

Exponent characters, specified as a character vector or string scalar. The default exponent characters are e, E, d, and D.

Data Types: char | string

`CommentStyle` — Comment character
`''` (default) | character vector | string scalar | two-element array

Comment character used to distinguish comments in the file, specified as a character vector, string scalar, or two-element string or cell array.

If you specify a character vector or string scalar, then all following text on the same line is interpreted as a comment. For example, 'CommentStyle','/*' interprets all text after /* on the same line as a comment.
If you specify a two-element string vector or a two-element cell array containing character vectors, then all text between the two elements is interpreted as a comment. For example, 'CommentStyle',{'/*', '*/'} interprets all text between '/*' and '*/' as a comment.

When reading from a TabularTextDatastore, the read function checks for comments only at the start of each field, not within a field.

When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore properties.

Example: 'CommentStyle',{'/*', '*/'}

Data Types: char | cell | string

`Whitespace` — White-space characters
`' \b\t'` (default) | character vector | string scalar

White-space characters, specified as a character vector or a string scalar of one or more characters.

When you specify one of the following escape sequences as any white-space character, the datastore function converts that sequence to the corresponding control character.

`\b`	Backspace
`\n`	Newline
`\r`	Carriage return
`\t`	Tab
`\\`	Backslash (`\`)

When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore properties.

Example: ' \b\t'

Data Types: char | string

`MultipleDelimitersAsOne` — Multiple delimiter handling
`0 (false)` (default) | `1 (true)`

Multiple delimiter handling, specified as either true or false. If true, then datastore treats consecutive delimiters as a single delimiter. Repeated delimiters separated by white-space are also treated as a single delimiter.

When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore properties.

Properties for `preview`, `read`, `readall` Table

`SelectedVariableNames` — Variables to read
cell array of character vectors | string array

Variables to read from the file, specified as a cell array of character vectors or a string array, where each character vector or string contains the name of one variable. You can specify the variable names in any order.

To support invalid MATLAB identifiers as variable names, such as variable names containing spaces and non-ASCII characters, set the value of the VariableNamingRule parameter to 'preserve'.

Example: {'Var3','Var7','Var4'}

Data Types: cell | string

`SelectedFormats` — Formats of selected variables
cell array of character vectors | string array

Formats of the selected variables to read, specified as a cell array of character vectors or a string array, where each character vector or string contains one conversion specifier. The variables to read are indicated by the SelectedVariableNames property. The number of character vectors or strings in SelectedFormats must match the number of variables to read.

You can use the same conversion specifiers that the textscan function accepts, including specifiers that skip literal text. However, you cannot use a conversion specifier that skips a field. That is, the conversion specifier cannot include an asterisk character (*).

Example: {'%d','%d'}

Data Types: cell | string

`ReadSize` — Amount of data to read
20000 (default) | positive scalar | `'file'`

Amount of data to read in a call to the read function, specified as a positive scalar or 'file'.

If ReadSize is a positive integer, then each call to read reads at most ReadSize rows.
If ReadSize is 'file', then each call to read reads all of the data in one file.

When you change ReadSize from a numeric scalar to 'file' or vice versa, MATLAB resets the datastore to the state where no data has been read from it.

Data Types: double | char | string
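A typical way to consume a datastore in chunks is a loop over hasdata and read. The sketch below assumes a hypothetical ten-row file and a ReadSize of 4, and records how many rows each call returns.

```matlab
% Hypothetical file with ten rows of data.
file = [tempname '.csv'];
T = table((1:10)',(11:20)','VariableNames',{'N','M'});
writetable(T,file);

% Each call to read returns at most four rows.
ds = tabularTextDatastore(file,'ReadSize',4);
rowsPerRead = [];
while hasdata(ds)
    chunk = read(ds);
    rowsPerRead(end+1) = height(chunk); %#ok<SAGROW>
end
totalRows = sum(rowsPerRead);
```

Setting ReadSize to 'file' instead would make each call to read return the contents of one whole file.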

`TextType` — Output data type of text variables
`'char'` (default) | `'string'`

Output data type of text variables, specified as 'char' or 'string'. TextType specifies the data type of text variables formatted with %s, %q, or %[...].

If TextType is 'char', then the output is a cell array of character vectors.
If TextType is 'string', then the output has type string.

Data Types: char | string

`RowTimes` — Name of row times variable
variable name | variable index

Name of row times variable, specified as the comma-separated pair consisting of 'RowTimes' and a variable name (such as "Date") or a variable index (such as 3).

RowTimes is a timetable-related parameter. Each row of a timetable is associated with a time, which is captured in a time vector for the timetable. The variable specified in RowTimes must contain a datetime or a duration vector.

If the value of 'OutputType' is 'timetable', but you do not specify 'RowTimes', then TabularTextDatastore uses the first datetime or duration variable as the row times for the timetable.

Properties for use by `writeall`

`Folders` — Folders used to construct datastore
cell array of character vectors

This property is read-only.

Folders used to construct datastore, returned as a cell array of character vectors. The cell array is oriented as a column vector. Each character vector is a path to a folder that contains data files. The location argument in the tabularTextDatastore and datastore functions defines Folders when the datastore is created.

The Folders property is reset when you modify the Files property of a TabularTextDatastore object.

Data Types: cell

`SupportedOutputFormats` — List of formats supported for writing
row vector

This property is read-only.

List of formats supported for writing, returned as a row vector of strings. This property specifies the possible output formats when using writeall to write output files from the datastore.

Data Types: string

`DefaultOutputFormat` — Default output format
string scalar

This property is read-only.

Default output format, returned as a string scalar. This property specifies the default format when using writeall to write output files from the datastore.

Data Types: string

Object Functions

`hasdata`	Determine if data is available to read
`numpartitions`	Number of datastore partitions
`partition`	Partition a datastore
`preview`	Preview subset of data in datastore
`read`	Read data in datastore
`readall`	Read all data in datastore
`writeall`	Write datastore to files
`reset`	Reset datastore to initial state
`transform`	Transform datastore
`combine`	Combine data from multiple datastores
`isPartitionable`	Determine whether datastore is partitionable
`isShuffleable`	Determine whether datastore is shuffleable
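As a brief sketch of how some of these functions fit together, the snippet below builds a datastore from two hypothetical files, reads a chunk, resets the datastore, and partitions it by file.

```matlab
% Scratch folder with two small files (hypothetical names and contents).
folder = tempname;
mkdir(folder);
writetable(table([1;2],[3;4],'VariableNames',{'A','B'}),fullfile(folder,'part1.csv'));
writetable(table([5;6],[7;8],'VariableNames',{'A','B'}),fullfile(folder,'part2.csv'));

ds = tabularTextDatastore(folder);

% Read once, reset to the initial state, and read again.
first = read(ds);
reset(ds);
again = read(ds);
sameData = isequal(first,again);

% Partition by file and read everything in the second partition.
sub = partition(ds,'Files',2);
subData = readall(sub);
```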

Examples


Select Variables to Read


Create a datastore from the sample file airlinesmall.csv, which contains tabular data.

ds = tabularTextDatastore('airlinesmall.csv','TreatAsMissing','NA',...
    'MissingValue',0);

View the variables in the datastore.

ds.VariableNames

ans = 1x29 cell
  Columns 1 through 5

    {'Year'}    {'Month'}    {'DayofMonth'}    {'DayOfWeek'}    {'DepTime'}

  Columns 6 through 9

    {'CRSDepTime'}    {'ArrTime'}    {'CRSArrTime'}    {'UniqueCarrier'}

  Columns 10 through 13

    {'FlightNum'}    {'TailNum'}    {'ActualElapsedTime'}    {'CRSElapsedTime'}

  Columns 14 through 18

    {'AirTime'}    {'ArrDelay'}    {'DepDelay'}    {'Origin'}    {'Dest'}

  Columns 19 through 22

    {'Distance'}    {'TaxiIn'}    {'TaxiOut'}    {'Cancelled'}

  Columns 23 through 25

    {'CancellationCode'}    {'Diverted'}    {'CarrierDelay'}

  Columns 26 through 28

    {'WeatherDelay'}    {'NASDelay'}    {'SecurityDelay'}

  Column 29

    {'LateAircraftDelay'}

Modify the SelectedVariableNames property to specify the variables of interest.

ds.SelectedVariableNames = {'Year','Month','Cancelled'};

Alternatively, you can specify the variables of interest when you create the datastore.

ds = tabularTextDatastore('airlinesmall.csv','TreatAsMissing','NA',...
    'MissingValue',0,'SelectedVariableNames',{'Year','Month','Cancelled'});

Specify Format of Data to Read


Create a datastore from the sample file airlinesmall.csv, which contains tabular data.

ds = tabularTextDatastore('airlinesmall.csv','TreatAsMissing','NA',...
    'MissingValue',0);

Specify the variables of interest.

ds.SelectedVariableNames = {'Year','Month','UniqueCarrier'};

View the SelectedFormats property.

ds.SelectedFormats

ans = 1x3 cell
    {'%f'}    {'%f'}    {'%q'}

The SelectedFormats property specifies how the tabularTextDatastore function interprets the format of the variables. The Year and Month variables are read as columns of floating-point values and the UniqueCarrier variable as a column of text.

Modify the SelectedFormats property to read the first two variables as signed integers and the third variable as a categorical value.

ds.SelectedFormats = {'%d','%d','%C'};

Preview the data.

T = preview(ds)

T=8×3 table
    Year    Month    UniqueCarrier
    ____    _____    _____________

    1987     10           PS      
    1987     10           PS      
    1987     10           PS      
    1987     10           PS      
    1987     10           PS      
    1987     10           PS      
    1987     10           PS      
    1987     10           PS

Return Timetable from Tabular Text Datastore


Use the OutputType and RowTimes name-value pairs to make tabularTextDatastore return timetables instead of tables.

Create a datastore for outages.csv. Specify the 'OutputType' name-value pair as 'timetable'.

ttds = tabularTextDatastore('outages.csv','OutputType','timetable');
preview(ttds)

ans=8×5 timetable
       OutageTime          Region         Loss     Customers     RestorationTime            Cause       
    ________________    _____________    ______    __________    ________________    ___________________

    2002-02-01 12:18    {'SouthWest'}    458.98    1.8202e+06    2002-02-07 16:50    {'winter storm'   }
    2003-01-23 00:49    {'SouthEast'}    530.14    2.1204e+05                 NaT    {'winter storm'   }
    2003-02-07 21:15    {'SouthEast'}     289.4    1.4294e+05    2003-02-17 08:14    {'winter storm'   }
    2004-04-06 05:44    {'West'     }    434.81    3.4037e+05    2004-04-06 06:10    {'equipment fault'}
    2002-03-16 06:18    {'MidWest'  }    186.44    2.1275e+05    2002-03-18 23:23    {'severe storm'   }
    2003-06-18 02:49    {'West'     }         0             0    2003-06-18 10:54    {'attack'         }
    2004-06-20 14:39    {'West'     }    231.29           NaN    2004-06-20 19:16    {'equipment fault'}
    2002-06-06 19:28    {'West'     }    311.86           NaN    2002-06-07 00:51    {'equipment fault'}

When you do not also specify 'RowTimes', tabularTextDatastore uses the first datetime or duration variable as the row times. In this case, the OutageTime variable is used for the row times.

Specify the 'RowTimes' option to use the restoration times (RestorationTime variable) as the row times, instead of the time of the power outages.

ttds = tabularTextDatastore('outages.csv','OutputType','timetable','RowTimes','RestorationTime');
preview(ttds)

ans=8×5 timetable
    RestorationTime        Region           OutageTime        Loss     Customers            Cause       
    ________________    _____________    ________________    ______    __________    ___________________

    2002-02-07 16:50    {'SouthWest'}    2002-02-01 12:18    458.98    1.8202e+06    {'winter storm'   }
    NaT                 {'SouthEast'}    2003-01-23 00:49    530.14    2.1204e+05    {'winter storm'   }
    2003-02-17 08:14    {'SouthEast'}    2003-02-07 21:15     289.4    1.4294e+05    {'winter storm'   }
    2004-04-06 06:10    {'West'     }    2004-04-06 05:44    434.81    3.4037e+05    {'equipment fault'}
    2002-03-18 23:23    {'MidWest'  }    2002-03-16 06:18    186.44    2.1275e+05    {'severe storm'   }
    2003-06-18 10:54    {'West'     }    2003-06-18 02:49         0             0    {'attack'         }
    2004-06-20 19:16    {'West'     }    2004-06-20 14:39    231.29           NaN    {'equipment fault'}
    2002-06-07 00:51    {'West'     }    2002-06-06 19:28    311.86           NaN    {'equipment fault'}
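
Once the datastore returns timetables, the row times can be used directly for time-based operations. The following is a minimal sketch, assuming the outages.csv file is small enough to read into memory with readall. It reads the full data set, checks that the result is a timetable keyed on RestorationTime, and sorts the rows by those times.

```matlab
% Create the datastore with RestorationTime as the row times.
ttds = tabularTextDatastore('outages.csv', ...
    'OutputType','timetable','RowTimes','RestorationTime');

% Read all of the data (assumes it fits in memory).
tt = readall(ttds);

% Confirm the output type and the name of the row-times dimension.
isTT = istimetable(tt)
rowTimesName = tt.Properties.DimensionNames{1}

% Sort the rows by restoration time.
ttSorted = sortrows(tt);
```

For collections that do not fit in memory, the same checks apply to each chunk returned by read instead of the result of readall.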

Limitations

Datetime data containing day, month, or time zone names in a language other than the one used by the en_US locale is not supported. If the datastore does not recognize a datetime format, specify the format explicitly using the TextscanFormats parameter.
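
As an illustration of supplying formats explicitly, the following minimal sketch passes one format per column through the 'TextscanFormats' name-value pair. It assumes the outages.csv sample file with its six columns in file order (Region, OutageTime, Loss, Customers, RestorationTime, Cause), and the datetime pattern yyyy-MM-dd HH:mm is an assumption based on the values shown in the previews above.

```matlab
% One format per column, in file order (assumed column layout of outages.csv).
% '%{...}D' reads a datetime using the pattern inside the braces.
fmts = {'%q','%{yyyy-MM-dd HH:mm}D','%f','%f','%{yyyy-MM-dd HH:mm}D','%q'};

% Pass the formats when creating the datastore.
ds = tabularTextDatastore('outages.csv','TextscanFormats',fmts);

% Preview the data and check that the time columns are datetime arrays.
T = preview(ds);
isDatetimeColumn = [isdatetime(T.OutageTime) isdatetime(T.RestorationTime)]
```

For your own files, replace the pattern inside the braces with the one that matches how the dates are written in the text.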

