Get information about Parquet file
ParquetInfo objects contain information about a Parquet file, such as file size, variable names and types, encoding, and compression schemes. To get information about a Parquet file, create a ParquetInfo object using the parquetinfo function.
filename
— Name of Parquet file
Name of Parquet file, specified as a character vector or string scalar.
ParquetInfo works with Parquet 1.0 or Parquet 2.0 files.
Depending on the location of the file, filename can take one of these forms.
Location | Form
---|---
Current folder or folder on the MATLAB® path | Specify the name of the file in filename.
File in a folder | If the file is not in the current folder or in a folder on the MATLAB path, then specify the full or relative path name in filename.
Remote location | If the file is stored at a remote location, then specify filename as a uniform resource locator (URL) that gives the full path to the file. The URL scheme depends on your remote location. For more information, see Work with Remote Data.
Data Types: char | string
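The first two forms of filename can be sketched as follows. This is a minimal illustration that assumes the sample file outages.parquet used in the example on this page is on the MATLAB path; the variable names infoByName, fullPath, and infoByPath are chosen here for illustration only.

```matlab
% File on the MATLAB path: the file name alone is enough.
infoByName = parquetinfo("outages.parquet");

% File in a folder: pass a full (or relative) path instead.
% which resolves the full path of a file that is on the MATLAB path.
fullPath = which("outages.parquet");
infoByPath = parquetinfo(fullPath);

% Both calls describe the same file, so the absolute paths should match.
isequal(infoByName.Filename, infoByPath.Filename)
```

In both cases the Filename property of the result stores the absolute path, regardless of which form was passed in.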
Filename
— Absolute path to Parquet file
This property is read-only.
Absolute path to Parquet file, specified as a string scalar.
Data Types: string
FileSize
— File size in bytes
This property is read-only.
File size in bytes, specified as a double.
Data Types: double
NumRowGroups
— Number of row groups
This property is read-only.
Number of row groups, specified as a double.
Data Types: double
RowGroupHeights
— Number of rows in each row group
This property is read-only.
Number of rows in each row group, specified as a double.
Data Types: double
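As a short sketch of how NumRowGroups and RowGroupHeights relate, the following assumes the sample file outages.parquet is on the MATLAB path and that RowGroupHeights holds one entry per row group; totalRows is an illustrative variable name.

```matlab
info = parquetinfo("outages.parquet");

% Under the assumption above, RowGroupHeights has one entry per row group.
assert(numel(info.RowGroupHeights) == info.NumRowGroups)

% Summing the heights gives the total number of rows in the file.
totalRows = sum(info.RowGroupHeights)
```

For a file with a single row group, totalRows is simply the one value stored in RowGroupHeights.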
VariableNames
— Variable names
This property is read-only.
Variable names, specified as a string array. If the Parquet file contains N variables, then VariableNames is an array of size 1-by-N containing the names of the variables.
Data Types: string
VariableTypes
— Variable data types
This property is read-only.
Variable data types, specified as a string array. If the Parquet file contains N variables, then VariableTypes is an array of size 1-by-N containing the data type name for each variable. Each element in the array is the name of the MATLAB data type to which the corresponding variable in the Parquet file maps.
Data Types: string
VariableCompression
— Variable compression algorithm
This property is read-only.
Variable compression algorithm, specified as a string array. If the Parquet file contains N variables, then VariableCompression is an array of size 1-by-N containing compression algorithm names. Each element in the array is the name of the compression algorithm used to compress the corresponding variable in the Parquet file. See parquetwrite for a list of supported compression algorithms.
Data Types: string
VariableEncoding
— Variable encoding
This property is read-only.
Variable encoding, specified as a string array. If the Parquet file contains N variables, then VariableEncoding is an array of size 1-by-N containing encoding scheme names. Each element in the array is the name of the encoding scheme used to encode the corresponding variable in the Parquet file. See parquetwrite for a list of supported encodings.
Data Types: string
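Because VariableNames, VariableTypes, VariableCompression, and VariableEncoding are all 1-by-N string arrays, they can be lined up per variable. The following is one possible sketch, assuming the sample file outages.parquet is on the MATLAB path; the table name summary and its column names are illustrative, not part of the ParquetInfo object.

```matlab
info = parquetinfo("outages.parquet");

% Each property is a 1-by-N string array, so transpose each one
% to get one table row per variable in the Parquet file.
summary = table(info.VariableNames', info.VariableTypes', ...
    info.VariableCompression', info.VariableEncoding', ...
    'VariableNames', {'Name', 'Type', 'Compression', 'Encoding'})
```

Collecting the properties in a table makes it easier to scan a file with many variables than indexing into each property separately.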
Version
— Parquet version
"1.0" | "2.0"
This property is read-only.
Parquet version, specified as either "1.0" or "2.0".
Data Types: string
Use the parquetinfo function to create a ParquetInfo object containing information about the file.
info = parquetinfo('outages.parquet')
info = 
  ParquetInfo with properties:

               Filename: "/mathworks/devel/bat/BR2020bd/build/matlab/toolbox/matlab/demos/outages.parquet"
               FileSize: 44202
           NumRowGroups: 1
        RowGroupHeights: 1468
          VariableNames: [1x6 string]
          VariableTypes: [1x6 string]
    VariableCompression: [1x6 string]
       VariableEncoding: [1x6 string]
                Version: "2.0"
Display the name, type, and compression scheme for the third variable in the file.
disp([info.VariableNames(3) info.VariableTypes(3) info.VariableCompression(3)])
"Loss" "double" "snappy"
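The metadata can also guide a selective read. As a hedged sketch, the following uses VariableTypes to pick the variables that map to double and reads only those with parquetread and its SelectedVariableNames option. It assumes outages.parquet is on the MATLAB path; isDouble, numericNames, and T are illustrative names.

```matlab
info = parquetinfo("outages.parquet");

% Find the variables that map to the MATLAB double type.
isDouble = info.VariableTypes == "double";
numericNames = info.VariableNames(isDouble);

% Read only those variables from the file.
T = parquetread(info.Filename, "SelectedVariableNames", numericNames);
head(T)
```

Inspecting the file with parquetinfo first avoids reading variables that are not needed.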