normalize

Normalize data

Description

N = normalize(A) returns the vectorwise z-score of the data in A with center 0 and standard deviation 1.

  • If A is a vector, then normalize operates on the entire vector.

  • If A is a matrix, table, or timetable, then normalize operates on each column of data separately.

  • If A is a multidimensional array, then normalize operates along the first array dimension whose size does not equal 1.
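As a small illustration of the default-dimension rule above, the following sketch (variable names are illustrative only) normalizes a 1-by-1-by-5 array. Because the first two dimensions have size 1, normalize is expected to operate along the third dimension, which should match passing dim = 3 explicitly.

```matlab
% A 1-by-1-by-5 array: the first dimension whose size is not 1 is dimension 3.
A = reshape(1:5, 1, 1, 5);

N  = normalize(A);       % default dimension
N3 = normalize(A, 3);    % explicit dimension, expected to give the same result

size(N)                  % same size as A
max(abs(N(:) - N3(:)))   % expected to be 0, or within round-off of 0
```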

N = normalize(A,dim) returns the z-score along dimension dim. For example, normalize(A,2) normalizes each row.

N = normalize(___,method) specifies a normalization method for either of the previous syntaxes. For example, normalize(A,'norm') normalizes the data in A by the Euclidean norm (2-norm).

N = normalize(___,method,methodtype) specifies the type of normalization for the given method. For example, normalize(A,'norm',Inf) normalizes the data in A using the infinity norm.

N = normalize(___,'DataVariables',datavars) specifies variables to operate on when the input data is in a table or timetable.

Examples

Normalize data in a vector and matrix by computing the z-score.

Create a vector v and compute the z-score, normalizing the data to have mean 0 and standard deviation 1.

v = 1:5;
N = normalize(v)
N = 1×5

   -1.2649   -0.6325         0    0.6325    1.2649

Create a matrix B and compute the z-score for each column. Then, normalize each row.

B = magic(3)
B = 3×3

     8     1     6
     3     5     7
     4     9     2

N1 = normalize(B)
N1 = 3×3

    1.1339   -1.0000    0.3780
   -0.7559         0    0.7559
   -0.3780    1.0000   -1.1339

N2 = normalize(B,2)
N2 = 3×3

    0.8321   -1.1094    0.2774
   -1.0000         0    1.0000
   -0.2774    1.1094   -0.8321

Scale a vector A by its standard deviation.

A = 1:5;
Ns = normalize(A,'scale')
Ns = 1×5

    0.6325    1.2649    1.8974    2.5298    3.1623

Scale A so that its range is in the interval [0,1].

Nr = normalize(A,'range')
Nr = 1×5

         0    0.2500    0.5000    0.7500    1.0000

Create a vector A and normalize it by its 1-norm.

A = 1:5;
Np = normalize(A,'norm',1)
Np = 1×5

    0.0667    0.1333    0.2000    0.2667    0.3333

Center the data in A so that it has mean 0.

Nc = normalize(A,'center','mean')
Nc = 1×5

    -2    -1     0     1     2

Create a table containing height information for five people.

LastName = {'Sanchez';'Johnson';'Lee';'Diaz';'Brown'};
Height = [71;69;64;67;64];
T = table(LastName,Height)
T=5×2 table
    LastName     Height
    _________    ______

    'Sanchez'      71  
    'Johnson'      69  
    'Lee'          64  
    'Diaz'         67  
    'Brown'        64  

Normalize the height data by the maximum height.

N = normalize(T,'norm',Inf,'DataVariables','Height')
N=5×2 table
    LastName     Height 
    _________    _______

    'Sanchez'          1
    'Johnson'    0.97183
    'Lee'        0.90141
    'Diaz'       0.94366
    'Brown'      0.90141

Input Arguments

Input data (A), specified as a scalar, vector, matrix, multidimensional array, table, or timetable.

If A is a numeric array and has type single, then the output also has type single. Otherwise, the output has type double.

normalize ignores NaN values in A.
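A minimal sketch of the NaN behavior, assuming that "ignores" means the statistics are computed from the non-NaN elements while NaN entries remain NaN in the output. The manual comparison uses the 'omitnan' option of mean and std. The variable names are illustrative.

```matlab
A = [1 NaN 3];
N = normalize(A)

% Manual z-score that skips NaN when computing the mean and standard deviation
% (assumption: this mirrors the NaN handling described above).
M = (A - mean(A, 'omitnan')) ./ std(A, 'omitnan')
```

Under that assumption, both results are -0.7071, NaN, 0.7071, because the non-NaN values 1 and 3 have mean 2 and standard deviation sqrt(2).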

Data Types: double | single | table | timetable
Complex Number Support: Yes

Dimension to operate along (dim), specified as a positive integer scalar.

Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Normalization method (method), specified as one of the following options:

    Method         Description
    ___________    ________________________________________________________________

    'zscore'       z-score with mean 0 and standard deviation 1
    'norm'         2-norm
    'scale'        Scale by standard deviation
    'range'        Scale range of data to [0,1]
    'center'       Center data to have mean 0
    'medianiqr'    Center and scale data to have median 0 and interquartile range 1

Method type (methodtype), specified as a scalar, a 2-element row vector, or a type name, depending on the specified method:

    Method      Method Type Options                        Description
    ________    _______________________________________    __________________________________________________________________

    'zscore'    'std' (default)                            Center and scale to have mean 0 and standard deviation 1
                'robust'                                   Center and scale to have median 0 and median absolute deviation 1
    'norm'      Positive numeric scalar (default is 2)     p-norm
                Inf                                        Infinity norm
    'scale'     'std' (default)                            Scale by standard deviation
                'mad'                                      Scale by median absolute deviation
                'first'                                    Scale by first element of data
                'iqr'                                      Scale data by interquartile range
                Numeric scalar                             Scale data by numeric value
    'range'     2-element row vector (default is [0 1])    Interval of the form [a b] where a < b
    'center'    'mean'                                     Center to have mean 0
                'median'                                   Center to have median 0
                Numeric scalar                             Shift center by numeric value
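The following sketch exercises a few of the method and method type combinations listed above on a small vector. The checks are written against the behavior the descriptions state (for example, median 0 and median absolute deviation 1 for 'robust'), not against a particular printed output. The variable names are illustrative.

```matlab
A = 1:5;

% 'range' with a custom interval [a b]: rescale the data to [-1, 1].
Nr = normalize(A, 'range', [-1 1]);
[min(Nr), max(Nr)]                 % expected to be -1 and 1

% 'zscore' with 'robust': median 0 and median absolute deviation 1.
Nz = normalize(A, 'zscore', 'robust');
median(Nz)                         % expected to be 0
median(abs(Nz - median(Nz)))       % expected to be 1

% 'scale' with 'first': scale by the first element of the data.
Nf = normalize(A, 'scale', 'first');

% 'center' with a numeric scalar: shift the data by that value.
Nc = normalize(A, 'center', 3);
```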

Table variables, specified as the comma-separated pair consisting of 'DataVariables' and a scalar, vector, cell array, function handle, or table vartype subscript. The 'DataVariables' value indicates which variables of the input table to operate on, and can be one of the following:

  • A character vector or scalar string specifying a single table variable name

  • A cell array of character vectors or string array where each element is a table variable name

  • A vector of table variable indices

  • A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it

  • A function handle that takes a table variable as input and returns a logical scalar

  • A table vartype subscript

Example: 'Age'

Example: {'Height','Weight'}

Example: @isnumeric

Example: vartype('numeric')
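As a sketch of how the different 'DataVariables' forms can select the same variables, the example below builds a small table and selects its two numeric variables in five ways. The Weight values are hypothetical and exist only for illustration.

```matlab
LastName = {'Sanchez';'Johnson';'Lee';'Diaz';'Brown'};
Height = [71;69;64;67;64];
Weight = [176;163;131;133;119];   % hypothetical values for illustration
T = table(LastName, Height, Weight);

N1 = normalize(T, 'DataVariables', {'Height','Weight'});   % variable names
N2 = normalize(T, 'DataVariables', [2 3]);                 % variable indices
N3 = normalize(T, 'DataVariables', [false true true]);     % logical vector
N4 = normalize(T, 'DataVariables', @isnumeric);            % function handle
N5 = normalize(T, 'DataVariables', vartype('numeric'));    % vartype subscript

isequal(N1, N2, N3, N4, N5)       % expected to be true
```

In each case, the LastName variable is expected to pass through unchanged.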

More About

Z-Score

For a random variable $X$ with mean $\mu$ and standard deviation $\sigma$, the z-score of a value $x$ is $z = \frac{x - \mu}{\sigma}$. For sample data with mean $\bar{X}$ and standard deviation $S$, the z-score of a data point $x$ is $z = \frac{x - \bar{X}}{S}$.

Z-scores measure the distance of a data point from the mean in units of the standard deviation. The standardized data set has mean 0 and standard deviation 1, and retains the shape properties of the original data set (same skewness and kurtosis).
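The sample z-score formula can be checked by hand. The sketch below uses arbitrary illustrative data and compares the manual computation with normalize. It assumes the sample standard deviation, which is what std returns by default.

```matlab
x = [2 4 4 4 5 5 7 9];       % arbitrary sample data

xbar = mean(x);              % sample mean
S    = std(x);               % sample standard deviation (N-1 normalization)
z    = (x - xbar) ./ S;      % z-score of each data point

mean(z)                      % approximately 0
std(z)                       % approximately 1
max(abs(z - normalize(x)))   % expected to be within round-off of 0
```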

P-Norm

The general definition for the p-norm of a vector v that has N elements is

$$\|v\|_p = \left[\sum_{k=1}^{N} |v_k|^p\right]^{1/p},$$

where p is any positive real value, Inf, or -Inf. Some common values of p are:

  • If p is 1, then the resulting 1-norm is the sum of the absolute values of the vector elements.

  • If p is 2, then the resulting 2-norm gives the vector magnitude or Euclidean length of the vector.

  • If p is Inf, then $\|v\|_\infty = \max_i(|v(i)|)$.
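As a worked instance of the definition, the sketch below evaluates a few p-norms of a small illustrative vector directly from the formula and compares them with the built-in norm function.

```matlab
v = [3 -4 12];

p    = 3;                          % any positive real value
np   = sum(abs(v).^p)^(1/p);       % general definition
n1   = sum(abs(v));                % p = 1:   3 + 4 + 12 = 19
n2   = sqrt(sum(abs(v).^2));       % p = 2:   sqrt(169) = 13
ninf = max(abs(v));                % p = Inf: 12

% Differences from the built-in norm function, expected to be within round-off of 0
[np - norm(v,p), n1 - norm(v,1), n2 - norm(v,2), ninf - norm(v,Inf)]

% Normalizing by the 1-norm divides every element by n1
max(abs(normalize(v,'norm',1) - v ./ n1))
```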

Interquartile Range

The interquartile range (IQR) of a data set describes the range of the middle 50% of values when the values are sorted. If the median of the data is Q2, the median of the lower half of the data is Q1, and the median of the upper half of the data is Q3, then IQR = Q3 - Q1.

The IQR is generally preferred over looking at the full range of the data when the data contains outliers (very large or very small values) because the IQR excludes the largest 25% and smallest 25% of values in the data.
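The median-of-halves definition can be sketched as follows for a data set with an even number of values, where the two halves are unambiguous. Quartile conventions differ between implementations, so this illustrates the definition above and is not necessarily the exact algorithm that normalize uses.

```matlab
x = [7 1 5 3 8 2 6 4];        % even number of values
s = sort(x);                  % 1 2 3 4 5 6 7 8
n = numel(s);

Q1 = median(s(1:n/2));        % median of the lower half: 2.5
Q3 = median(s(n/2+1:end));    % median of the upper half: 6.5
iqrX = Q3 - Q1                % 4
```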

Median Absolute Deviation

The median absolute deviation (MAD) of a data set is the median value of the absolute deviations from the median $\tilde{X}$ of the data: $\mathrm{MAD} = \operatorname{median}(|x - \tilde{X}|)$. Therefore, the MAD describes the variability of the data in relation to the median.

The MAD is generally preferred over using the standard deviation of the data when the data contains outliers (very large or very small values) because the standard deviation squares deviations from the mean, giving outliers an unduly large impact. Conversely, the deviations of a small number of outliers do not affect the value of the MAD.
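To make the outlier argument concrete, the sketch below computes the MAD and the standard deviation for a small vector with and without a single outlier. The helper madFun is a hypothetical anonymous function written from the definition above, not a built-in.

```matlab
x = [1 2 3 4 5];
y = [1 2 3 4 100];                          % same data with one outlier

% Hypothetical helper: median of absolute deviations from the median
madFun = @(d) median(abs(d - median(d)));

[madFun(x), madFun(y)]                      % both are 1
[std(x), std(y)]                            % about 1.58 versus about 43.6
```

The single outlier leaves the MAD unchanged but increases the standard deviation by more than an order of magnitude.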

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Introduced in R2018a