ecdf

Empirical cumulative distribution function

Description

[f,x] = ecdf(y) returns the empirical cumulative distribution function (cdf), f, evaluated at the points in x, using the data in the vector y.

In survival and reliability analysis, this empirical cdf is called the Kaplan-Meier estimate, and the data might correspond to survival or failure times.

[f,x] = ecdf(y,Name,Value) returns the empirical function values, f, evaluated at the points in x, with additional options specified by one or more Name,Value pair arguments.

For example, you can specify the type of function to evaluate or which data is censored.

[f,x,flo,fup] = ecdf(___) also returns the 95% lower and upper confidence bounds for the evaluated function values. You can use any of the input arguments in the previous syntaxes.

ecdf computes the confidence bounds using Greenwood's formula. They are not simultaneous confidence bounds.

ecdf(___) draws a stairstep graph of the evaluated function by using the stairs function. Specify 'Bounds','on' to include the confidence bounds in the graph.

ecdf(ax,___) plots on the axes specified by ax instead of the current axes (gca).
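
As a minimal sketch of the ecdf(ax,___) syntax, the following code draws two empirical functions side by side in one figure. The sample y and the axes variables ax1 and ax2 are illustrative assumptions; they do not come from the examples on this page.

```matlab
rng('default')                      % For reproducibility
y = exprnd(15,75,1);                % Hypothetical sample of failure times

figure
ax1 = subplot(1,2,1);               % Left axes
ecdf(ax1,y)                         % Empirical cdf on the left axes
title(ax1,'Empirical cdf')

ax2 = subplot(1,2,2);               % Right axes
ecdf(ax2,y,'Function','survivor')   % Empirical survivor function on the right axes
title(ax2,'Empirical survivor function')
```

Passing the axes explicitly avoids depending on which axes are current when ecdf is called.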

Examples

Compute the Kaplan-Meier estimate of the cumulative distribution function (cdf) for simulated survival data.

Generate survival data from a Weibull distribution with parameters 3 and 1.

rng('default')  % For reproducibility
failuretime = random('wbl',3,1,15,1);

Compute the Kaplan-Meier estimate of the cdf for survival data.

[f,x] = ecdf(failuretime);
[f,x]
ans = 16×2

         0    0.0895
    0.0667    0.0895
    0.1333    0.1072
    0.2000    0.1303
    0.2667    0.1313
    0.3333    0.2718
    0.4000    0.2968
    0.4667    0.6147
    0.5333    0.6684
    0.6000    1.3749
      ⋮

Plot the estimated cdf.

ecdf(failuretime)
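
For data with no censoring and no tied values, the Kaplan-Meier estimate reduces to the familiar step function that rises by 1/n at each sorted observation. The following sketch checks this against the output of ecdf for the same simulated data. The variable names n, fManual, and maxAbsDiff are assumptions introduced for illustration.

```matlab
rng('default')  % For reproducibility
failuretime = random('wbl',3,1,15,1);
[f,x] = ecdf(failuretime);

n = numel(failuretime);
fManual = (1:n)'/n;                 % Step heights i/n at the sorted observations

% ecdf repeats the minimum of the data as the first element of x,
% where the estimate is 0, so compare from the second element on.
sameTimes  = isequal(x(2:end),sort(failuretime))
maxAbsDiff = max(abs(f(2:end) - fManual))
```

Any difference reported in maxAbsDiff should be at the level of floating-point round-off.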

Compute and plot the cumulative hazard function of simulated right-censored survival data.

Generate failure times from a Birnbaum-Saunders distribution.

rng('default')  % For reproducibility
failuretime = random('birnbaumsaunders',0.3,1,100,1);

Assume that the study ends at time 0.9. Create a logical vector that marks the simulated failure times larger than 0.9 as censored.

T = 0.9;
cens = (failuretime>T);

Plot the empirical cumulative hazard function for the data with confidence bounds.

ecdf(failuretime,'Function','cumulative hazard', ...
    'Censoring',cens,'Bounds','on');
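
To work with the estimate numerically instead of plotting it, request output arguments. The following sketch repeats the data generation and collects the evaluation points, the cumulative hazard values, and the confidence bounds in a table. The names H, Hlo, Hup, and tbl are assumptions introduced for illustration.

```matlab
rng('default')  % For reproducibility
failuretime = random('birnbaumsaunders',0.3,1,100,1);
T = 0.9;
cens = (failuretime>T);

% Return the values instead of drawing the graph.
[H,x,Hlo,Hup] = ecdf(failuretime,'Function','cumulative hazard', ...
    'Censoring',cens);

tbl = table(x,H,Hlo,Hup);   % One row per evaluation point
disp(tbl(1:5,:))            % Show the first few rows
```

The 'Bounds' argument is omitted here because it affects only the graph, not the returned values.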

Generate right-censored survival data and compare the empirical cumulative distribution function (cdf) with the known cdf.

Generate failure times from an exponential distribution with mean failure time of 15.

rng('default')  % For reproducibility
y = exprnd(15,75,1);

Generate drop-out times from an exponential distribution with mean drop-out time of 30.

d = exprnd(30,75,1);

Generate the observed failure times. They are the minimum of the generated failure times and the drop-out times.

t = min(y,d);

Create a logical array that indicates generated failure times that are larger than the drop-out times. The data for which this is true are censored.

censored = (y>d);

Compute the empirical cdf and confidence bounds.

[f,x,flo,fup] = ecdf(t,'Censoring',censored);

Plot the cdf and confidence bounds.

figure()
ecdf(t,'Censoring',censored,'Bounds','on');
hold on

Superimpose a plot of the known population cdf.

xx = 0:.1:max(t);
yy = 1-exp(-xx/15);
plot(xx,yy,'g-','LineWidth',2)
axis([0 50 0 1])
legend('Empirical','LCB','UCB','Population', ...
    'Location','southeast')
hold off

Generate survival data and plot the empirical survivor function with 99% confidence bounds.

Generate lifetime data from a Weibull distribution with parameters 100 and 2.

rng('default')  % For reproducibility
R = wblrnd(100,2,100,1);

Plot the survivor function for the data with 99% confidence bounds.

ecdf(R,'Function','survivor','Alpha',0.01,'Bounds','on')
hold on

Superimpose the survivor function of the known Weibull population distribution.

x = 1:1:250;
wblsurv = 1-cdf('weibull',x,100,2);
plot(x,wblsurv,'g-','LineWidth',2)
legend('Empirical','LCB','UCB','Population', ...
    'Location','northeast')

The survivor function based on the actual distribution is within the confidence bounds.
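
The visual comparison can also be checked numerically. The following sketch evaluates the population survivor function at the points returned by ecdf and reports the fraction of those points at which it lies between the 99% bounds. It checks only the evaluation points in x, not the whole curve, and the names trueS, hasBounds, inside, and fractionInside are assumptions introduced for illustration.

```matlab
rng('default')  % For reproducibility
R = wblrnd(100,2,100,1);

[S,x,Slo,Sup] = ecdf(R,'Function','survivor','Alpha',0.01);
trueS = 1 - wblcdf(x,100,2);              % Population survivor function at x

% Skip points where ecdf does not return bounds (NaN entries).
hasBounds = ~isnan(Slo) & ~isnan(Sup);
inside = (trueS >= Slo) & (trueS <= Sup);
fractionInside = mean(inside(hasBounds))
```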

Input Arguments

Input data, specified as a vector. For example, in survival or reliability analysis, data might be survival or failure times for each item or individual.

ecdf ignores NaN values in y. Additionally, any NaN values in the censoring vector ('Censoring') or frequency vector ('Frequency') cause ecdf to ignore the corresponding values in y.

Data Types: single | double

Axes in which ecdf plots, specified as an axes handle.

For instance, if ax is a handle to an axes object, then ecdf can plot to those axes as follows.

Example: ecdf(ax,y)

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Censoring',c,'Function','cumulative hazard','Alpha',0.025,'Bounds','on' specifies that ecdf returns the cumulative hazard function and plots the 97.5% confidence bounds, accounting for the censored data specified by vector c.

Indicator of censored data, specified as the comma-separated pair consisting of 'Censoring' and a Boolean array of the same size as y. Enter 1 for observations that are right-censored and 0 for observations that are fully observed. By default, all observations are fully observed.

ecdf ignores any NaN values in this censoring vector. Additionally, any NaN values in y or the frequency vector ('Frequency') cause ecdf to ignore the corresponding values in the censoring vector.

Example: If vector cdata stores the censored data information, enter 'Censoring',cdata.

Data Types: logical

Frequency of observations, specified as the comma-separated pair consisting of 'Frequency' and a vector containing nonnegative integer counts. This vector is the same size as the vector y. The jth element of this vector gives the number of times the jth element of y was observed. By default, each element of y counts as one observation.

ecdf ignores any NaN values in this frequency vector. Additionally, any NaN values in y or the censoring vector ('Censoring') cause ecdf to ignore the corresponding values in the frequency vector.

Example: If failurefreq is a vector of frequencies, enter 'Frequency',failurefreq

Data Types: single | double
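
As a minimal sketch of how the frequency vector is interpreted, the following code compares a call that uses 'Frequency' with a call on the same data written out one observation at a time. The vectors y and freq are hypothetical values chosen for illustration.

```matlab
y    = [2; 5; 7];          % Distinct observed values (hypothetical)
freq = [3; 1; 2];          % Number of times each value was observed

[f1,x1] = ecdf(y,'Frequency',freq);

% Expand the data so that each observation appears explicitly.
yExpanded = repelem(y,freq);
[f2,x2] = ecdf(yExpanded);

sameResult = isequal(x1,x2) && max(abs(f1 - f2)) < 1e-12
```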

Significance level for the confidence interval of the evaluated function, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1). The default is 0.05, for 95% confidence. For a given value alpha, the confidence level is 100(1-alpha)%.

For instance, for a 99% confidence interval, you can specify the alpha value as follows.

Example: 'Alpha',0.01

Data Types: single | double
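
A smaller significance level gives a higher confidence level and therefore bounds that are at least as wide. The following sketch compares the widths of the 95% and 99% bounds for a hypothetical sample. The sample y and the variable names are assumptions introduced for illustration.

```matlab
rng('default')  % For reproducibility
y = exprnd(15,75,1);                       % Hypothetical sample

[~,~,lo95,up95] = ecdf(y);                 % Default 'Alpha' of 0.05
[~,~,lo99,up99] = ecdf(y,'Alpha',0.01);    % 99% confidence bounds

width95 = up95 - lo95;
width99 = up99 - lo99;

% Compare only the points where both sets of bounds are defined.
valid = ~isnan(width95) & ~isnan(width99);
widerOrEqual = all(width99(valid) >= width95(valid))
```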

Type of function that ecdf evaluates and returns, specified as the comma-separated pair consisting of 'Function' and one of the following.

'cdf': Default. Cumulative distribution function.
'survivor': Survivor function.
'cumulative hazard': Cumulative hazard function.

Example: 'Function','cumulative hazard'

Indicator for including bounds, specified as the comma-separated pair consisting of 'Bounds' and one of the following.

'off': Default. Specify to omit bounds.
'on': Specify to include bounds.

Note

This name-value argument is used only for plotting.

Example: 'Bounds','on'

Output Arguments

Function values evaluated at the points in x, returned as a column vector.

Sorted observed points in the data vector y, returned as a column vector.

ecdf sorts y, removes duplicate values in the sorted y, and saves the results to the output x. The output x includes the minimum value of y as its first two values. These two values are useful for plotting the outputs of ecdf using the stairs function.
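
The following sketch shows the repeated first value of x on a small hypothetical sample with one tied value, and then plots the outputs with stairs. The sample y is an assumption introduced for illustration.

```matlab
y = [3; 1; 2; 2; 5];       % Hypothetical sample; the value 2 occurs twice
[f,x] = ecdf(y);

% The first two elements of x are both the minimum of y,
% and x has one more element than y has distinct values.
firstTwoEqual = (x(1) == x(2))
numPoints = numel(x)

% The repeated minimum lets stairs draw the initial jump from 0.
stairs(x,f)
xlabel('x')
ylabel('F(x)')
```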

Lower confidence bound for the evaluated function, returned as a column vector. ecdf computes the confidence bounds using Greenwood's formula. They are not simultaneous confidence bounds.

Upper confidence bound for the evaluated function, returned as a column vector. ecdf computes the confidence bounds using Greenwood's formula. They are not simultaneous confidence bounds.

More About

Greenwood’s Formula

Approximation for the variance of the Kaplan-Meier estimator.

The variance estimate is given by

$$V(S(t)) = S^{2}(t) \sum_{t_i < T} \frac{d_i}{r_i (r_i - d_i)},$$

where $r_i$ is the number at risk at time $t_i$, and $d_i$ is the number of failures at time $t_i$.
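
As a minimal sketch, the following code evaluates this variance estimate by hand for a small uncensored sample. The sample y is hypothetical, and the sketch assumes that the sum at each distinct failure time includes the term for that time, which is a common textbook convention. It is not taken from the ecdf source code and is not guaranteed to match the internal computation.

```matlab
y = [1; 2; 2; 3; 5];                 % Hypothetical uncensored failure times

[t,~,idx] = unique(y);               % Distinct failure times, sorted
d = accumarray(idx,1);               % Number of failures at each time
r = numel(y) - [0; cumsum(d(1:end-1))];   % Number at risk at each time

S = cumprod(1 - d./r);               % Kaplan-Meier survivor estimate
V = S.^2 .* cumsum(d ./ (r .* (r - d)));  % Greenwood variance estimate

% At the last failure time r equals d, so the final term is
% undefined and V(end) evaluates to NaN.
disp([t S V])
```

For uncensored data, the defined values agree with the binomial variance F(1-F)/n of the empirical cdf. For example, at the first failure time F is 0.2 and n is 5, which gives 0.032.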

Introduced before R2006a