3  Introduction

3.1 Introduction to Lifetime Data Analysis

Survival analysis is a branch of statistics concerned with the analysis of time-to-event data, also called lifetime data. In many studies, the main outcome of interest is not simply whether an event occurs, but when it occurs. Examples include the time from a cancer diagnosis to death, the time from surgery to disease recurrence, the time until a machine breaks down, or the time until a customer stops using a service. Because the outcome is a time measurement, survival analysis requires methods that are different from those used for ordinary continuous or binary outcomes.

The main goal of survival analysis is to describe and compare the distribution of event times. Depending on the scientific question, we may be interested in several related objectives. First, we may want to summarize the typical pattern of event times in a population, such as the probability that a subject survives beyond a given time point. Second, we may want to compare the survival experiences of two or more groups, for example, patients receiving different treatments. Third, we may want to study how explanatory variables, or covariates, are associated with the timing of the event.

To make these ideas concrete, suppose that for each individual we define a nonnegative random variable \[T = \text{time from a well-defined starting point to the occurrence of an event}.\] The starting point could be diagnosis, study enrollment, beginning of treatment, installation of a machine, or the first purchase by a customer. The event could be death, relapse, failure, or cancellation of a subscription. The variable \(T\) is called the survival time, failure time, or event time, depending on the context.

In practice, a survival dataset often contains the following pieces of information for each subject:

  • the observed follow-up time,

  • an event indicator showing whether the event was observed,

  • a collection of covariates.

Let \[Y = \text{observed follow-up time}, \qquad \Delta = \begin{cases} 1, & \text{if the event is observed},\\ 0, & \text{if the event is not observed during follow-up}. \end{cases}\] Then a typical observation can be written as \[(Y,\Delta,X),\] where \(X\) represents a vector of covariates such as age, sex, treatment assignment, biomarker values, or other relevant subject characteristics.

The event indicator \(\Delta\) is needed because, in many studies, the true event time is not observed for every subject. For example, a patient may still be alive when the study ends, or a machine may still be functioning at the end of the observation period. In such cases, we only know that the event time exceeds the observed follow-up time. This feature, called censoring, is one of the main reasons why survival analysis requires special methods. We will discuss censoring and related concepts more carefully later in this chapter.

3.1.1 What Makes Lifetime Data Special?

Lifetime data have several features that distinguish them from other types of data.

First, the response variable is a time, so it is always nonnegative: \[T \geq 0.\] This naturally leads to probability models supported on \([0,\infty)\).

Second, the outcome is often incompletely observed because of censoring. Standard statistical methods that ignore censoring can lead to biased conclusions.

Third, in many applications the distribution of \(T\) is asymmetric. For example, event times are often right-skewed: many individuals experience the event relatively early, while a smaller number survive for a long time.

Fourth, the scientific interpretation is often dynamic. Instead of asking only whether an event occurs, we may ask questions such as:

  • What is the probability that an individual survives beyond time \(t\)?

  • Among individuals who have survived up to time \(t\), how likely is the event to occur soon after \(t\)?

  • How does treatment affect the timing of the event?

These questions motivate the key concepts of the survival function and hazard function, which will be introduced in the next section.

3.1.2 Examples of Lifetime Data

Survival analysis arose originally in medical research, but it is now widely used in many fields. Examples 3.1–3.4 below illustrate this range of applications.

Example 3.1 (Time to death) In a clinical study, researchers may record the time from diagnosis of a disease to death. Here the event is death, and the time scale may be measured in days, months, or years. Covariates could include treatment group, age, disease stage, and other clinical variables.

Example 3.2 (Time to disease relapse) For patients who achieve remission after treatment, researchers may study the time from remission to relapse. In this setting, the event is recurrence of disease. Some patients may not relapse before the study ends, so their times are right-censored.

Example 3.3 (Time to machine failure) In engineering, one may study the lifetime of a mechanical component. The event is system failure, and the goal may be to understand product reliability or compare materials under different operating conditions.

Example 3.4 (Time to customer churn) In business and economics, companies may study the time from a customer’s first subscription to cancellation of service. The event is customer churn. Covariates may include age, region, subscription plan, and previous usage patterns.

Although these examples come from different areas, they share a common statistical structure: each subject is followed over time until an event occurs or observation ends.

3.1.3 A Simple Data Example

A small hypothetical dataset might look like Table 3.1.

Table 3.1: A simple example of survival data.
Subject Follow-up time \(Y\) Event indicator \(\Delta\) Age Treatment Sex
1 5.2 1 63 A F
2 8.7 0 57 B M
3 3.1 1 71 A F
4 10.0 0 49 B F
5 6.4 1 68 A M

For Subject 1, the event was observed at time \(5.2\), so \(\Delta=1\). For Subject 2, the event was not observed during the follow-up period of length \(8.7\), so \(\Delta=0\). Thus, we only know that the true event time for Subject 2 is greater than \(8.7\).
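A dataset like Table 3.1 is naturally stored as parallel arrays of follow-up times, event indicators, and covariates. The following Python sketch (the variable names are illustrative, not from the text) shows one minimal representation:

```python
import numpy as np

# Data from Table 3.1: follow-up times Y and event indicators Delta
Y = np.array([5.2, 8.7, 3.1, 10.0, 6.4])   # observed follow-up times
Delta = np.array([1, 0, 1, 0, 1])          # 1 = event observed, 0 = right-censored
age = np.array([63, 57, 71, 49, 68])       # one of the covariates in X

# For censored subjects we only know that the true event time exceeds Y
n_events = Delta.sum()                     # number of observed events
censored_times = Y[Delta == 0]             # lower bounds for the unobserved event times
print(n_events, censored_times)
```

Note that the censored follow-up times (here 8.7 and 10.0) are not event times; they are lower bounds on the unknown values of \(T\).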

3.1.4 Why Survival Analysis Matters

If we ignore the timing of events and only record whether the event occurred, we lose important information. For example, two treatments might lead to the same proportion of deaths by the end of a study, but one treatment may substantially delay death relative to the other. Survival analysis allows us to use the full information contained in event times, while properly handling incomplete observations.

For this reason, survival analysis plays a central role in medicine, public health, engineering, economics, sociology, and many other disciplines. In the sections that follow, we will introduce the basic probabilistic quantities used to describe lifetime data and develop the tools needed to analyze them.

3.2 Basic Probability Concepts for Survival Analysis

In survival analysis, the main object of interest is a nonnegative random variable \[T = \text{the time from a well-defined starting point to the event of interest}.\] To describe the distribution of \(T\), we use several closely related probability functions. In this section, we introduce the distribution function, density function, survival function, hazard function, and cumulative hazard function, and explain how they are connected.

Throughout this section, we assume that \(T\) is a continuous random variable taking values in \([0,\infty)\).

3.2.1 Distribution Function

The distribution function, or cumulative distribution function (CDF), of \(T\) is defined by \[F(t) = P(T \le t), \qquad t \ge 0.\] This gives the probability that the event occurs by time \(t\).

For example, if \[F(5)=0.30,\] then there is a \(30\%\) chance that the event has occurred by time \(5\).

The function \(F(t)\) has the following properties:

  • \(0 \le F(t) \le 1\) for all \(t\),

  • \(F(t)\) is nondecreasing in \(t\),

  • \(F(0) = P(T \le 0)\), which equals \(0\) when \(T\) is a continuous positive lifetime,

  • \(\lim_{t\to\infty} F(t)=1\) if every subject eventually experiences the event.

3.2.2 Density Function

If \(T\) is continuous, its density function is denoted by \(f(t)\) and satisfies \[F(t)=\int_0^t f(u)\,du.\] Equivalently, when \(F\) is differentiable, \[f(t)=F'(t).\]

The density function describes how the event times are distributed over time. Roughly speaking, \(f(t)\) tells us how concentrated the event times are near the value \(t\).

For a small number \(h>0\), \[P(t \le T < t+h) \approx f(t)h.\] Thus, \(f(t)\) can be interpreted as the approximate probability per unit time of failing near time \(t\).
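The approximation \(P(t \le T < t+h) \approx f(t)h\) can be checked numerically. A small sketch using SciPy, assuming an exponential event time with rate \(0.5\) (SciPy parameterizes the exponential by its scale, the reciprocal of the rate):

```python
from scipy.stats import expon

# Check P(t <= T < t+h) ~ f(t)*h for an exponential time with rate 0.5
rate, t, h = 0.5, 2.0, 0.01
dist = expon(scale=1 / rate)            # scipy uses scale = 1/rate

exact = dist.cdf(t + h) - dist.cdf(t)   # P(t <= T < t+h)
approx = dist.pdf(t) * h                # f(t) * h
print(exact, approx)
```

The two quantities agree closely for small \(h\), and the agreement improves as \(h\) shrinks.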

The density function has the following properties:

  • \(f(t)\ge 0\) for all \(t\),

  • \(\int_0^\infty f(t)\,dt = 1\).

3.2.3 Survival Function

A central quantity in survival analysis is the survival function, defined by \[S(t)=P(T>t), \qquad t\ge 0.\] This is the probability that the subject survives beyond time \(t\), or equivalently, that the event has not occurred by time \(t\).

For example, if \[S(5)=0.70,\] then there is a \(70\%\) chance that the event time exceeds \(5\).

The survival function is directly related to the distribution function: \[S(t)=1-F(t).\] Indeed, \[P(T>t)=1-P(T\le t).\]

The survival function has the following properties:

  • \(0 \le S(t)\le 1\) for all \(t\),

  • \(S(0)=P(T>0)\), which equals \(1\) when \(T\) is a continuous positive lifetime,

  • \(S(t)\) is nonincreasing in \(t\),

  • \(\lim_{t\to\infty} S(t)=0\) if every subject eventually experiences the event.

If \(T\) has density \(f\), then \[S(t)=\int_t^\infty f(u)\,du.\]

3.2.4 Hazard Function

The hazard function is one of the most important concepts in survival analysis. It describes the instantaneous risk of experiencing the event at time \(t\), among those who have survived up to time \(t\).

Formally, the hazard function is defined by \[\lambda(t)=\lim_{h\to 0^+}\frac{P(t \le T < t+h \mid T\ge t)}{h},\] provided the limit exists.

This definition can be understood as follows:

  • first condition on the subject having survived up to time \(t\),

  • then look at the probability that the event occurs in the short interval \([t,t+h)\),

  • divide by the interval length \(h\),

  • then let \(h\) go to \(0\).

Thus, \(\lambda(t)\) is an instantaneous event rate, not a probability. Since it is a rate, it can be larger than \(1\).

3.2.5 Relationship Between Hazard, Density, and Survival

We now derive an important formula connecting the hazard function to the density and survival functions.

Starting from the definition, \[P(t \le T < t+h \mid T\ge t) = \frac{P(t \le T < t+h)}{P(T\ge t)}.\] For a continuous random variable, \[P(t \le T < t+h)\approx f(t)h, \qquad P(T\ge t)=S(t).\] Therefore, \[\lambda(t) = \frac{f(t)}{S(t)},\] whenever \(S(t)>0\).

This is one of the basic identities in survival analysis: \[\boxed{\lambda(t)=\frac{f(t)}{S(t)}}.\]

Using \(S(t)=1-F(t)\), we can also write \[\lambda(t)=\frac{f(t)}{1-F(t)}.\]

Intuitively:

  • \(f(t)\) measures how likely the event is to occur near time \(t\) in the whole population,

  • \(S(t)\) measures how many individuals are still event-free at time \(t\),

  • \(\lambda(t)\) measures the instantaneous risk at time \(t\) among those still event-free.
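The identity \(\lambda(t)=f(t)/S(t)\) can be verified against the limit definition by taking a small \(h\). A sketch, again assuming an exponential event time with rate \(0.5\):

```python
from scipy.stats import expon

# Approximate the hazard from its definition,
#   lambda(t) ~ P(t <= T < t+h | T >= t) / h,
# and compare with the ratio f(t)/S(t)
rate, t, h = 0.5, 3.0, 1e-5
dist = expon(scale=1 / rate)

cond_prob = (dist.cdf(t + h) - dist.cdf(t)) / dist.sf(t)  # P(event in [t,t+h) | T >= t)
hazard_from_limit = cond_prob / h
hazard_from_ratio = dist.pdf(t) / dist.sf(t)              # f(t) / S(t)
print(hazard_from_limit, hazard_from_ratio)
```

Both computations give (approximately) the constant rate \(0.5\), which foreshadows the exponential example in Section 3.2.9.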

3.2.6 Cumulative Hazard Function

The cumulative hazard function is defined by integrating the hazard function: \[\Lambda(t)=\int_0^t \lambda(u)\,du.\] This quantity accumulates the hazard over time.

Although \(\Lambda(t)\) does not have as direct a probability interpretation as \(F(t)\) or \(S(t)\), it is extremely useful mathematically and plays a central role in survival analysis.

Since \(\lambda(t)\ge 0\), the cumulative hazard function is nondecreasing in \(t\).

3.2.7 Relationship Between Survival and Cumulative Hazard

A very important identity is \[S(t)=\exp\{-\Lambda(t)\}.\] We now show where this comes from.

Recall that \[\lambda(t)=\frac{f(t)}{S(t)}.\] Since \[S(t)=1-F(t),\] we have \[S'(t)=-f(t).\] Substituting \(f(t)=-S'(t)\) into the hazard formula gives \[\lambda(t)=\frac{-S'(t)}{S(t)}.\] So \[\frac{S'(t)}{S(t)}=-\lambda(t).\] Integrating both sides from \(0\) to \(t\), \[\int_0^t \frac{S'(u)}{S(u)}\,du = -\int_0^t \lambda(u)\,du.\] The left-hand side is \[\log S(t)-\log S(0).\] If we assume \(S(0)=1\), then \[\log S(t)=-\Lambda(t),\] and therefore \[\boxed{S(t)=\exp\{-\Lambda(t)\}}.\]

This formula implies that once we know the hazard function, we can recover the survival function. Conversely, once we know the survival function, we can recover the cumulative hazard function through \[\Lambda(t)=-\log S(t).\]

3.2.8 Summary of Relationships

The main functions introduced in this section are all connected. Starting from any one of them, we can often derive the others.

\[F(t)=P(T\le t),\] \[S(t)=P(T>t)=1-F(t),\] \[f(t)=F'(t)=-S'(t),\] \[\lambda(t)=\frac{f(t)}{S(t)},\] \[\Lambda(t)=\int_0^t \lambda(u)\,du,\] \[S(t)=\exp\{-\Lambda(t)\}.\]

These relationships are often summarized in the diagram \[F(t) \longleftrightarrow f(t) \longleftrightarrow S(t) \longleftrightarrow \lambda(t) \longleftrightarrow \Lambda(t).\]

3.2.9 An Example: Exponential Survival Time

To make these definitions more concrete, suppose \[T \sim \text{Exponential}(\lambda),\] where \(\lambda>0\).

Then the density function is \[f(t)=\lambda e^{-\lambda t}, \qquad t\ge 0.\] The distribution function is \[F(t)=1-e^{-\lambda t}.\] The survival function is \[S(t)=e^{-\lambda t}.\] The hazard function is \[\lambda(t)=\frac{f(t)}{S(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda.\] So the exponential distribution has a constant hazard.

The cumulative hazard function is \[\Lambda(t)=\int_0^t \lambda\,du=\lambda t.\] We can verify that \[S(t)=e^{-\Lambda(t)}=e^{-\lambda t},\] as expected.
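All of these exponential formulas can be checked numerically at a few time points. A minimal sketch with SciPy, taking rate \(\lambda = 2\):

```python
import numpy as np
from scipy.stats import expon

# Verify the exponential identities at several time points (rate lam = 2)
lam = 2.0
t = np.array([0.1, 0.5, 1.0, 2.0])
dist = expon(scale=1 / lam)      # scipy uses scale = 1/rate

S = dist.sf(t)                   # survival function, should equal exp(-lam * t)
f = dist.pdf(t)                  # density, should equal lam * exp(-lam * t)
hazard = f / S                   # should be constant, equal to lam
Lambda = -np.log(S)              # cumulative hazard, should equal lam * t
print(hazard, Lambda)
```

The computed hazard is constant at \(2\), and the cumulative hazard grows linearly as \(\lambda t\), exactly as derived above.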

Interpretation

It is helpful to keep the following interpretations in mind:

  • \(F(t)\): probability that the event has occurred by time \(t\),

  • \(S(t)\): probability that the event has not occurred by time \(t\),

  • \(f(t)\): relative concentration of event times around \(t\),

  • \(\lambda(t)\): instantaneous risk of event at time \(t\) among those still at risk,

  • \(\Lambda(t)\): total accumulated hazard up to time \(t\).

Each of these functions gives a different but related view of the distribution of survival times.

Summary

In survival analysis, several probability functions are used to describe the distribution of a nonnegative event time \(T\). The distribution function \(F(t)\) and density function \(f(t)\) are familiar from basic probability. The survival function \(S(t)\) gives the probability of surviving beyond time \(t\), while the hazard function \(\lambda(t)\) gives the instantaneous event rate at time \(t\) among those still at risk. The cumulative hazard function \(\Lambda(t)\) is the integral of the hazard over time. These quantities are linked through the identities \[S(t)=1-F(t), \qquad \lambda(t)=\frac{f(t)}{S(t)}, \qquad S(t)=e^{-\Lambda(t)}.\] These relationships form the foundation for many of the methods developed later in survival analysis.

3.3 Truncation and Censoring

One of the main features that makes survival analysis different from ordinary data analysis is that the event time is often not fully observed. In many studies, we do not observe the exact lifetime for every subject. Instead, we may only know that the event occurred after a certain time, before a certain time, or within a certain time interval. In other situations, some subjects are not included in the sample unless their event times satisfy certain conditions. These phenomena are called censoring and truncation.

Although censoring and truncation are related, they are not the same. The key distinction is:

  • Censoring means that a subject is included in the dataset, but the exact event time is only partially observed.

  • Truncation means that some subjects are not included in the dataset at all, because their event times fall outside an observable range.

In this section, we introduce the most common types of censoring and truncation.

3.3.1 Censoring

Censoring occurs when the exact event time \(T\) is not fully known, but some partial information about \(T\) is available. Importantly, the subject is still part of the study dataset.

3.3.1.1 Right Censoring

Right censoring is the most common type of censoring in survival analysis. It occurs when we know that the event time is greater than some observed time, but we do not know its exact value.

Suppose a subject is followed up to time \(C\). If the event has not occurred by time \(C\), then we only know that \[T > C.\] In this case, the observed data are \[Y = \min(T,C), \qquad \Delta = I(T \le C),\] where \[\Delta = \begin{cases} 1, & \text{if the event is observed},\\ 0, & \text{if the observation is right-censored}. \end{cases}\]

Thus:

  • if \(\Delta=1\), then \(Y=T\) and the exact event time is observed;

  • if \(\Delta=0\), then \(Y=C\) and we only know that \(T>C\).
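The construction \(Y=\min(T,C)\), \(\Delta=I(T\le C)\) is easy to simulate. A sketch assuming exponential event times and uniform censoring times (both choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate right censoring: Y = min(T, C), Delta = I(T <= C)
n = 10_000
T = rng.exponential(scale=2.0, size=n)   # true event times, mean 2
C = rng.uniform(0, 5, size=n)            # censoring (end-of-follow-up) times

Y = np.minimum(T, C)
Delta = (T <= C).astype(int)
print(Delta.mean())                      # fraction of subjects with an observed event
```

In the simulated dataset only \(Y\) and \(\Delta\) would be available to the analyst; the true times \(T\) for the censored subjects are never seen.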

3.3.1.1.1 Example.

In a cancer study, patients are followed for five years after diagnosis. If a patient is still alive when the study ends, then the death time is not observed exactly. We only know that the patient survived longer than five years. This is right censoring.

3.3.1.1.2 Another example.

A machine is tested for 1000 hours. If it is still working at the end of the test, then its failure time is right-censored at 1000 hours.

3.3.1.2 Left Censoring

Left censoring occurs when we know that the event happened before some observed time, but we do not know exactly when.

That is, instead of observing \(T\), we only know that \[T \le C\] for some time \(C\).

3.3.1.2.1 Example.

Suppose patients are screened for a certain infection. A patient is found to be already positive at the very first screening visit. The infection therefore began before that visit, but the exact onset time is unknown, so the onset time is left-censored at the time of the first visit. (By contrast, a patient who is negative at one visit and positive at the next has an onset time known only to lie between the two visits; that situation is interval censoring, discussed below.)

3.3.1.2.2 Another example.

In an industrial setting, one may inspect a component for damage only at a certain time. If the damage is already present at the first inspection, the exact failure time is unknown, but it is known to have occurred before that inspection time.

Left censoring is less common than right censoring in introductory survival analysis, but it appears naturally in some biomedical and reliability studies.

3.3.1.3 Interval Censoring

Interval censoring occurs when the exact event time is not known, but it is known to lie within an interval.

That is, for some times \(L\) and \(R\), we know that \[L < T \le R,\] but we do not know the exact value of \(T\).

3.3.1.3.1 Example.

Suppose patients are tested for disease recurrence every three months. If a patient is disease-free at month 6 but recurrence is detected at month 9, then the recurrence time is not known exactly. We only know that it occurred in the interval \[6 < T \le 9.\] This is interval censoring.

3.3.1.3.2 Another example.

If a household survey is conducted once per year, and a person reports being employed in one year and unemployed in the next, then the time of job loss may only be known to lie between the two interview dates.

Interval censoring is common when subjects are only assessed at discrete visit times rather than continuously monitored.

3.3.2 A Comparison of the Main Types of Censoring

The three common censoring types can be summarized as follows:

  • Right censoring: we know that \(T > C\).

  • Left censoring: we know that \(T \le C\).

  • Interval censoring: we know that \(L < T \le R\).

In all three cases, the subject is still observed and included in the dataset. What is missing is the exact event time.

3.3.3 Why Censoring Matters

Censoring means that we cannot treat the observed follow-up times as if they were ordinary complete outcomes. For example, if we simply ignored all censored observations, we would usually bias the analysis toward subjects with shorter event times.

To see this intuitively, consider a study in which many subjects are still event-free when follow-up ends. If we discard these subjects, then the remaining sample contains disproportionately many early events, making survival appear worse than it really is.

Survival analysis methods are designed to use the available partial information correctly. For example, a right-censored observation still tells us that the subject survived at least up to the censoring time, and this information is valuable.
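The bias from discarding censored subjects is easy to demonstrate by simulation. A sketch assuming exponential lifetimes with mean \(10\) and administrative censoring at time \(8\) (both values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Discarding censored subjects biases the analysis toward early events:
# the mean of the uncensored times badly underestimates the true mean lifetime.
n = 50_000
T = rng.exponential(scale=10.0, size=n)  # true event times, E(T) = 10
C = np.full(n, 8.0)                      # follow-up ends at time 8 for everyone

observed_event = T <= C
naive_mean = T[observed_event].mean()    # mean over observed events only
print(naive_mean)                        # far below the true mean of 10
```

Here the naive mean is roughly \(3.5\), because every lifetime longer than \(8\) has been thrown away; a valid analysis must instead use the censored observations as the lower bounds they are.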

3.3.4 Truncation

Truncation is different from censoring. Under truncation, some subjects never appear in the observed dataset at all. In other words, the sample is restricted to subjects whose event times satisfy certain conditions.

3.3.4.1 Left Truncation

Left truncation occurs when subjects are only observed if their event times exceed a certain threshold.

Suppose a subject enters observation at time \(L\). Then the subject is included in the study only if \[T > L.\] Subjects with \(T \le L\) are never observed.

3.3.4.1.1 Example.

Suppose we study survival after disease onset, but patients are only recruited into the study when they first visit a clinic. A patient who dies before reaching the clinic is never included in the dataset. Thus, only subjects who survive long enough to enter the study are observed. This is left truncation.

3.3.4.1.2 Another example.

In a study of ages at death, if the data source only includes people who lived to at least age 65, then anyone who died before age 65 is absent from the dataset. The data are left-truncated at age 65.

Left truncation is also called delayed entry in many applications.
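The selection effect of left truncation can also be seen in a quick simulation. A sketch assuming exponential lifetimes with mean \(10\) and an entry threshold of \(5\) (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)

# Left truncation: subjects with T <= L never enter the sample,
# so the observed lifetimes are systematically longer than in the population.
n = 50_000
T = rng.exponential(scale=10.0, size=n)  # population lifetimes, E(T) = 10
L = 5.0                                  # entry threshold

observed = T[T > L]                      # only these subjects are ever observed
print(T.mean(), observed.mean())         # the truncated sample mean is inflated
```

For this exponential example the truncated mean is close to \(15\): by the memoryless property, \(E(T \mid T > 5) = 5 + 10\). Naively averaging the observed lifetimes therefore badly overstates survival in the full population.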

3.3.4.2 Right Truncation

Right truncation occurs when subjects are only observed if their event times are less than or equal to some threshold.

That is, a subject is included only if \[T \le U.\] Subjects with \(T > U\) are not observed.

3.3.4.2.1 Example.

Suppose a registry includes only individuals who have already experienced a particular event before the registry closes. Those whose event occurs later are not included. Then the data may be right-truncated.

Right truncation is less common in basic applications than left truncation, but it arises in retrospective studies and ascertainment based on occurrence of the event.

3.3.5 Censoring Versus Truncation

It is important to clearly distinguish censoring from truncation.

3.3.5.0.1 Censoring.

A censored subject is included in the dataset, but the event time is not known exactly.

3.3.5.0.2 Truncation.

A truncated subject is not included in the dataset at all.

This distinction has major statistical consequences. Under censoring, we analyze incomplete observations from observed subjects. Under truncation, the observed sample itself is selective, because some subjects are systematically missing from the study population.

3.3.5.0.3 Illustration.

Suppose we are studying time to death after disease onset.

  • If a patient is enrolled in the study and is still alive when follow-up ends, then the patient’s survival time is right-censored.

  • If a patient dies before having a chance to enter the study, then the patient is not observed at all; this is left truncation.

So censoring concerns incomplete information on observed subjects, while truncation concerns exclusion of subjects from the observed sample.

3.3.6 Simple Schematic Summary

It is often helpful to visualize the different possibilities.

3.3.6.0.1 Exact observation.

\[T = 8\] means the event occurred exactly at time 8.

3.3.6.0.2 Right censoring.

\[T > 8\] means the subject was event-free up to time 8, but the later event time is unknown.

3.3.6.0.3 Left censoring.

\[T \le 8\] means the event occurred before or at time 8, but the exact time is unknown.

3.3.6.0.4 Interval censoring.

\[5 < T \le 8\] means the event occurred sometime between times 5 and 8.

3.3.6.0.5 Left truncation.

Only subjects with \[T > 5\] can enter the observed sample. Subjects with \(T \le 5\) are not observed.

3.3.6.0.6 Right truncation.

Only subjects with \[T \le 8\] can enter the observed sample. Subjects with \(T > 8\) are not observed.

Summary

Censoring and truncation are fundamental features of survival data. Censoring means that a subject is observed, but the exact event time is only partially known. The most common type is right censoring, where we only know that the event time exceeds the observed follow-up time. Other forms include left censoring and interval censoring.

Truncation means that some subjects are not observed at all because their event times fall outside an observable range. Left truncation arises when only subjects who survive long enough to enter the study are observed, while right truncation arises when only subjects whose events occur before a certain time are included.

Understanding these ideas is essential, because many of the core methods in survival analysis are designed specifically to handle censoring and truncation properly.

3.4 Common Distributions for Modeling Lifetime Data

In survival analysis, probability models are often used to describe the distribution of a lifetime or event time \(T\). Different distributions lead to different shapes for the density, survival function, and hazard function, and hence may be appropriate for different kinds of data.

In this section, we introduce three commonly used distributions for lifetime data:

  • the exponential distribution,

  • the Weibull distribution,

  • the log-normal distribution.

For each distribution, we present its density function, distribution function, survival function, hazard function, and some basic summary measures such as the mean and variance.

Throughout, we assume that \(T>0\).

3.4.1 The Exponential Distribution

The exponential distribution is one of the simplest and most important models in survival analysis. Its main feature is that it has a constant hazard rate over time.

We say that \[T \sim \text{Exponential}(\lambda),\] where \(\lambda>0\) is the rate parameter, if the density function of \(T\) is \[f(t)=\lambda e^{-\lambda t}, \qquad t\ge 0.\]

3.4.1.1 Basic Functions

The distribution function is \[F(t)=P(T\le t)=1-e^{-\lambda t}, \qquad t\ge 0.\]

The survival function is \[S(t)=P(T>t)=e^{-\lambda t}, \qquad t\ge 0.\]

The hazard function, written here as \(h(t)\) to avoid confusion with the rate parameter \(\lambda\), is \[h(t)=\frac{f(t)}{S(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda.\]

Thus, the exponential distribution has constant hazard: \[h(t)=\lambda.\]

The cumulative hazard function is \[H(t)=\int_0^t h(u)\,du=\lambda t.\]

3.4.1.2 Mean and Variance

The mean and variance of an exponential random variable are \[E(T)=\frac{1}{\lambda}, \qquad \operatorname{Var}(T)=\frac{1}{\lambda^2}.\]

3.4.1.3 Interpretation

The exponential distribution is appropriate when the event risk does not change over time. For example, if a machine has the same instantaneous failure rate regardless of how long it has been operating, then an exponential model may be reasonable.

In practice, however, many real datasets have hazard rates that increase or decrease over time, so the exponential model is often too restrictive.

3.4.2 The Weibull Distribution

The Weibull distribution is a widely used generalization of the exponential distribution. It is very flexible because its hazard function can be increasing, decreasing, or constant, depending on the parameter values.

We say that \[T \sim \text{Weibull}(\lambda,k),\] where \(\lambda>0\) is a scale parameter and \(k>0\) is a shape parameter, if the density function is \[f(t)=\frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1} \exp\left\{-\left(\frac{t}{\lambda}\right)^k\right\}, \qquad t\ge 0.\]

3.4.2.1 Basic Functions

The distribution function is \[F(t)=1-\exp\left\{-\left(\frac{t}{\lambda}\right)^k\right\}, \qquad t\ge 0.\]

The survival function is \[S(t)=\exp\left\{-\left(\frac{t}{\lambda}\right)^k\right\}, \qquad t\ge 0.\]

The hazard function is \[h(t)=\frac{f(t)}{S(t)} = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}, \qquad t\ge 0.\]

The cumulative hazard function is \[H(t)=\left(\frac{t}{\lambda}\right)^k.\]

3.4.2.2 Mean and Variance

The mean of a Weibull random variable is \[E(T)=\lambda \Gamma\left(1+\frac{1}{k}\right),\] where \(\Gamma(\cdot)\) is the gamma function, defined by \[\Gamma(a)=\int_0^\infty x^{a-1}e^{-x}\,dx, \qquad a>0.\]

The variance is \[\operatorname{Var}(T) = \lambda^2 \left[ \Gamma\left(1+\frac{2}{k}\right) - \left\{ \Gamma\left(1+\frac{1}{k}\right) \right\}^2 \right].\]
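These moment formulas can be checked against SciPy's Weibull implementation, which uses the same parameterization with shape `c` \(=k\) and `scale` \(=\lambda\). A small sketch with illustrative parameter values:

```python
from scipy.special import gamma
from scipy.stats import weibull_min

# Check E(T) = lam * Gamma(1 + 1/k) and the matching variance formula
lam, k = 2.0, 1.5
dist = weibull_min(c=k, scale=lam)

mean_formula = lam * gamma(1 + 1 / k)
var_formula = lam**2 * (gamma(1 + 2 / k) - gamma(1 + 1 / k) ** 2)
print(mean_formula, dist.mean())
print(var_formula, dist.var())
```

Both formulas agree with the library's built-in moments to machine precision.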

3.4.2.3 Interpretation of the Shape Parameter

The parameter \(k\) determines the shape of the hazard function:

  • if \(k=1\), then the Weibull distribution reduces to the exponential distribution, and the hazard is constant;

  • if \(k>1\), then the hazard increases over time;

  • if \(0<k<1\), then the hazard decreases over time.

This flexibility makes the Weibull distribution especially useful in reliability studies and biomedical applications.
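The three hazard shapes can be visualized directly from the formula \(h(t)=(k/\lambda)(t/\lambda)^{k-1}\). A sketch evaluating the hazard at a few time points for each regime:

```python
import numpy as np

# Weibull hazard h(t) = (k/lam) * (t/lam)**(k-1)
def weibull_hazard(t, lam, k):
    return (k / lam) * (t / lam) ** (k - 1)

t = np.array([0.5, 1.0, 2.0, 4.0])
h_inc = weibull_hazard(t, lam=1.0, k=2.0)    # k > 1: increasing hazard
h_const = weibull_hazard(t, lam=1.0, k=1.0)  # k = 1: constant hazard (exponential)
h_dec = weibull_hazard(t, lam=1.0, k=0.5)    # 0 < k < 1: decreasing hazard
print(h_inc, h_const, h_dec)
```

With \(k=1\) the hazard is identically \(1/\lambda\), recovering the exponential special case.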

3.4.3 The Log-Normal Distribution

The log-normal distribution is another important model for lifetime data. It is based on the idea that the logarithm of the event time is normally distributed.

We say that \[T \sim \text{Lognormal}(\mu,\sigma^2),\] where \(\mu\in \mathbb{R}\) and \(\sigma>0\), if \[\log T \sim N(\mu,\sigma^2).\]

This implies that \(T\) itself is always positive, which makes the log-normal distribution suitable for modeling lifetime data.

3.4.3.1 Density Function

The density function of a log-normal random variable is \[f(t)= \frac{1}{t\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(\log t-\mu)^2}{2\sigma^2} \right\}, \qquad t>0.\]

3.4.3.2 Distribution Function

Its distribution function is \[F(t)=P(T\le t) = \Phi\left(\frac{\log t-\mu}{\sigma}\right), \qquad t>0,\] where \(\Phi(\cdot)\) is the cumulative distribution function of the standard normal distribution.

3.4.3.3 Survival Function

The survival function is \[S(t)=P(T>t) = 1-\Phi\left(\frac{\log t-\mu}{\sigma}\right), \qquad t>0.\]

3.4.3.4 Hazard Function

The hazard function is \[h(t)=\frac{f(t)}{S(t)} = \frac{ \frac{1}{t\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(\log t-\mu)^2}{2\sigma^2} \right\} }{ 1-\Phi\left(\frac{\log t-\mu}{\sigma}\right) }, \qquad t>0.\]

Unlike the exponential and Weibull distributions, the hazard function of the log-normal distribution does not usually have a simple monotone form. It often increases at first and then decreases later.

3.4.3.5 Mean and Variance

The mean of a log-normal random variable is \[E(T)=\exp\left(\mu+\frac{\sigma^2}{2}\right).\]

The variance is \[\operatorname{Var}(T) = \left(e^{\sigma^2}-1\right)e^{2\mu+\sigma^2}.\]
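These formulas can likewise be checked against SciPy, where the log-normal is parameterized by `s` \(=\sigma\) and `scale` \(=e^{\mu}\). A sketch with illustrative parameter values:

```python
import numpy as np
from scipy.stats import lognorm

# Check E(T) = exp(mu + sigma^2/2) and Var(T) = (e^{sigma^2} - 1) e^{2mu + sigma^2}
mu, sigma = 0.5, 0.8
dist = lognorm(s=sigma, scale=np.exp(mu))

mean_formula = np.exp(mu + sigma**2 / 2)
var_formula = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
print(mean_formula, dist.mean())
print(var_formula, dist.var())
```

Again the closed-form moments match the library's built-in values.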

3.4.3.6 Interpretation

The log-normal distribution is often useful when the logarithm of lifetime is more symmetric or approximately normal. It can model right-skewed lifetime data and is commonly used when the hazard does not have a simple monotone pattern.

3.4.4 Comparison of the Three Distributions

These three distributions differ mainly in the shapes they allow for the hazard function.

  • Exponential: constant hazard.

  • Weibull: increasing, decreasing, or constant hazard depending on the shape parameter.

  • Log-normal: hazard may rise and then fall, so it is more flexible in some situations.

The exponential model is mathematically simple, but often too restrictive. The Weibull model is more flexible and still relatively easy to work with. The log-normal model can capture more complicated patterns, especially when the data are strongly skewed.

3.5 Review of Maximum Likelihood Estimation

In this section, we briefly review the basic idea of maximum likelihood estimation (MLE), and then apply it to two common lifetime distributions: the exponential distribution and the Weibull distribution.

3.5.1 General Idea of Maximum Likelihood Estimation

Suppose that \(X_1,\dots,X_n\) are independent and identically distributed observations from a distribution with density or probability mass function \[f(x;\theta),\] where \(\theta\) is an unknown parameter.

The likelihood function is defined as the joint density (or joint probability mass function), viewed as a function of the parameter \(\theta\): \[L(\theta)=\prod_{i=1}^n f(X_i;\theta).\] The idea of maximum likelihood estimation is to choose the value of \(\theta\) that makes the observed data most likely.

The maximum likelihood estimator (MLE), denoted by \(\hat\theta\), is the value of \(\theta\) that maximizes \(L(\theta)\): \[\hat\theta=\arg\max_\theta L(\theta).\]

Because products can be cumbersome to work with, we usually maximize the log-likelihood: \[\ell(\theta)=\log L(\theta)=\sum_{i=1}^n \log f(X_i;\theta).\] Since the logarithm is a strictly increasing function, maximizing \(\ell(\theta)\) is equivalent to maximizing \(L(\theta)\).

In many problems, we find the MLE by:

  1. writing down the likelihood function;

  2. taking the logarithm to obtain the log-likelihood;

  3. differentiating with respect to the parameter(s);

  4. solving the likelihood equations.
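When step 4 has no closed-form solution, the same recipe can be carried out numerically by minimizing the negative log-likelihood. As an illustration (with hypothetical count data), the sketch below recovers the Poisson rate, whose MLE is known to be the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

# Hypothetical i.i.d. Poisson counts; the MLE of the rate is the sample mean.
x = np.array([2, 3, 1, 4, 2, 5, 3])

def neg_loglik(theta):
    # Step 2: the log-likelihood (negated, because we minimize)
    return -np.sum(poisson.logpmf(x, theta))

# Steps 3-4 performed numerically instead of by hand
res = minimize_scalar(neg_loglik, bounds=(1e-6, 20), method="bounded")
print(res.x, x.mean())  # numerical MLE agrees with the closed form
```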

3.5.2 A Simple Example: Exponential Distribution

Suppose \[T_1,\dots,T_n \overset{\text{i.i.d.}}{\sim} \text{Exponential}(\lambda),\] where \(\lambda>0\) is the unknown rate parameter. The density is \[f(t;\lambda)=\lambda e^{-\lambda t}, \qquad t\ge 0.\]

3.5.2.1 Likelihood Function

The likelihood function is \[L(\lambda)=\prod_{i=1}^n \lambda e^{-\lambda T_i} =\lambda^n \exp\left(-\lambda\sum_{i=1}^n T_i\right).\]

3.5.2.2 Log-Likelihood Function

Taking logs gives \[\ell(\lambda)=\log L(\lambda) = n\log\lambda - \lambda\sum_{i=1}^n T_i.\]

3.5.2.3 Finding the MLE

Differentiate with respect to \(\lambda\): \[\frac{d}{d\lambda}\ell(\lambda) =\frac{n}{\lambda}-\sum_{i=1}^n T_i.\] Setting this equal to zero gives \[\frac{n}{\lambda}-\sum_{i=1}^n T_i=0.\] Solving for \(\lambda\), we obtain \[\hat\lambda=\frac{n}{\sum_{i=1}^n T_i}=\frac{1}{\bar T},\] where \[\bar T=\frac{1}{n}\sum_{i=1}^n T_i\] is the sample mean.

So the MLE of the exponential rate parameter is \[\boxed{\hat\lambda=\frac{1}{\bar T}}.\]

3.5.2.4 Checking That This Is a Maximum

The second derivative is \[\frac{d^2}{d\lambda^2}\ell(\lambda)=-\frac{n}{\lambda^2}<0,\] so the log-likelihood is concave in \(\lambda\), and the critical point is indeed a maximum.

3.5.2.5 Interpretation

Recall that for an exponential random variable, \[E(T)=\frac{1}{\lambda}.\] Thus the MLE can also be written as \[\hat\lambda=\frac{1}{\bar T},\] which is natural because it replaces the population mean \(E(T)\) by the sample mean \(\bar T\).
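A small simulation confirms that \(\hat\lambda=1/\bar T\) recovers the true rate. The true value \(\lambda=2\) below is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                             # true rate (illustrative)
t = rng.exponential(scale=1 / lam, size=10_000)

lam_hat = 1 / t.mean()                # MLE: reciprocal of the sample mean
print(lam_hat)                        # close to the true rate 2.0
```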

3.5.3 Weibull Distribution

Now suppose \[T_1,\dots,T_n \overset{\text{i.i.d.}}{\sim} \text{Weibull}(\lambda,k),\] where \(\lambda>0\) is a scale parameter and \(k>0\) is a shape parameter. The density is \[f(t;\lambda,k)=\frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1} \exp\left\{-\left(\frac{t}{\lambda}\right)^k\right\}, \qquad t\ge 0.\]

We now derive the likelihood and the estimating equations for \(\lambda\) and \(k\).

3.5.3.1 Likelihood Function

The likelihood function is \[L(\lambda,k) =\prod_{i=1}^n \frac{k}{\lambda}\left(\frac{T_i}{\lambda}\right)^{k-1} \exp\left\{-\left(\frac{T_i}{\lambda}\right)^k\right\}.\]

This can be rewritten as \[L(\lambda,k) = \left(\frac{k}{\lambda}\right)^n \prod_{i=1}^n \left(\frac{T_i}{\lambda}\right)^{k-1} \exp\left\{-\sum_{i=1}^n \left(\frac{T_i}{\lambda}\right)^k\right\}.\]

3.5.3.2 Log-Likelihood Function

Taking logs, \[\ell(\lambda,k) = n\log k - n\log \lambda + (k-1)\sum_{i=1}^n \log\left(\frac{T_i}{\lambda}\right) -\sum_{i=1}^n \left(\frac{T_i}{\lambda}\right)^k.\]

Expanding the middle term, \[\sum_{i=1}^n \log\left(\frac{T_i}{\lambda}\right) = \sum_{i=1}^n \log T_i - n\log\lambda,\] so \[\ell(\lambda,k) = n\log k -nk\log\lambda +(k-1)\sum_{i=1}^n \log T_i -\sum_{i=1}^n \left(\frac{T_i}{\lambda}\right)^k.\]

3.5.3.3 Derivative with Respect to \(\lambda\)

Differentiate \(\ell(\lambda,k)\) with respect to \(\lambda\): \[\frac{\partial}{\partial\lambda}\ell(\lambda,k) = -\frac{nk}{\lambda} + \sum_{i=1}^n \frac{\partial}{\partial\lambda} \left[ -\left(\frac{T_i}{\lambda}\right)^k \right].\] Now \[\left(\frac{T_i}{\lambda}\right)^k=T_i^k\lambda^{-k},\] so \[\frac{\partial}{\partial\lambda} \left[ -\left(\frac{T_i}{\lambda}\right)^k \right] = kT_i^k\lambda^{-k-1}.\] Therefore, \[\frac{\partial}{\partial\lambda}\ell(\lambda,k) = -\frac{nk}{\lambda} + k\lambda^{-k-1}\sum_{i=1}^n T_i^k.\] Setting this equal to zero gives \[-\frac{nk}{\lambda} + k\lambda^{-k-1}\sum_{i=1}^n T_i^k=0.\] Multiplying both sides by \(\lambda^{k+1}/k\), \[-n\lambda^k+\sum_{i=1}^n T_i^k=0.\] Hence \[\lambda^k=\frac{1}{n}\sum_{i=1}^n T_i^k,\] so for a fixed value of \(k\), \[\boxed{ \hat\lambda = \left(\frac{1}{n}\sum_{i=1}^n T_i^k\right)^{1/k}. }\]

3.5.3.4 Derivative with Respect to \(k\)

Differentiate \(\ell(\lambda,k)\) with respect to \(k\): \[\frac{\partial}{\partial k}\ell(\lambda,k) = \frac{n}{k} -n\log\lambda +\sum_{i=1}^n \log T_i -\sum_{i=1}^n \left(\frac{T_i}{\lambda}\right)^k \log\left(\frac{T_i}{\lambda}\right).\] Setting this equal to zero gives the likelihood equation for \(k\): \[\frac{n}{k} -n\log\lambda +\sum_{i=1}^n \log T_i -\sum_{i=1}^n \left(\frac{T_i}{\lambda}\right)^k \log\left(\frac{T_i}{\lambda}\right) =0.\]

Unlike the exponential case, this equation usually cannot be solved in closed form. In practice, one typically uses a numerical method such as Newton-Raphson to solve for \(\hat k\), and then substitutes \(\hat k\) into \[\hat\lambda = \left(\frac{1}{n}\sum_{i=1}^n T_i^{\hat k}\right)^{1/\hat k}.\]
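Substituting \(\hat\lambda(k)=\left(\frac{1}{n}\sum_i T_i^k\right)^{1/k}\) into the likelihood equation for \(k\) eliminates \(\lambda\) and leaves the one-dimensional profile equation \[\frac{\sum_{i=1}^n T_i^k\log T_i}{\sum_{i=1}^n T_i^k}-\frac{1}{k}-\frac{1}{n}\sum_{i=1}^n \log T_i=0,\] which can be solved by standard root finding. A sketch with simulated data (true shape \(1.5\) and scale \(2\) are illustrative choices):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
# Simulated sample from Weibull(scale=2, shape=1.5), for illustration
t = 2.0 * rng.weibull(1.5, size=5000)
log_t = np.log(t)

def profile_eq(k):
    # Profile equation in k after substituting the closed-form lambda(k):
    # sum(t^k log t)/sum(t^k) - 1/k - mean(log t) = 0
    tk = t**k
    return (tk * log_t).sum() / tk.sum() - 1 / k - log_t.mean()

k_hat = brentq(profile_eq, 0.1, 10)                # root of the profile equation
lam_hat = (np.mean(t**k_hat)) ** (1 / k_hat)       # plug k_hat back in
print(k_hat, lam_hat)                              # close to the true (1.5, 2.0)
```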

3.5.3.5 Summary for Weibull MLE

For the Weibull distribution:

  • the MLE does not generally have a closed-form solution for both parameters together;

  • for a fixed \(k\), the MLE of \(\lambda\) is explicit: \[\hat\lambda= \left(\frac{1}{n}\sum_{i=1}^n T_i^k\right)^{1/k};\]

  • the MLE of \(k\) must usually be obtained numerically.

3.5.4 Special Case: Exponential Distribution as a Weibull Model

Recall that the exponential distribution is a special case of the Weibull distribution with shape parameter \[k=1.\] Indeed, when \(k=1\), the Weibull density becomes \[f(t;\lambda,1)=\frac{1}{\lambda}\exp\left(-\frac{t}{\lambda}\right),\] which is an exponential distribution with mean \(\lambda\).

Under this parametrization, the MLE of \(\lambda\) is \[\hat\lambda=\bar T.\] This agrees with the exponential result above if we use the rate parametrization \(\lambda_{\text{rate}}=1/\lambda\).
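The reduction to the exponential case can be verified numerically: with shape \(k=1\), the Weibull density coincides with the exponential density of mean \(\lambda\) (the value \(\lambda=2\) is illustrative):

```python
import numpy as np
from scipy.stats import weibull_min, expon

lam = 2.0
t = np.linspace(0.1, 5, 50)
# Weibull with shape k = 1 and scale lam vs exponential with mean lam
w = weibull_min.pdf(t, c=1, scale=lam)
e = expon.pdf(t, scale=lam)
print(np.allclose(w, e))  # True
```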

3.5.5 A Note on Censored Data

The likelihood calculations above assume that all event times are fully observed. In survival analysis, however, data are often censored. In that case, the likelihood must be modified.

For right-censored data, if we observe \[Y_i=\min(T_i,C_i), \qquad \Delta_i=I(T_i\le C_i),\] then each subject contributes \[f(Y_i;\theta)^{\Delta_i}S(Y_i;\theta)^{1-\Delta_i}\] to the likelihood. Therefore, the full likelihood becomes \[L(\theta)=\prod_{i=1}^n f(Y_i;\theta)^{\Delta_i}S(Y_i;\theta)^{1-\Delta_i}.\] This is the basic likelihood form used throughout parametric survival analysis with right-censored data.
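For the exponential distribution this censored likelihood still has a closed-form maximizer: the log-likelihood is \(\sum_i \Delta_i\log\lambda-\lambda\sum_i Y_i\), so \(\hat\lambda=\sum_i\Delta_i/\sum_i Y_i\) (events divided by total follow-up time). The sketch below, with simulated event and censoring times (the rates 1.5 and 0.5 are illustrative), maximizes the generic censored likelihood numerically and recovers this closed form:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
n, lam, cens_rate = 5000, 1.5, 0.5
t = rng.exponential(1 / lam, n)          # true event times
c = rng.exponential(1 / cens_rate, n)    # censoring times
y, d = np.minimum(t, c), (t <= c).astype(float)

def neg_loglik(rate):
    # Events contribute log f(y) = log(rate) - rate*y;
    # censored subjects contribute log S(y) = -rate*y.
    return -(d * np.log(rate) - rate * y).sum()

res = minimize_scalar(neg_loglik, bounds=(1e-6, 20), method="bounded")
print(res.x, d.sum() / y.sum())  # numerical MLE matches sum(delta)/sum(y)
```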