Survival Analysis for Clinical Studies

Key words: Survival analysis/Censored data/Kaplan-Meier survival curves/Cox proportional hazards model
Aim: This paper focuses on the use of censored data in survival analysis. Survival analysis is used most frequently in studies of cancer patients in which a number of individuals are still alive when the study is finished. The original article cited (2) was recently declared to be the most cited statistical study in the biomedical area. The goal of this paper is to explain the basic principles and methods involved. The way survival analysis processes data and interprets outputs is presented using clinical data of oncological patients.
Methods and Results: Survival analysis is used to estimate the survivor function from survival data, to compare survivor functions and to assess the relationship of explanatory variables to survival time. These methods were applied to the data of 176 patients with haemato-oncological diagnoses who had undergone bone marrow transplant.
Conclusion: It is very important to use appropriate methods when processing statistical data. Standard statistical procedures applied to incomplete data cannot provide correct estimates.


INTRODUCTION
Among the statistical methods applied to data evaluation in biomedical research, survival analysis has become increasingly significant since the 1950s. This method extracts a maximum of relevant information from studies finished at a time when the collected data are not yet complete. A typical example is a study in which the dependent variable is the survival or the length of life of cancer patients. However, as the literature and biometrical experience show, survival analysis has been applied to materials, machinery and electrotechnical products (material fatigue) as well as to human patients. This is attested by the first two articles on this topic: the first author focused on cancer studies while the second focused on the lifespan of electronic tubes.

MATERIAL AND METHODS
The statistical department of the Institute of Medical Biophysics has been applying survival analysis for more than two decades. The impulse to begin using the method was the need to analyze incomplete survival data obtained mostly from oncological patients. These data were collected by physicians (e.g. in collaboration with a team of the 3rd Internal Clinic headed by Prof. Ščudla) and haemato-oncologists (headed by Prof. Indrak).
Generally, survival analysis is a collection of methods that process the variable "time to the occurrence of an event". This variable is also often called "time of survival" or, briefly, "survival". The "survival time" refers to the number of years, months, weeks or days from the beginning of the patient's observation until the occurrence of an observed event (death, as a rule). Most survival studies are terminated before the observed event (end-point) has occurred for all subjects. In survival analysis this situation is called "censoring". There can be three reasons for censoring:
1. the observed event does not occur for a subject before the end of the study
2. a subject was eliminated from the study or decided to leave the study
3. a subject is lost from observation before the end of the study (e.g. the subject died of a cause other than the event under study)
Biomedical examples which can be analyzed by survival methods:
• length of remission of a patient with acute leukaemia until relapse, or the time to death from the beginning of the remission
• survival of female patients with mammary carcinoma with various immunohistochemical responses
• duration of analgesic effect after kidney surgery in infants and toddlers, measured until opiates are administered when pain manifests
• time to re-infection in patients suffering from a sexually transmitted disease
After finishing a clinical study we obtain input data for subsequent statistical processing. Usually a group of patients with an observed diagnosis is available.
The group consists of three subgroups of individuals:
1. patients for whom the event occurred (e.g. death): final data
2. patients for whom the event did not occur: censored data
3. patients who were eliminated from the study for some reason or who cannot be observed further: also censored data

K. Langova
There are two items of data for each patient:
1. survival time (the period of individual observation; also recorded for censored data)
2. the reason why the individual observation was terminated, coded as 1 for the observed event and 0 for patients from groups 2 and 3
In our study the subject of analysis is the data from 176 patients with haemato-oncological diagnoses who had undergone bone marrow transplant, a number of whom subsequently suffered from acute graft-versus-host disease (GvHD), which is a common side effect of the transplant.
The data of the study are in Table 1.
The goals of survival analysis are to:
1. estimate and interpret the survivor function from survival data
2. compare survivor functions
3. assess the relationship of explanatory variables to survival time

Survivor function
The survivor function describes the probability that a person lives longer than a specified time t from the beginning of observation: S(t) = P(T > t), where T is a non-negative random variable denoting the survival time, i.e. the time from the beginning of observation to the occurrence of the observed event.
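The definition can be written compactly together with the properties that follow from it. The symbol F(t) for the cumulative distribution function of T is introduced here only for illustration, and the boundary values assume that no event occurs at time zero and that every subject would eventually experience the event.

```latex
\begin{align*}
S(t) &= P(T > t) = 1 - F(t), \qquad F(t) = P(T \le t),\\
S(0) &= 1, \qquad \lim_{t \to \infty} S(t) = 0,\\
S(t_1) &\ge S(t_2) \quad \text{for } t_1 \le t_2 .
\end{align*}
```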

Comparison of survivor functions
A common problem in clinical studies is to compare two or more survivor functions. There are several statistical tests for such a comparison. The most widely used are the log rank test and the generalized Wilcoxon test (also called the Breslow test).
The log rank test is in fact a large-sample chi-squared test. The log rank statistic compares the observed number of events with the expected number. The expected number of events is calculated assuming that the null hypothesis is true, i.e. that the compared curves are the same. The comparison is performed at every time point at which the observed event occurred.
The Wilcoxon test does not work with the actual times to the occurrence of the observed event but only with the ranks of the events' occurrence in the two groups. Each ranked item of one group is compared with all ranked values from the second group. This eliminates the effect of extreme times in the input groups.
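The observed-versus-expected comparison described above can be sketched for two groups as follows. This is an illustrative implementation, not the SPSS procedure used in the study. The function name `weighted_logrank` and the toy data are hypothetical. The generalized Wilcoxon (Breslow) variant is shown in its common weighted form, in which each event time is weighted by the number of subjects at risk; treating this as the variant meant in the text is an assumption.

```python
import math


def weighted_logrank(times1, events1, times2, events2, weights="logrank"):
    """Two-group comparison of survivor functions.

    events: 1 = observed event, 0 = censored.
    weights="logrank" uses weight 1 at every event time;
    weights="breslow" uses the number at risk (generalized Wilcoxon).
    Returns (chi-squared statistic with 1 df, p-value).
    """
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    numerator = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)               # at risk, both groups
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk, group 1
        d = sum(1 for u, e, _ in data if u == t and e == 1)    # events, both groups
        d1 = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        w = float(n) if weights == "breslow" else 1.0
        expected1 = d * n1 / n          # expected events in group 1 under H0
        numerator += w * (d1 - expected1)
        if n > 1:                       # hypergeometric variance term
            variance += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0.0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = numerator ** 2 / variance
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy data (not from the study): no censoring, 2 subjects per group.
chi2_lr, p_lr = weighted_logrank([1, 2], [1, 1], [3, 4], [1, 1])
chi2_br, p_br = weighted_logrank([1, 2], [1, 1], [3, 4], [1, 1], weights="breslow")
print(f"log rank: chi2={chi2_lr:.3f}, p={p_lr:.3f}")
print(f"Breslow:  chi2={chi2_br:.3f}, p={p_br:.3f}")
```

Because the Breslow weights are largest when many subjects are still at risk, that variant is more sensitive to early differences between the curves, while the log rank test weights all event times equally.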

The Cox proportional hazards model
The Cox proportional hazards (PH) model is used to test the effect of a number of variables or factors on survival and on the hazard of occurrence of an observed event. This multiple regression method reveals statistically significant factors and also quantifies the effect of all factors on survival.
A sufficient number of observations, no correlation between two or more independent variables and independence of observations are general assumptions of regression methods. In addition, the Cox PH model requires that the effects of the predictor variables are constant over time.
Input data for the Cox PH model:
1. time to occurrence of an observed event (dependent variable)
2. reason for terminating observation
3. independent variables, i.e. predictors or factors (can be both quantitative and categorical)
The Cox PH model for p independent variables $X_1, \dots, X_p$ is described by the equation:
$$h(t) = h_0(t)\, e^{B_1 X_1 + \dots + B_p X_p}$$

Survivor function
In practice, the survivor function is estimated using the Kaplan-Meier method (K-M method). S(t) denotes the theoretical survivor function that holds true for the whole population and Ŝ(t) denotes the survivor function estimated for a given sample or study group. Values of the function Ŝ(t) are calculated at all times when the observed event occurred for any patient. The Kaplan-Meier method provides very good estimates of survival probabilities even when the survival distribution is unknown. This method assumes that the pattern of censoring is independent of the survival times.
Graph 2. The real shape of survivor functions for the two groups.
The real shape of the survivor function for both groups observed in our study is shown in graph 2. The horizontal axis indicates time. The vertical axis displays the survival probability, ranging from 0 to 1.
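The Kaplan-Meier calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure used in the study. The function name `kaplan_meier` and the toy data are hypothetical. The status coding (1 = observed event, 0 = censored) follows the convention given in the Material and Methods section.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survivor function.

    times  -- observation time of each patient
    events -- 1 if the observed event occurred, 0 if the time is censored
    Returns a list of (t, n_at_risk, n_events, s_hat), one entry per
    distinct time at which at least one event occurred.
    """
    events_at = Counter(t for t, e in zip(times, events) if e == 1)
    rows = []
    s_hat = 1.0
    for t in sorted(events_at):
        # Patients still under observation just before time t.
        n_at_risk = sum(1 for u in times if u >= t)
        d = events_at[t]
        # Multiply by the conditional probability of surviving past t.
        s_hat *= 1.0 - d / n_at_risk
        rows.append((t, n_at_risk, d, s_hat))
    return rows


# Hypothetical toy data (not from the study): 8 patients, 4 events.
toy_times = [2, 3, 3, 5, 6, 8, 9, 12]
toy_events = [1, 1, 0, 1, 0, 1, 0, 0]

km_rows = kaplan_meier(toy_times, toy_events)
for t, n, d, s in km_rows:
    print(f"t={t:>2}  at risk={n}  events={d}  S_hat={s:.3f}")
```

Censored patients never lower the curve. They only leave the risk set at their censoring time, which is how the method uses the information that they survived at least that long.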
The SPSS application is not able to display the 95% confidence interval for survivor functions. To solve this problem we cooperated with a programmer to design and implement a new software program in our department. The program is written in C++ using the MFC libraries and was developed in the Microsoft Visual Studio environment. An output from this program is shown in graph 3.
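The text does not state how the in-house program computes the 95% confidence intervals. One common choice, shown here purely as an assumption, is Greenwood's variance formula combined with a normal approximation and clipping to the range 0 to 1. The function name `greenwood_ci` and the input values are hypothetical. The inputs are the number at risk and the number of events at each distinct event time.

```python
import math


def greenwood_ci(n_at_risk, n_events, z=1.96):
    """Kaplan-Meier estimate with pointwise confidence limits.

    n_at_risk, n_events -- values at each distinct event time, in time order
    z -- normal quantile (1.96 for a 95% interval)
    Returns a list of (s_hat, lower, upper).
    """
    s_hat = 1.0
    greenwood_sum = 0.0
    result = []
    for n, d in zip(n_at_risk, n_events):
        s_hat *= 1.0 - d / n
        if n > d:
            # Greenwood: Var(S_hat) = S_hat^2 * sum d / (n * (n - d))
            greenwood_sum += d / (n * (n - d))
            se = s_hat * math.sqrt(greenwood_sum)
        else:
            # Everyone at risk had the event: S_hat is 0, the term is undefined.
            se = 0.0
        lower = max(0.0, s_hat - z * se)
        upper = min(1.0, s_hat + z * se)
        result.append((s_hat, lower, upper))
    return result


# Hypothetical risk-set table (not from the study).
ci_rows = greenwood_ci([8, 7, 5, 3], [1, 1, 1, 1])
for s, lo, hi in ci_rows:
    print(f"S_hat={s:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

With small risk sets the linear interval often hits the bounds 0 or 1. Transformed intervals (for example on the log or log-log scale) are a frequent alternative, but whether the program described above uses one is not known from the text.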
When the survival probability is calculated, the corresponding statistical method also provides medians and means of survival time. The median survival time is the time at which the survival probability equals 0.5. The mean survival time is defined as the area under the survivor function curve.
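Both summaries can be read directly off the estimated step function, as the following sketch shows. The function names, the cutoff parameter `tau` and the toy curve are hypothetical. Because the estimated curve is a step function, the median is taken here as the first event time at which the estimate is 0.5 or lower, and the area is computed only up to `tau` (for instance the longest observation time), which is an assumption needed when the curve does not reach zero.

```python
def median_survival(event_times, s_hat):
    """First event time at which the estimated survival is <= 0.5.

    Returns None when the curve never drops to 0.5, in which case the
    median cannot be estimated from the data.
    """
    for t, s in zip(event_times, s_hat):
        if s <= 0.5:
            return t
    return None


def restricted_mean_survival(event_times, s_hat, tau):
    """Area under the step function S_hat between time 0 and tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in zip(event_times, s_hat):
        if t > tau:
            break
        area += prev_s * (t - prev_t)   # rectangle up to the next drop
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)     # last rectangle up to the cutoff
    return area


# Hypothetical estimated curve (not from the study).
curve_times = [2, 3, 5, 8]
curve_s_hat = [0.875, 0.75, 0.6, 0.4]

toy_median = median_survival(curve_times, curve_s_hat)
toy_mean = restricted_mean_survival(curve_times, curve_s_hat, tau=12)
print("median:", toy_median)
print("restricted mean:", round(toy_mean, 3))
```

A return value of `None` corresponds to the situation noted for Table 2, where the median for the group without acute GvHD could not be calculated because the survival probability at the end of observation was above 0.5.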
Graph 3. The real shape of survivor functions for the two groups with 95% confidence intervals.

Age of host (years); age of donor (years); CD34+ (number of stem cells, 10^6 per 1 kg)
Table 2 shows the output from the SPSS program with calculated estimates, standard errors and 95% confidence intervals of means and medians (where possible) for both observed groups and the overall group.
The table shows that the patients with acute GvHD had shorter survival times.

Comparison of survivor functions
The output from the SPSS program in Table 3 shows the significance levels from the corresponding comparison of the survivor function curves of the two observed groups.

Cox PH model
In our study we evaluated the effect of selected factors on survival and on the hazard of occurrence of the observed event (death). Table 4 displays an overview of the categorical and quantitative factors used in the Cox PH model. Six models were built by stepwise regression (a method of gradually adding factors to a model); see steps 1 to 6. The B coefficients and HRs (the change of hazard for a unit change of a factor value) were estimated for the factors entered into the model.
If acute GvHD developed after the transplant, the hazard of death after the bone marrow transplant increased 3.29 times (95% CI 2.03-5.33).
If chronic GvHD developed after the transplant, the hazard of death after the bone marrow transplant decreased to 0.2 times its value (95% CI 0.11-0.35).
The presence of the IL9174 G allele in a donor increased the hazard of death after the bone marrow transplant 2.433 times (95% CI 1.26-4.71).

CONCLUSIONS
We wish to issue a caveat on the dangers of statistical misinterpretation when standard statistical procedures are applied to incomplete (censored) data. For example, computing means or medians without respecting the incompleteness of the data would underestimate real patient survival.
In the Cox PH model, $h(t)$ is the hazard function, $h_0(t)$ is called the baseline hazard function (the expected hazard without any effect of the considered factors), $e$ is the base of the natural logarithm and $B_1, \dots, B_p$ are regression coefficients. The expression $h(t)/h_0(t)$ is called the hazard ratio (HR) and indicates an increase or decrease of hazard caused by the effect of the factors $X_1, \dots, X_p$.
Graph 1. Theoretical shape of survivor function
Calculation of the Cox regression equation is included in most statistical software applications. The coefficients $B_1, \dots, B_p$ and their standard errors SE are estimated using complex mathematical methods, and the significance of these coefficients in the Cox PH model is tested with the Wald statistic. If the p-value of the Wald test statistic is 0.05 or lower, we consider the independent variable an effective predictor of the observed event. The expression $e^{B_i}$ (Exp($B_i$)), $i = 1, \dots, p$, estimates the multiplicative change in the hazard of the observed event (e.g. death) for a unit change of the independent variable; the corresponding percentage change is $(e^{B_i} - 1) \times 100$. In the Cox PH model output we check in particular whether the examined predictors are statistically significant and, if they are, we interpret their HRs.
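The step from an estimated coefficient to a hazard ratio, its confidence interval and the Wald test can be sketched as follows. The function name `cox_coefficient_summary` is hypothetical, and the sketch assumes the usual large-sample normal approximation for B. In the example, B is taken as the natural logarithm of the reported HR of 3.29 for acute GvHD, and the standard error 0.2463 is back-calculated from the reported 95% CI; neither value is taken from Table 5.

```python
import math


def cox_coefficient_summary(b, se, z=1.96):
    """Summarize one Cox regression coefficient.

    b  -- estimated coefficient B_i
    se -- its standard error (must be positive)
    z  -- normal quantile (1.96 for a 95% interval)
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    hr = math.exp(b)                         # Exp(B): hazard ratio per unit change
    ci = (math.exp(b - z * se), math.exp(b + z * se))
    wald = (b / se) ** 2                     # Wald chi-squared statistic, 1 df
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return {
        "HR": hr,
        "CI": ci,
        "Wald": wald,
        "p": p_value,
        "percent_change": (hr - 1.0) * 100.0,
    }


# B and SE are assumptions derived from the reported HR and CI (see lead-in).
acute_gvhd = cox_coefficient_summary(math.log(3.29), 0.2463)
print(f"HR={acute_gvhd['HR']:.2f}  "
      f"95% CI=({acute_gvhd['CI'][0]:.2f}, {acute_gvhd['CI'][1]:.2f})  "
      f"p={acute_gvhd['p']:.2g}")
```

The interval is built on the scale of B and then exponentiated, which is why reported confidence intervals for HR are not symmetric around the point estimate.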

Table 1. Input data for survival analysis

Table 2. Means and medians for survival time; medians for the group without acute GvHD could not be calculated because the survival probability at the end of observation was more than 0.5.

Table 3. Overall comparisons. Almost identical results of both statistical tests (in both cases p < 0.05) show that the patients with acute GvHD have worse survival.

Table 4. Factors selected for the regression model

Table 5. Variables in the equation