Introduction to Competing Risk Models in Epidemiological Research

Background and aims: Chronic kidney disease (CKD) is a worldwide public health challenge, with adverse consequences including kidney failure, cardiovascular disease (CVD), and premature death. CKD progresses to end-stage renal disease (ESRD) if it is diagnosed late or not at all. Competing risks are a major issue in epidemiological research. Inappropriate methods (e.g., the Kaplan-Meier method) have sometimes been used to estimate the probability of an event of interest in the presence of competing risks. In these situations, competing risk analysis is preferable to standard survival models. The purpose of this study was to describe the bias that results from using standard survival analysis to estimate the survival of patients with ESRD and to present alternative statistical methods that account for competing risks. Methods: This retrospective study included 359 patients who were referred to the hemodialysis department of Shahid Ayatollah Ashrafi Esfahani hospital in Tehran and underwent continuous hemodialysis for at least three months. Data were collected from the patients' medical records (2011-2017). To evaluate the effects of the study factors on the outcome, a cause-specific hazard model and competing risk models were fitted. The data were analyzed with descriptive and analytical statistics using Stata version 14 and SPSS version 21. Results: The median duration of follow-up was 3.12 years and the mean age at ESRD diagnosis was 66.47 years. Statistical analysis based on the competing risk model showed that age, age at diagnosis, level of education (below diploma), and body mass index (BMI) were significantly associated with death (hazard ratio [HR] = 0.98, P < 0.001; HR = 0.99, P < 0.001; HR = 2.66, P = 0.008; and HR = 0.98, P < 0.020, respectively). The HR of 0.98 for age corresponds to an approximately 2% lower hazard of death for each one-year increase in age. 
Conclusion: In the analysis of competing risk data, it is important to report results for both the event of interest and the competing events. The Cox model, which ignores competing risks, produced estimates and results that differed from those of the proportional sub-distribution hazards model. Thus, for the analysis of competing risks data, the proportional sub-distribution hazards model was more appropriate than the Cox model.


Introduction
Chronic kidney disease (CKD) is recognized as a global health problem. The disorder gradually progresses to end-stage renal disease (ESRD), which is defined as a reduction of the glomerular filtration rate (GFR) to less than 15 mL/min/1.73 m² [1]. In the absence of national registries, no reliable data are available on the incidence and prevalence of ESRD in Iran. Because life expectancy in Iran is increasing, the contribution of ESRD to mortality is also increasing. According to the latest available statistics (for 2006), the prevalence and incidence of ESRD were 434.83 per million population and 63.8 per million population per year, respectively, and the annual rate of the disease at the end of that year was estimated at 4% to 5% [2]. The survival of ESRD patients is lower than that of the general population [3]. A classic example of competing risks is the analysis of different causes of death: if a subject dies of one particular cause, he or she is no longer at risk of death from any other cause. An example from nephrology that requires competing risk analysis is the comparison of mortality risk between patients with CKD and patients who have progressed to ESRD, since the risk of mortality differs between patients who reach ESRD and those who do not [4]. In the traditional analysis of such data, most researchers have been interested in the distribution of observed survival times for one particular cause, with all other causes treated as censored observations.
Cohort studies and clinical trials often investigate the incidence of an event. Competing risk data arise when subjects can experience more than one type of event, such as death from various causes. The term competing risks also refers to data in which the different events are not necessarily mutually exclusive but interest lies in the first event to occur [5]. The most common example is relapse and death in remission, which are the two primary reasons that patients fail treatment. In most epidemiological papers, the effects of covariates on the three outcomes (relapse, death in remission, and treatment failure) are modeled by separate proportional hazards regression models. Standard survival analysis may overestimate the rate of the event, especially when the rate of the competing event is high. When competing risks are present in a time-to-event analysis, standard survival analysis is therefore not always appropriate, and its results should be interpreted with care [6].
In addition, competing risks refer to situations in which different types of events might occur. For example, when studying death on dialysis, receiving a kidney transplant is an event that competes with the event of interest. Thus, a competing risk is an event that either precludes observation of the event of interest or changes the probability of its occurrence [7]. The aim of this study was to demonstrate the practicality of competing risk models, to estimate the survival of ESRD patients using competing risk analysis, and to compare the results with those of other commonly used approaches.

Methods
We performed a retrospective study of all ESRD patients older than 20 years who were registered (2011 to 2017) in the hemodialysis department of Shahid Ayatollah Ashrafi Esfahani hospital, Tehran, Iran. Patients were included in the study when renal replacement therapy, such as hemodialysis, was initiated. Patients who died within three months of the onset of dialysis were excluded. The collected data included demographic information and characteristics, clinical characteristics, transplantation history, hemodialysis information, and the cause of end-stage renal disease. The events of interest were death and transplantation. Myocardial infarction, stroke, cancer, hepatitis C, accidents, sepsis, cardiac and respiratory arrest, seizures, diabetic foot, and embolism were considered as competing risks. Gender, age, body mass index (BMI), blood markers, blood pressure, diabetes, smoking, education level, marital status, and blood group were considered as independent variables. Two different models were fitted to the data. In the first model, the data were analyzed with the competing risk approach described by Fine and Gray, who proposed fitting a regression model directly to the cumulative incidence function. The second model was a cause-specific hazard model. The concepts behind these models are briefly introduced in the following sections.
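Both models start from the same data layout: one follow-up time and one status code per patient. The sketch below is a minimal illustration that assumes a hypothetical coding of 0 = censored, 1 = event of interest, and 2 = competing event. The paper does not report its own coding. The sketch shows how the cause-specific approach reduces the data to an ordinary event indicator by treating every other event type as censored.

```python
from dataclasses import dataclass

# Hypothetical status codes for illustration only; the study's actual coding is not reported.
CENSORED = 0
EVENT_OF_INTEREST = 1
COMPETING_EVENT = 2


@dataclass
class Record:
    time: float  # follow-up time, e.g. years since the start of dialysis
    status: int  # one of the status codes above


def cause_specific_indicator(records, cause):
    """Event indicator for a cause-specific hazard model of `cause`.

    Returns 1 for subjects who failed from `cause` and 0 otherwise, so
    censored subjects and subjects with any other event type are all
    treated as censored at their observed time.
    """
    return [1 if r.status == cause else 0 for r in records]


records = [
    Record(1.2, EVENT_OF_INTEREST),
    Record(0.8, COMPETING_EVENT),
    Record(3.0, CENSORED),
    Record(2.5, EVENT_OF_INTEREST),
]
print(cause_specific_indicator(records, EVENT_OF_INTEREST))  # [1, 0, 0, 1]
print(cause_specific_indicator(records, COMPETING_EVENT))    # [0, 1, 0, 0]
```

The Fine and Gray approach keeps the full status code instead, because it needs to know which censored-looking subjects actually failed from a competing cause.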

Statistical Methods for the Analysis of Survival Data in the Presence of Competing Risks

Fine and Gray Model
The model proposed by Fine and Gray is based on the sub-distribution hazard and provides a simple relationship between covariates and the cumulative incidence [8]. Unlike the cause-specific model, this model examines the effect of the independent variables directly on the cumulative incidence function. The effect of an independent variable in this model can therefore be very different from its effect in the cause-specific model [9]. The cumulative incidence function is defined as the probability that the event of interest occurs by time $t$, before the occurrence of any competing event. For the first event type, it is
$$F_1(t; Z) = P(T \le t, \varepsilon = 1 \mid Z),$$
where $F_1$ denotes the cumulative incidence function of the first event type, $T$ is the event time, $Z$ is the vector of independent variables, and $\varepsilon = 1$ indicates that the event is of the first type. The sub-distribution hazard of this event is
$$\lambda_1(t; Z) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} P\{t \le T < t + \Delta t, \varepsilon = 1 \mid T \ge t \text{ or } (T < t, \varepsilon \ne 1), Z\} = -\frac{d}{dt} \log\{1 - F_1(t; Z)\},$$
and it is modeled as a proportional hazard of the form
$$\lambda_1(t; Z) = \lambda_{10}(t) \exp(\beta^{\top} Z),$$
where $\beta$ is the vector of regression coefficients and $\lambda_{10}(t)$ is an unspecified baseline sub-distribution hazard that is estimated nonparametrically. Combining the last two equations gives the cumulative incidence function for the first event type:
$$F_1(t; Z) = 1 - \exp\left\{-\int_0^t \lambda_{10}(s) \exp(\beta^{\top} Z)\, ds\right\}.$$
The nonparametric estimate of the cumulative incidence function in this model is computed differently from that in the cause-specific model. The difference lies in the set of individuals at risk for the event of interest. In the cause-specific model, the risk set shrinks each time a person experiences any other cause. In the Fine and Gray model, a person who experiences another cause remains in the risk set, as if his or her time were censored at a time later than all observed times of the event of interest. As a result, the cumulative incidence function depends only on the sub-distribution hazard of its own cause, and not on the hazards of the other causes.
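The risk-set difference described above can be made concrete with a few lines of Python. This is a minimal sketch on hypothetical toy data, where status 0 is censored, 1 is the event of interest, and 2 is a competing event. It ignores the inverse-probability-of-censoring weights that Fine and Gray use when censoring is present. The helper `fine_gray_cif` is a hypothetical name that evaluates the closed-form cumulative incidence $F_1(t; Z) = 1 - \exp\{-\Lambda_{10}(t)\exp(\beta^{\top} Z)\}$, given an assumed cumulative baseline sub-distribution hazard $\Lambda_{10}(t)$.

```python
import math

# Hypothetical toy data: (time, status), 0 = censored,
# 1 = event of interest, 2 = competing event.
data = [(1.0, 1), (2.0, 2), (3.0, 1), (4.0, 0), (5.0, 1)]


def cause_specific_risk_set(data, t):
    """Indices at risk at time t in the cause-specific model:
    anyone whose observed time (event of any type or censoring) is >= t."""
    return [i for i, (time, _) in enumerate(data) if time >= t]


def subdistribution_risk_set(data, t, cause=1):
    """Indices in the Fine and Gray risk set at time t (no censoring weights):
    subjects still event-free at t, plus subjects who already failed from a
    competing cause, who stay in the set instead of being removed."""
    return [
        i
        for i, (time, status) in enumerate(data)
        if time >= t or status not in (0, cause)
    ]


def fine_gray_cif(baseline_cum_subhazard, beta, z):
    """F1(t; Z) = 1 - exp(-Lambda10(t) * exp(beta' Z)), where
    baseline_cum_subhazard is Lambda10(t), the integral of lambda10 up to t."""
    linear_predictor = sum(b * x for b, x in zip(beta, z))
    return 1.0 - math.exp(-baseline_cum_subhazard * math.exp(linear_predictor))


print(cause_specific_risk_set(data, 3.0))   # [2, 3, 4]
print(subdistribution_risk_set(data, 3.0))  # [1, 2, 3, 4]
```

Subject 1, who had the competing event at time 2.0, has left the cause-specific risk set by time 3.0 but is still counted in the sub-distribution risk set. This is why the two models can give different coefficient estimates for the same covariate.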

Cumulative Incidence Function
A second statistical tool for competing risks is the cumulative incidence function, which can be estimated directly from the data without any distributional assumptions. Standard survival analysis methods have commonly been used to analyze competing risks data. However, inappropriate methods such as the complement of the Kaplan-Meier estimate (1-KM) have sometimes been applied to estimate the probability of occurrence of an event of interest in a competing risks setting [10,11]. The 1-KM cannot be interpreted as the actual probability that the event occurs by time t [6]. In this classic analysis, one event is of interest and all other events are treated as censored. The method assumes non-informative censoring, that is, that censored patients are as likely to experience the event as patients who remain under follow-up. However, this assumption does not hold in the presence of competing events [12], because a competing event such as death from another cause precludes the event of interest. The 1-KM estimates can therefore only be interpreted as the probability of the event in a hypothetical world in which no other types of events occur [6,8]. In the presence of competing risks, however, each event has its own hazard. Failures from the competing risks therefore reduce the actual number of failures from the event of interest and consequently affect the estimated probability of failure from this event. In these situations, the cumulative incidence function is the appropriate tool for analyzing the data. The cumulative incidence function for a specific event, also known as the sub-distribution function, is defined as the probability of failing from a given cause by time t in the presence of competing events. The corresponding sub-distribution hazard is the rate of failure from that cause among subjects who have either survived event-free or already failed from a different cause [6,11].
In the present study, the cumulative incidence of a specific event was estimated by combining two quantities: the estimate of the overall survival function, in which all types of events are considered, and the hazard estimate for the specific event. Consequently, the cumulative incidence function for a specific event depends not only on the number of individuals who have experienced this type of event but also on the number of those who have not experienced any other event [11]. This function is often of interest in epidemiological research, and its graphical display over time is intuitive and appealing [11,13,14]. Gray's test was used to analyze differences in cumulative incidence between patient groups. Comparing cumulative incidence functions gives an idea of the probability of occurrence of the event of interest and can therefore be translated into an actual number of patients with the event of interest [11].
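The estimator described in this paragraph is usually written, in its Aalen-Johansen form, as

$$\hat F_k(t) = \sum_{j:\, t_j \le t} \hat S(t_{j-1}) \frac{d_{kj}}{n_j},$$

In this formula, $t_j$ are the distinct event times and $\hat S$ is the Kaplan-Meier estimate of overall event-free survival, with all event types counted as events. $d_{kj}$ is the number of cause-$k$ events at $t_j$, and $n_j$ is the number of subjects at risk just before $t_j$. The Python sketch below is a minimal illustration on hypothetical toy data, not the software used in the study (Stata and SPSS). It computes this estimate alongside the naive 1-KM estimate so that the overestimation can be seen directly.

```python
from collections import defaultdict


def cif_and_one_minus_km(data, cause, t):
    """Return (CIF, 1-KM) for `cause` at time t.

    data: list of (time, status) pairs; status 0 = censored, any other
    integer = event type. Censorings tied with events at the same time are
    kept in the risk set at that time (the usual convention).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for time, status in data:
        counts[time][status] += 1

    n_at_risk = len(data)
    overall_surv = 1.0  # S(t_{j-1}): every event type counted as an event
    km_cause = 1.0      # naive KM: other event types treated as censored
    cif = 0.0
    for time in sorted(counts):
        if time > t:
            break
        at_time = counts[time]
        d_all = sum(n for status, n in at_time.items() if status != 0)
        d_k = at_time.get(cause, 0)
        cif += overall_surv * d_k / n_at_risk
        km_cause *= 1.0 - d_k / n_at_risk
        overall_surv *= 1.0 - d_all / n_at_risk
        n_at_risk -= sum(at_time.values())  # events and censorings leave
    return cif, 1.0 - km_cause


# Hypothetical toy data without censoring: 3 of 5 subjects fail from cause 1.
data = [(1.0, 1), (2.0, 2), (3.0, 1), (4.0, 2), (5.0, 1)]
cif, one_minus_km = cif_and_one_minus_km(data, cause=1, t=5.0)
print(round(cif, 3), round(one_minus_km, 3))  # 0.6 1.0
```

No subject is censored here, so the CIF of 0.6 equals the observed proportion of cause-1 failures. The 1-KM estimate reaches 1.0 because the two subjects removed by the competing event are treated as if they could still fail from cause 1.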

Characteristics of Patients
Of the 400 hemodialysis patients, 41 (10.25%) were excluded from the study because they lacked complete data on the predictors, leaving 359 patients for the analysis. The participants included in these analyses had a mean age of 58.93 years, and 230 (64.07%) were men, giving a male-to-female ratio of 1.78. The median duration of follow-up was 3.12 years and the mean age at ESRD diagnosis was 66.47 years. The baseline characteristics of the study participants are shown in Table 1. A total of 123 (34.26%) patients died. Of the 122 deceased patients of the center, 72 were men (31.30% of male patients) and 50 were women (38.76% of female patients). The most common blood type was A. Most patients (n = 274; 76.32%) were married.
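The percentages in this paragraph follow directly from the reported counts. The short check below recomputes them. The only derived count is the number of women (359 - 230 = 129). The sex-specific death percentages are computed within each sex, which is how 31.30% and 38.76% arise. The helper name `pct` is hypothetical.

```python
# Counts as reported in the text above.
total, excluded = 400, 41
analysed = total - excluded  # 359 patients in the analysis
males = 230
females = analysed - males   # 129, derived from the two counts above
deaths = 123
deaths_male, deaths_female = 72, 50
married = 274


def pct(part, whole):
    """Percentage rounded to two decimals, as reported in the paper."""
    return round(100 * part / whole, 2)


print(pct(excluded, total))          # 10.25
print(pct(males, analysed))          # 64.07
print(round(males / females, 2))     # 1.78 (male-to-female ratio)
print(pct(deaths, analysed))         # 34.26
print(pct(deaths_male, males))       # 31.3
print(pct(deaths_female, females))   # 38.76
print(pct(married, analysed))        # 76.32
```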

Comparison of Different Regression Models
The choice of analysis method led to markedly different estimates. Statistical analysis based on the competing risk model showed that age, age at diagnosis, level of education (below diploma), and body mass index were significantly associated with death (hazard ratio [HR] = 0.98, P < 0.001; HR = 0.99, P < 0.001; HR = 2.66, P = 0.008; and HR = 0.98, P < 0.020, respectively). The HR of 0.98 for age corresponds to an approximately 2% lower hazard of death for each one-year increase in age (Table 2).
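A hazard ratio converts to a percentage change in the hazard as $100 \times (\mathrm{HR}^u - 1)$ for a $u$-unit increase in a covariate. This assumes the log-linear (proportional hazards) effect used by both the Cox and the Fine and Gray models. The sketch below applies this conversion to the reported HRs, assuming each is expressed per one-unit increase (one year for age). For education, it assumes the HR compares the below-diploma group with the reference category. The function name is hypothetical.

```python
def percent_change_in_hazard(hr, units=1.0):
    """Percent change in the hazard for a `units`-unit increase in a
    covariate, assuming a log-linear effect so the combined HR is hr ** units."""
    return 100.0 * (hr ** units - 1.0)


print(round(percent_change_in_hazard(0.98), 1))      # -2.0  (age, per year)
print(round(percent_change_in_hazard(0.98, 10), 1))  # -18.3 (age, per decade)
print(round(percent_change_in_hazard(2.66), 1))      # 166.0 (below diploma vs reference)
```

An HR below 1 therefore always means a lower hazard. A "98% increase" would require an HR of 1.98, not 0.98.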

Discussion
Competing risks are prevalent in most epidemiological research. Failure to account correctly for competing events can have adverse consequences, including overestimation of the probability of the event and mis-estimation of the magnitude of the relative effects of covariates on the incidence of the outcome. When estimating the crude incidence of the outcome of interest, it is inappropriate to use the complement of the Kaplan-Meier survival function, because this leads to an overestimation of the incidence of the outcome when competing risks are present. It is important to report results for all causes, and for both the cause-specific and the sub-distribution hazard functions. Our findings showed that women with ESRD had a better survival rate than men; however, the difference was not statistically significant. Similar to the study by Tabrizi et al. [1], the largest proportion of our patients (nearly 26%) had hypertension. Cumulative incidence was grossly overestimated by standard survival analysis. The Cox proportional hazards regression approach requires the proportional hazards assumption. Our study showed that the estimates of the covariate coefficients differed between the cause-specific and the sub-distribution hazard models. The cause-specific hazard can be modeled using the Cox model, which is widely used in epidemiological research. These risk factors are often missing from competing risk analyses [15].

Limitations of the Study
This study had some limitations that need to be acknowledged. The most important limitation was the lack of information about the incidence and prevalence of ESRD in Iran. Further studies using more complete information are therefore suggested to estimate the incidence and prevalence of ESRD among Iranian patients. The researchers did not collect follow-up data on blood pressure, medications, or proteinuria and thus cannot comment on the time dependence of the outcomes on these risk factors. These issues warrant further investigation in similar studies.

Conclusion
In the presence of competing risk outcomes, Kaplan-Meier estimates are biased because they overestimate the probability of occurrence of the event of interest. In this paper, two common methods for handling competing risks, and their application in regression settings, have been discussed.

Ethical Approval
This study was derived from a master's thesis in epidemiology, which was supported by a grant (No.

Table 1. Characteristics of the Patients

Table 2. Comparison of Different Regression Models Based on Significant Variables