# Survival analysis overview

Survival analysis involves statistical testing to examine relationships of marker scores with survival. Patients are split into groups; the difference in survival between the groups is plotted graphically and tested for significance. A detailed introduction to the subject is given in the article [Clark et al. 2003 BJC 89:232].

**Figure 1:** Example Kaplan-Meier plot

A commonly used approach to investigate patient survival is Kaplan-Meier analysis (Figure 1) [Kaplan & Meier 1958 J. Amer. Statist. Assoc. 53:457]. Samples are grouped according to marker expression scores, and survival over time is plotted for each of these groups. The difference between groups can be tested using the log-rank test, also known as the Mantel-Cox test [Mantel 1966 Cancer Chemother. Rep. 50:163]. Since the statistical test is performed on each marker in the dataset, multiple hypothesis correction is applied to each p-value from this test, using the Benjamini-Hochberg false discovery rate (FDR) procedure [Benjamini & Hochberg 1995 J. Roy. Statist. Soc. Ser. B 57:289].
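
The Benjamini-Hochberg adjustment mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the standard procedure, not TMA Navigator's own code; the function name `benjamini_hochberg` and the example p-values are assumptions made for the sketch.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the original order.

    Illustrative sketch only: one raw log-rank p-value per marker goes in,
    one FDR-adjusted p-value per marker comes out.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value can be
    # capped by the adjusted value of the next-larger p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four markers.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each adjusted value is at least as large as the raw p-value it replaces, which is why a marker can pass a threshold before correction but not after it.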

Each line on a Kaplan-Meier plot represents the survival function of a group. The survival function can represent different events such as disease recurrence or death. Survival time (months) is on the x-axis, and the proportion of the group surviving is on the y-axis. All lines will start off at 1.0 on the y-axis, unless any patients experienced an event (e.g. relapse, death) at the start time. A line will drop vertically if a patient within the group experiences an event.

Censoring is indicated with a '+', which means that a patient survived up until the point denoted, but it is not known what happened after that time.
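
To make the shape of the curve concrete, the following is a minimal sketch of the Kaplan-Meier (product-limit) estimator in plain Python. The function name `kaplan_meier` and the example follow-up times are invented for illustration; it is not the code used to draw the plots.

```python
def kaplan_meier(times, events):
    """Estimate the survival function for one patient group.

    times  -- follow-up time for each patient (e.g. months)
    events -- 1 if the event (e.g. relapse, death) was observed at that
              time, 0 if the patient was censored at that time
    Returns a list of (time, survival_proportion) pairs, one per distinct
    time at which at least one event occurred.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # The curve only drops at times where an event occurred.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without causing a drop.
        n_at_risk -= n_leaving
    return curve


# Hypothetical group of five patients; 0 marks a censored observation.
for t, s in kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0]):
    print(t, round(s, 3))
```

In this example the patient censored at 8 months does not lower the curve, but does reduce the number at risk, so the two events at 12 months produce a larger drop than they would have otherwise.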

## Available analyses

A list of available analyses in this category is given below.

## Accessing and interpreting survival analysis results

In TMA Navigator, survival analysis results are accessed from the dataset page, under the heading *Survival analysis* in the *Completed analyses* table. Click on the analysis name given on the dataset page in order to view the results.

Survival analysis results are presented in three main sections (see Figure 1, above). A line of tabs runs horizontally across the top, listing the names of the markers in the dataset. To view the results for a marker, click on the tab with its name. This displays a Kaplan-Meier plot for the selected marker on the left, and a panel of descriptive statistics on the right-hand side. At the top of the panel there is also an 'Open in new window' button that displays the Kaplan-Meier plot in a separate window.

Key aspects for interpreting the Kaplan-Meier plot are the separation of patient groups and the log-rank p-value. Markers that show strong separation of groups may be useful for risk stratification. If the log-rank p-value does not meet standard significance thresholds, however, additional data should be obtained to provide confidence that the observed separation is not due to chance. Common statistical significance thresholds are FDR-corrected *p* < 0.05 and *p* < 0.01. It is worth bearing in mind that a statistically significant result may not provide sufficient separation of groups to be practically useful. The descriptive statistics (e.g. maximum and minimum values) given in the right-hand panel allow the patient groups to be identified in downstream analyses. Further information, including an introduction to survival analysis, is given in [Clark et al. 2003 BJC 89:232].

For more information on the plot itself, see the survival analysis overview (above).

The buttons on the right-hand side are:

- **Open in new window** - Open the current plot in a new window. Useful for comparing multiple plots side by side.
- **Show mixture model** - *(Kaplan-Meier mixture model plots only)* Open the mixture model used to define sample groups in a new window.
- **Download groups (.tsv)** - *(Kaplan-Meier tertile and mixture model plots only)* Download the list of samples for the current marker, along with their scores and group assignments, in tab-separated value (.tsv) format. Files for mixture models also contain the probabilities of samples belonging to each group. The columns in the file are:
  - **Sample** - The sample identifier as uploaded with the TMA data.
  - **Score** - The marker score value for this sample.
  - **Tertile** *(tertile plots only)* - Which tertile the sample is in: 1, 2 or 3.
  - **MixtureGroup** *(mixture models only)* - The model group (i.e. which Gaussian) the sample most likely comes from, determined by maximum likelihood.
  - **GroupNProb** *(mixture models only)* - The probability of the sample belonging to group N, with one column for each group, e.g. a two-Gaussian model contains the columns Group1Prob and Group2Prob.
  - **SilhouetteWidth** - The silhouette width is a measure of clustering quality on a scale from -1 to 1. Large values (close to 1) indicate that the sample is very well clustered, values around 0 mean that it lies between two clusters, and strongly negative values (close to -1) indicate that it is poorly clustered. These values are marked NA (not applicable) where mixture modelling results in a single cluster.
- **Download as SVG image** - Download the current image in scalable vector graphic (SVG) format. SVG images can be rescaled to any size without loss of quality, which makes them ideal for posters and publications.
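
A downloaded groups file can be summarised with a short script such as the sketch below. The column names are taken from the list above, but the column order, the sample identifiers and the values in `EXAMPLE_TSV` are invented for illustration, and `summarise_groups` is a hypothetical helper rather than part of TMA Navigator.

```python
import csv
import io
import statistics
from collections import defaultdict

# Invented example of a tertile groups file; a real download may differ
# in column order and will contain your own samples.
EXAMPLE_TSV = (
    "Sample\tScore\tTertile\tSilhouetteWidth\n"
    "S1\t0.5\t1\t0.7\n"
    "S2\t1.0\t1\t0.6\n"
    "S3\t2.0\t2\t0.4\n"
    "S4\t2.5\t2\t0.5\n"
    "S5\t4.0\t3\t0.8\n"
    "S6\t5.0\t3\t0.9\n"
)


def summarise_groups(tsv_text, group_column="Tertile"):
    """Return n, min, median and max of Score for each group in the file."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    scores = defaultdict(list)
    for row in reader:
        scores[row[group_column]].append(float(row["Score"]))
    return {
        group: {
            "n": len(values),
            "min": min(values),
            "median": statistics.median(values),
            "max": max(values),
        }
        for group, values in sorted(scores.items())
    }


for group, stats in summarise_groups(EXAMPLE_TSV).items():
    print(group, stats)
```

For a mixture model file, passing `group_column="MixtureGroup"` would group the samples by model group instead, assuming that column is present.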

The statistics in the panel on the right-hand side are:

- **Log-rank Chi-sq** - The log-rank chi-square value. This statistic is the result of the test for a significant difference in survival between the groups.
- **FDR corrected p-value** - The p-value corresponding to the chi-square value above. The p-value is the probability of obtaining, by chance, a separation between the groups at least as large as the one observed. For example, at p=0.05, a separation at least as good as the one observed is seen by chance 5% of the time. The p-values are false discovery rate (FDR) corrected using the Benjamini-Hochberg procedure to account for multiple hypothesis testing [Benjamini & Hochberg 1995 J. Roy. Statist. Soc. Ser. B 57:289].
- **Mean silhouette width** - *(Kaplan-Meier tertile and mixture model plots only)* The mean silhouette width provides a measure of clustering quality on a scale from -1 to 1. Large values (close to 1) indicate better clustering; values close to 0 or negative values indicate poor clustering. This value is marked NA (not applicable) where mixture modelling results in a single cluster.
- **For each group**, the following statistics are listed:
  - **n** - The number of samples that fall into this group.
  - **Min** - The minimum marker score for samples in this group.
  - **Median** - The median marker score for samples in this group.
  - **Max** - The maximum marker score for samples in this group.
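
As a rough illustration of how a log-rank chi-square value and its uncorrected p-value relate, the sketch below implements the textbook log-rank test for the simplest case of two groups (one degree of freedom). It is an assumption-laden example, not TMA Navigator's implementation: the function name and data are invented, comparisons of three groups (such as tertiles) need the multi-group form of the test, and the p-value returned here has not been FDR corrected.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Return (chi_square, p_value) for a two-group log-rank test.

    events_* hold 1 for an observed event and 0 for a censored patient.
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still under observation just before time t.
        n_a = sum(1 for s, _, g in samples if s >= t and g == 0)
        n_b = sum(1 for s, _, g in samples if s >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for s, e, g in samples if s == t and e and g == 0)
        d_b = sum(1 for s, e, g in samples if s == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0  # no information to separate the groups
    chi_sq = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_sq / 2))
    return chi_sq, p_value


# Hypothetical groups with clearly different event times, no censoring.
chi_sq, p = logrank_two_groups([1, 2, 3, 4], [1, 1, 1, 1],
                               [10, 11, 12, 13], [1, 1, 1, 1])
print(round(chi_sq, 2), round(p, 4))
```

A larger chi-square value corresponds to a smaller p-value. When many markers are tested, the resulting p-values would then be adjusted together, as described for the FDR corrected p-value above.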