# Networks overview

Networks are a powerful way of visualising and interpreting the relationships between a set of markers (typically protein expression). TMA Navigator examines all pairwise relationships in a set of markers using a choice of statistical measures: mutual information, Spearman's correlation or Pearson's correlation. For each marker pair, a strength of relationship (the test statistic) and a p-value are calculated. P-values are adjusted for multiple hypothesis testing, and interactive thresholding is available based on statistical significance and the test statistic (e.g. Pearson's r). The thresholded relationships are displayed as an interactive network in which markers (nodes) are drawn as rectangles and significant relationships (edges) as lines connecting pairs of markers.
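To make "all pairwise relationships" concrete, the following is a minimal sketch in plain Python of computing Spearman's and Pearson's correlation for every pair of markers. It is not TMA Navigator's own code; the marker names (`MarkerA` etc.), the scores and the function names are invented for illustration, and p-value estimation is left out.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical scores: one list per marker, one entry per tissue sample.
scores = {
    "MarkerA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "MarkerB": [1.0, 4.0, 9.0, 16.0, 25.0],
    "MarkerC": [5.0, 3.0, 4.0, 1.0, 2.0],
}

# One test statistic per unordered marker pair: m * (m - 1) / 2 pairs in total.
pairwise = {
    (a, b): spearman_rho(scores[a], scores[b])
    for a, b in combinations(scores, 2)
}
```

With m markers there are m(m - 1)/2 pairs, which is why the number of hypotheses tested, and hence the need for multiple testing correction, grows quickly with the size of the marker set.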

Networks of up to 50 markers can be displayed in the web browser - tissue microarray datasets rarely include more markers than this, and visualisation of large networks can cause the web browser to crash. Although networks of more than 50 markers are not displayed, they can still be exported for analysis using external tools. A warning is displayed before showing networks of 30-49 markers, alerting the user to the longer loading time needed for larger networks. This is particularly relevant on older computers and/or web browsers.

Mutual information networks require data on at least six markers due to the sampling process used for estimating statistical significance. Other networks require data on a minimum of three markers.

**Figure 1:** An example protein network with edges representing significant Spearman correlation relationships

TMA Navigator provides several options for network inference, as follows:

- Mutual information network (KDE)
- Mutual information network (discrete)
- Spearman's correlation network
- Pearson's correlation network

## Accessing and interpreting network results

Network analysis results are listed on the dataset page in the *Completed analyses* table under the heading *Networks*. To view the results of an analysis from the dataset page, click on the name of the analysis.

The networks results page consists of the network itself, displayed on the left-hand side, with a control panel that has features for manipulating and exporting the network. Markers are represented as nodes in the network, and significant relationships are shown as edges. The colour of nodes represents the number of connections (i.e. degree) on a blue through grey to orange scale, orange being highest. Edge thickness indicates significance; thicker edges are more significant relative to the edges displayed at the chosen significance threshold. Positive and non-linear edges are displayed in light grey, negative edges in dark red. For an example, see a demonstration network.

### Adjusting the network view

**Figure 2:** Network view controls

Markers can be moved by clicking and dragging them with the mouse. The network can be moved and resized using the network view controls (figure 2, above). To move the whole network within the window, use the *panning control*. Alternatively, click the *grab to pan* icon and click and drag the network view. When complete, click the *grab to pan* icon again to turn panning off. To fit the network to fill the available area, click the *fit to screen* icon. To zoom the network in and out, use the *zoom slider* or the *zoom in*/*zoom out* buttons.

The network view control panel itself can be moved around by clicking and dragging it, avoiding clicking on any specific control.

### Network thresholding

**Figure 3:** Network thresholding

The network thresholding panel contains a slider, marked *p-value threshold*, that controls which of the possible connections (edges) between markers are displayed. Dragging the slider adjusts the significance threshold. Lowering the p-value threshold is more stringent and reduces the number of network edges displayed. Typical choices of p-value threshold are 0.05 (the default) and 0.01. The p-value threshold also determines the minimum edge value (e.g. Pearson's r) included in the network; this value can be identified by clicking on the thinnest edge. In addition to a significance threshold, it can be useful to decide on a minimum test statistic value (e.g. Spearman correlation > 0.5) to control the strength of the relationships included in the network. The number of edges passing the threshold is displayed at the bottom of the network thresholding box. The network can be viewed unthresholded by moving the slider to a p-value of 1, which results in every marker being connected to every other marker.
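The combination of a significance threshold and a minimum test statistic can be sketched as a simple filter over an edge list. This is an illustration only: the field names (`pair`, `stat`, `p_adj`), the example values, the use of `<=` for the p-value comparison and the use of the absolute value of the statistic are all assumptions, not a description of how TMA Navigator stores or filters edges.

```python
def threshold_edges(edges, p_threshold=0.05, min_abs_statistic=0.0):
    """Keep edges that are significant and at least as strong as requested."""
    return [
        edge
        for edge in edges
        if edge["p_adj"] <= p_threshold
        and abs(edge["stat"]) >= min_abs_statistic
    ]


# Hypothetical edges: adjusted p-value and signed correlation per marker pair.
edges = [
    {"pair": ("MarkerA", "MarkerB"), "stat": 0.82, "p_adj": 0.001},
    {"pair": ("MarkerA", "MarkerC"), "stat": -0.35, "p_adj": 0.03},
    {"pair": ("MarkerB", "MarkerC"), "stat": 0.10, "p_adj": 0.60},
]

significant = threshold_edges(edges, p_threshold=0.05)
strong = threshold_edges(edges, p_threshold=0.05, min_abs_statistic=0.5)
unthresholded = threshold_edges(edges, p_threshold=1.0)
```

In this sketch, a threshold of 1 keeps every pair, mirroring the fully connected network seen when the slider is moved to a p-value of 1.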

The *multiple hypothesis testing correction* section has two buttons to specify the procedure, either Benjamini-Yekutieli FDR [Benjamini and Yekutieli 2001 Ann. Statist. 29:1165] (default) or Bonferroni, which is a more stringent correction method that results in fewer significant edges in the network. For most circumstances, Bonferroni correction is overly conservative.
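For readers who want to reproduce the two corrections outside TMA Navigator, the sketch below implements the standard definitions of Bonferroni and Benjamini-Yekutieli adjusted p-values in plain Python. It follows the textbook procedures and is not taken from the site's source code; the function names and example p-values are invented.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted p-values (FDR under arbitrary dependence)."""
    m = len(pvalues)
    # Harmonic-sum penalty that distinguishes BY from Benjamini-Hochberg.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Walk from the largest p-value to the smallest, keeping a running
    # minimum so that adjusted values stay monotone in the raw p-values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[index] * m * c_m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03]  # hypothetical raw p-values, one per marker pair
by_adjusted = benjamini_yekutieli(raw)
bonferroni_adjusted = bonferroni(raw)
```

Adjusted p-values from either function can then be compared directly against the chosen threshold.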

For mutual information (KDE and discrete), p-values are estimated using 100,000 permutations of the scores in the TMA. There is therefore a sensitivity limit for raw (uncorrected) p-values of $1 \times 10^{-5}$. For typical datasets (e.g. with 10 proteins), this translates into a limit for corrected p-values of approximately $3 \times 10^{-5}$ (Benjamini-Yekutieli correction) or $1 \times 10^{-4}$ (Bonferroni correction). Smaller p-values have a larger relative margin of error.
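The origin of the sensitivity limit can be seen in a minimal permutation test sketch. Several things here are assumptions rather than facts about TMA Navigator: the absolute Pearson correlation is used as a simple stand-in for a mutual information estimator, the p-value is taken as the fraction of permutations scoring at least as high as the observed value, and a count of zero is reported as one hit so that the estimate never falls below 1 / (number of permutations). The data and function names are invented.

```python
import random
from math import sqrt


def abs_pearson(x, y):
    """Absolute Pearson correlation, used here as a stand-in test statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return abs(sxy / sqrt(sxx * syy))


def permutation_p_value(x, y, n_permutations=1000, seed=0):
    """Estimate a p-value by shuffling y and recomputing the statistic."""
    rng = random.Random(seed)
    observed = abs_pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs_pearson(x, shuffled) >= observed:
            hits += 1
    # Assumed convention: never report less than one hit, so the smallest
    # possible estimate is 1 / n_permutations.
    return max(hits, 1) / n_permutations


# Hypothetical scores for two markers across eight samples.
marker_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
marker_y = [1.2, 1.9, 3.4, 3.9, 5.1, 6.3, 6.8, 8.4]

p_estimate = permutation_p_value(marker_x, marker_y, n_permutations=200)
```

Under these assumptions, 200 permutations cannot resolve p-values below 1/200, in the same way that 100,000 permutations cannot resolve values below 1/100,000.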

Significance estimation for Spearman's correlation networks uses the AS 89 algorithm [Best & Roberts 1975 J. Roy. Stat. Soc. C App. 24:377] where fewer than 1290 samples are available; otherwise the (standard) asymptotic *t* approximation is used. Significance estimation for Pearson's networks uses the standard asymptotic *t* approximation.
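As a reminder of what the asymptotic *t* approximation usually refers to (this is the textbook form, not a description of TMA Navigator's code), a correlation coefficient $r$ computed from $n$ samples is converted to the statistic

$$
t = r\sqrt{\frac{n-2}{1-r^{2}}}
$$

which is compared against a *t* distribution with $n-2$ degrees of freedom. For Spearman's correlation, rho takes the place of $r$. For example, $r = 0.5$ with $n = 27$ gives $t = 0.5\sqrt{25/0.75} \approx 2.89$.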

### Network layout

**Figure 4:** Network layout

The network layout panel has options to specify the spatial organisation of the network. Different layouts are selected by clicking the buttons *Force directed*, *Circular* and *Radial*, which correspond to three different algorithms. Force directed and radial layouts are stochastic, so a different layout is produced each time they are applied. Clicking the relevant button again re-applies the layout method.

### Information on nodes and edges - the selections panel

**Figure 5:** The selections panel: a) information about a selected marker and b) information about a selected edge

Clicking on a marker or edge displays information in the selections panel on the lower right-hand side of the network. Clicking on a marker displays its direct neighbours at the current significance threshold. Clicking on an edge displays the relevant test statistic value (i.e. mutual information, Spearman's rho or Pearson's r) as well as the estimated p-value of that edge. More information about estimation of p-values is given in the network thresholding section.

### Legend

**Figure 6:** Network legend explaining the colour scheme for markers and edges. The legend is viewed by placing the mouse cursor over the word **Legend**. This example is for networks with signed edges (Pearson and Spearman correlation).

To view a legend for the network explaining the colour scheme for markers and edges, roll the mouse cursor over the word **Legend** (figure 6).

Markers are coloured based on their degree, or number of connections, on a blue (low) to orange (high) scale. In mutual information networks, all edges are coloured grey. In signed networks (Pearson or Spearman correlation), edges are coloured red for a negative relationship, or grey for a positive relationship.

### Exporting the network

**Figure 7:** Network export options, shown in the bottom right of the network view. The options are *GraphML* to export the network for use with external tools, and *PNG image* and *SVG image* which save the current view as a graphic.

The network can be exported in GraphML format for further analysis, or saved as an image. Clicking the **GraphML** button will export the complete, unthresholded network. GraphML can be imported by a variety of tools for further visualisation and analysis, including Cytoscape (version 3 or higher; a plugin can be used with version 2.8), Biolayout and R.
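As a rough illustration of working with an exported file outside a dedicated tool, the sketch below reads node and edge identifiers from GraphML using only Python's standard library. The embedded example document and marker names are invented; the attributes that TMA Navigator attaches to nodes and edges (for example test statistics or p-values) are not documented here, so the sketch reads only the `id`, `source` and `target` attributes defined by the GraphML format itself.

```python
import xml.etree.ElementTree as ET

# Standard GraphML XML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_graphml_edges(graphml_text):
    """Return (node ids, list of (source, target) pairs) from GraphML text."""
    root = ET.fromstring(graphml_text)
    nodes = [node.get("id") for node in root.findall(".//g:node", GRAPHML_NS)]
    edges = [
        (edge.get("source"), edge.get("target"))
        for edge in root.findall(".//g:edge", GRAPHML_NS)
    ]
    return nodes, edges


# Hypothetical minimal GraphML document standing in for an exported network.
example_graphml = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="MarkerA"/>
    <node id="MarkerB"/>
    <node id="MarkerC"/>
    <edge source="MarkerA" target="MarkerB"/>
    <edge source="MarkerB" target="MarkerC"/>
  </graph>
</graphml>"""

nodes, edges = read_graphml_edges(example_graphml)
```

Because the export is unthresholded, any significance threshold would need to be re-applied in the external tool using whatever edge attributes the file contains.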

The network view can be saved as an image in PNG or SVG formats. PNG is bitmapped, so quality will degrade under significant resizing, but is most compatible. SVG is a vector format and can be resized without loss of quality, making it the best choice for posters and publications. Click the **PNG image** or **SVG image** button to begin; the current view is converted to an image (may take a few moments) which can be saved to disk.

### Interpretation of networks

Edges in a protein correlation network represent relationships in protein concentration between a pair of proteins. Edges between a protein marker and a clinical measurement (e.g. number of lymph node metastases) imply a correlative relationship between the marker and the clinical measurement. Highly correlated markers (indicated by a small p-value, or large test statistic) are more likely to be functionally related than those with less significant connections. It is important to note that the absence of an edge does not indicate the absence of a relationship. Rather, it indicates that a relationship at least as strong as the one observed would be expected to arise by chance more often than the applied threshold allows, e.g. more than 5% of the time at p=0.05. Markers with many significant connections, called hubs, are typically important components of the network studied.

When interpreting networks in TMA Navigator, it is useful to determine the smallest test statistic passing the significance threshold. This can be identified by clicking on the thinnest edge in the network; its value is displayed in the selections panel. Additionally applying a minimum correlation strength (e.g. Spearman's rho ≥ 0.8) allows for more conservative thresholding.

Network topological features such as clusters can identify a functional neighbourhood for markers of interest and predict function according to guilt-by-association. For example, a highly connected subgraph chiefly consisting of adhesion proteins could be termed an 'adhesion cluster' - the other proteins within the adhesion cluster would be predicted to have a function that relates to adhesion.

As noted in the section above, networks can be exported as a GraphML file and then read by tools with rich network visualisation/analysis functionality such as Cytoscape (version 3 or higher; a plugin can be used with version 2.8), Biolayout and R. For example, these tools enable calculation of graph theoretic properties relevant to individual nodes such as clustering coefficient (which gives the local density of the network) and betweenness centrality (briefly, measuring the node's importance for network connectivity).
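To give a feel for the node-level properties mentioned here, the following is a small plain-Python sketch that computes degree and the local clustering coefficient from an undirected edge list. The edge list and marker names are invented; in practice a dedicated tool such as those listed above would normally be used, particularly for measures like betweenness centrality that are more involved to compute.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each node to the set of its neighbours (undirected network)."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def local_clustering(adjacency, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; reported as 0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical thresholded network: a triangle A-B-C plus a pendant node D.
example_edges = [
    ("MarkerA", "MarkerB"),
    ("MarkerA", "MarkerC"),
    ("MarkerB", "MarkerC"),
    ("MarkerC", "MarkerD"),
]

adjacency = build_adjacency(example_edges)
degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
clustering = {node: local_clustering(adjacency, node) for node in adjacency}
```

Here `MarkerC` has the highest degree, making it the hub of this toy network, while its clustering coefficient is lower than that of `MarkerA` because only one of its three neighbour pairs is connected.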

Several review articles are available to give further guidance on network interpretation (e.g. [Sharan, Ulitsky & Shamir 2007 Mol. Syst. Biol. 3:88], [Vidal, Cusick & Barabási 2011 Cell 144:986]).
