This example shows how Bioinformatics Toolbox™ can be used to work with and visualize graphs.
Graphs, in the sense of graph theory, are a mathematical way of representing connections or relationships between objects. There are many applications in bioinformatics where understanding relationships between objects is very important. Such applications include phylogenetic analysis, protein-protein interactions, pathway analysis and many more. Bioinformatics Toolbox provides a set of generic functions for working with and visualizing graphs.
The graph theory functions in Bioinformatics Toolbox work on sparse matrices. The only restriction is that the matrix be square. In this example, a graph was created from a SimBiology® model of a Repressilator [1] oscillatory network . In this model, protein A represses protein B, protein B represses protein C, which in turn represses protein A.
load oscillatorgraph
There are two variables: g
, a sparse matrix, and names
, a list of the names of the nodes of the graph.
whos g names
Name Size Bytes Class Attributes g 65x65 2544 double sparse names 65x1 7820 cell
If you have SimBiology you can create the graph using the following commands:
% sbioloadproject oscillator % class(m1) % Now get the adjacency matrix % [g,names] = getadjacencymatrix(m1);
There are many functions in MATLAB® for working with sparse matrices. The spy
function displays as * wherever there is a non-zero element of the matrix.
spy(g)
This gives some indication of the number of edges of the graph and also shows that the graph is not symmetric and, hence, is a directed graph. However, it is difficult to visualize what is going on. The biograph
object is another way of representing a graph in Bioinformatics Toolbox.
gObj = biograph(g,names)
Biograph object with 65 nodes and 123 edges.
The view
method lays out the graph and displays it in a figure.
gObj = view(gObj);
You can interact with the graph using the mouse. You can also programmatically modify the way that the graph is displayed.
% find the nodes pA, pB, and pC pANode = find(strcmp('pA', names)); pBNode = find(strcmp('pB',names)); pCNode = find(strcmp('pC', names)); % Color these red, green, and blue gObj.nodes(pANode).Color = [1 0 0]; gObj.nodes(pANode).Size = [40 30]; gObj.nodes(pBNode).Color = [0 1 0]; gObj.nodes(pBNode).Size = [40 30]; gObj.nodes(pCNode).Color = [0 0 1]; gObj.nodes(pCNode).Size = [40 30]; dolayout(gObj);
There are several functions in Bioinformatics Toolbox for working with graphs. These include graphshortestpath
, which finds the shortest path between two nodes, graphisspantree
, which checks if a graph is a spanning tree, and graphisdag
, which checks if a graph is a directed acyclic graph.
graphisdag(g)
ans = logical 0
There are also corresponding methods of the biograph object. These have names similar to the functions for working with sparse matrices but without the prefix 'graph'.
isdag(gObj)
ans = logical 0
A common question to ask about a graph is what is the shortest path between two nodes. Note that in this example all the edges have length 1.
[dist,path,pred] = shortestpath(gObj,pANode,pCNode);
Color the nodes and edges of the shortest path
set(gObj.Nodes(path),'Color',[1 0.4 0.4]) edges = getedgesbynodeid(gObj,get(gObj.Nodes(path),'ID')); set(edges,'LineColor',[1 0 0]) set(edges,'LineWidth',1.5)
You can use allshortestpaths
to calculate the shortest paths from each node to all other nodes.
allShortest = allshortestpaths(gObj);
A heatmap of these distances shows some interesting patterns.
imagesc(allShortest)
colormap(pink);
colorbar
title('All Shortest Paths for Oscillator Model');
Another common problem with graphs is finding an efficient way to traverse a graph by moving between adjacent nodes. The traverse
method uses a depth-first search by default but you can also choose to use a breadth-first search.
order = traverse(gObj,pANode);
The return value order
shows the order in which the nodes were traversed starting at pA. You can use this to find an alternative path from pA to pC.
alternatePath = order(1:find(order == pCNode)); set(gObj.Nodes(alternatePath),'Color',[0.4 0.4 1]) edges = getedgesbynodeid(gObj,get(gObj.Nodes(alternatePath),'ID')); set(edges,'LineColor',[0 0 1]) set(edges,'LineWidth',1.5)
The oscillator model is cyclic with pA, pB, and pC all connected. The method conncomp
finds connected components. A strongly connected component of a graph is a maximal group of nodes that are mutually reachable without violating the edge directions. You can use the conncomp
method to determine which nodes are not part of the main cycle.
[S,C] = conncomp(gObj); % Mark the nodes for each component with different color colors = flipud(jet(S)); for i = 1:numel(gObj.nodes) gObj.Nodes(i).Color = colors(C(i),:); end
You will notice that the "trash" node is a sink. Several nodes connect to this node but there is no path from "trash" to any other node.
In biological pathways it is common to find that while some reactions are essential to the survival of the behavior of the pathway, others are not. You can use the sparse graph representation of the pathway to investigate whether Reaction1 and Reaction2 in the model are essential to the survival of the oscillatory properties.
Find the nodes in which you are interested.
r1Node = find(strcmp( 'Reaction1', names)); r2Node = find(strcmp( 'Reaction2', names));
Create copies of the sparse matrix and remove all edges associated with the reactions.
gNoR1 = g; gNoR1(r1Node,:) = 0; gNoR1(:,r1Node) = 0; gNoR2 = g; gNoR2(r2Node,:) = 0; gNoR2(:,r2Node) = 0;
In the case where we remove Reaction2, there are still paths from pA to pC and back and the structure has not changed very much.
distNoR2CA = graphshortestpath(gNoR2,pCNode,pANode) distNoR2AC = graphshortestpath(gNoR2,pANode,pCNode) % Display the graph from which Reaction2 was removed. gNoR2Obj = view(biograph(gNoR2,names)); [S,C] = conncomp(gNoR2Obj); % Mark the nodes for each component with different color colors = flipud(jet(S)); for i = 1:numel(gNoR2Obj.nodes) gNoR2Obj.Nodes(i).Color = colors(C(i),:); end
distNoR2CA = 10 distNoR2AC = 14
However, in the case where we remove Reaction1, there is no longer a path from pC back to pA.
distNoR1AC = graphshortestpath(gNoR1,pANode,pCNode) distNoR1CA = graphshortestpath(gNoR1,pCNode,pANode)
distNoR1AC = 14 distNoR1CA = Inf
When you visualize the graph from which Reaction1 was removed you will see a significant change in the structure of the graph.
% Display the graph from which Reaction1 was removed. gNoR1Obj = view(biograph(gNoR1,names)); [S,C] = conncomp(gNoR1Obj); % Mark the nodes for each component with different color colors = flipud(jet(S)); for i = 1:numel(gNoR1Obj.nodes) gNoR1Obj.Nodes(i).Color = colors(C(i),:); end
[1] Elowitz, M.B, and Leibler, S., "A Synthetic Oscillatory Network of Transcriptional Regulators", Nature, 403(6767):335-8, 2000.