Graph above: Les Misérables
By Tamer Khraisha
Data visualization is one of the most important steps in analyzing data and presenting it to decision makers or users. As author, data journalist and information designer David McCandless said in his TED talk: "By visualizing information, we turn it into a landscape that you can explore with your eyes, a sort of information map. And when you're lost in information, an information map is kind of useful."
In this post I will briefly discuss a simple example of visualizing network data. Network data arises from many real-world phenomena, such as the internet (webpages connected via hyperlinks), economic and financial networks (e.g. transactions, holdings, alliances), communication networks (computers connected via communication links, employees communicating inside an organization), social networks (for example, people connected by friendship links), and biological networks (for example, protein interaction networks).
When visualizing network data, several dimensions can be taken into consideration. Generally speaking, a simple network is a graph made of nodes and links, so the simplest representation we can think of is a node-link drawing. The cleanest such drawing is a planar one, meaning that no edges cross each other, although only some graphs (the planar graphs) can be drawn this way.
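Not every network can be drawn without crossings. As a rough illustration, the sketch below applies a standard necessary condition that follows from Euler's formula: a simple planar graph with V ≥ 3 nodes has at most 3V − 6 edges. The function name `may_be_planar` and the two toy graphs are assumptions made for this example. Passing the check does not prove planarity; it only rules out graphs that are too dense.

```python
from itertools import combinations


def may_be_planar(num_nodes, edges):
    """Necessary (not sufficient) planarity check based on Euler's formula.

    A simple planar graph with V >= 3 nodes has at most 3V - 6 edges, so a
    graph with more edges than that cannot be drawn without crossings.
    """
    # Ignore self-loops and duplicate edges so the graph is treated as simple.
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    if num_nodes < 3:
        return True
    return len(unique_edges) <= 3 * num_nodes - 6


# Hypothetical toy graphs: a 4-node cycle and the complete graph on 5 nodes.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
k5 = list(combinations(range(5), 2))

print(may_be_planar(4, cycle4))  # True
print(may_be_planar(5, k5))      # False
```

The complete graph on five nodes has 10 edges, one more than the bound of 9, so any drawing of it must contain at least one crossing.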
Herman and others, in a survey of graph visualization (pdf here), state the fundamental problem as follows:
"The basic graph drawing problem can be put simply: given a set of nodes with a set of edges (relations), calculate the position of the nodes and the curve to be drawn for each edge. Of course, this problem has always existed, for the simple reason that a graph is often defined by its drawing. Indeed, Euler himself relied on a drawing to solve the “Königsberger Brückenproblem” in his 1736 paper" (Marshall 2000).
However, it should be noted that there is a difference between graph drawing and informative graph drawing. Some visualizations are fancy yet fail to convey the information the reader is most likely to be interested in. A good visualization requires the programmer to first identify the most important properties of a network and then make sure they are clearly visible in the drawing.
To show a practical example, I will visualize Zachary's karate club network, a well-known social network of a university karate club described in the paper "An Information Flow Model for Conflict and Fission in Small Groups". The Karate Club network is known for having a modular and heterogeneous structure: its members split into distinct communities, and a few nodes have many more links than the rest. A good visualization of this network should therefore reflect these features.
A simple visualization of the Karate Club network would look like the following:
Notice that this visualization does not convey much information about the structure of the network, since it does not show any differences in clusters or degrees. For this reason, I next plot the same network with nodes colored according to the community they belong to. Now our visualization is more informative. However, we still cannot tell which nodes are more important than others. To resolve this problem, I make the size of each node proportional to its degree. At this point our visualization reflects both the modularity and the heterogeneity of the Karate Club. Finally, since our network has two communities, it is more informative to set the link lengths so that links within a community are shorter than links between the two communities.
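The three refinements described here (color by community, size by degree, shorter links inside a community) could be reproduced roughly as follows. This is a sketch under several assumptions, not necessarily the code behind the original figures: it uses Python with NetworkX and Matplotlib, takes the recorded faction stored in the `club` node attribute of `nx.karate_club_graph()` as the community, and introduces a hypothetical edge attribute `pull` so that the spring layout attracts same-community nodes more strongly. The function names, colors and scaling constants are likewise illustrative.

```python
import networkx as nx


def style_karate_club(intra_pull=3.0, inter_pull=1.0, size_per_degree=40):
    """Build the karate club graph and derive colors, sizes and a layout."""
    G = nx.karate_club_graph()

    # Assumption: the recorded faction ("club" attribute) is the community.
    club = nx.get_node_attributes(G, "club")
    palette = {"Mr. Hi": "tab:blue", "Officer": "tab:orange"}
    colors = [palette[club[n]] for n in G.nodes]

    # Node area proportional to degree, so well-connected members stand out.
    sizes = [size_per_degree * G.degree[n] for n in G.nodes]

    # A stronger pull on within-community links makes the spring layout
    # tend to draw them shorter than links between the two communities.
    for u, v in G.edges:
        G.edges[u, v]["pull"] = intra_pull if club[u] == club[v] else inter_pull
    pos = nx.spring_layout(G, weight="pull", seed=42)

    return G, pos, colors, sizes


def draw_karate_club(path="karate_club.png"):
    """Render the styled network to an image file (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # draw off-screen, no window required
    import matplotlib.pyplot as plt

    G, pos, colors, sizes = style_karate_club()
    nx.draw_networkx(G, pos, node_color=colors, node_size=sizes,
                     edge_color="lightgray", with_labels=False)
    plt.axis("off")
    plt.savefig(path, dpi=150)
    plt.close()
```

Calling `draw_karate_club()` writes the picture to `karate_club.png`. Because a force-directed layout only approximates the requested link lengths, the within-community links come out shorter on average rather than in every single case.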