(CS/EE 144) The ideas behind the web: Detecting Communities

One of the topics that we have touched upon in our analysis of networks is clustering. Although we’ve studied ways of generating graphs that have certain properties similar to networks we see in the real world, we haven’t really discussed how to identify clusters in real world networks. A recent paper by Lovro Subelj and Marko Bajec from the University of Ljubljana in Slovenia proposes a novel propagation based algorithm to accomplish this task.

They are interested in identifying two different types of communities: link-density communities and link-pattern communities. The former is characterized by dense interconnections among nodes within the community, and is essentially the same as the notion of highly clustered nodes that we have been studying in class. The latter, however, represents common patterns of connectedness among nodes within the community. That is, all the nodes in a link-pattern community tend to be connected to the same set of nodes (outside of their community). Here’s a helpful figure taken from their paper:

The shaded polygons represent the communities; (a) shows link-density communities while (b) demonstrates link-pattern communities.

The classical algorithm which the authors improve upon is as follows: Give each node a unique label, c_n. At each iteration, update each node’s label with the label shared by most of its neighbors, and break ties randomly. Also, to prevent labels from oscillating, do not update a node if it has a label that is one of the m most popular labels. Under the assumption that there are many intra-community edges relative to inter-community edges, nodes in a community will form a consensus on a label after a certain number of iterations.

One of the problems the authors address is the order with which to update the nodes. Always updating in a fixed order favors labels from nodes which are updated first. Randomizing the order each time, however, compromises the robustness of the algorithm and the stability of the identified communities. The compromise the authors reach involves fixing a random update ordering of the nodes, but then assigning weights to each node with higher weights given to nodes which are updated later. Thus far, this algorithm is designed only to detect link-density communities. To achieve the authors’ goal of detecting both types of communities with this algorithm, they modify the update procedure to include not just neighbors of nodes but distance two nodes (neighbors of neighbors) as well, weighed by an additional delta factor between 0 and 1. How such a modification aids in the detection of link-pattern communities is left as an exercise for the reader.

The authors proceed to test their algorithm on a number of synthetic and real-world networks, including a network representing social interactions for members of a karate club and another network representing class dependencies in a piece of software. Trying to design a general purpose community finding algorithm irrespective of context is an ambitious task. Individual network may have a more specific unique structure (i.e. bipartite-ness) that can be taken advantage of when designing such algorithms.

Sources:

http://www.sciencedaily.com/releases/2012/01/120130093921.htm

http://www.springerlink.com/content/8u75321700858524/fulltext.pdf