One of the topics that we have touched upon in
our analysis of networks is clustering. Although we’ve studied ways of
generating graphs that have certain properties similar to networks we see in
the real world, we haven’t really discussed how to identify clusters in real
world networks. A recent paper by Lovro Subelj and Marko Bajec from the
University of Ljubljana in Slovenia proposes a novel propagation based algorithm
to accomplish this task.
They are interested in identifying two
different types of communities: link-density communities and link-pattern
communities. The former is characterized by dense interconnections among nodes
within the community, and is essentially the same as the notion of highly
clustered nodes that we have been studying in class. The latter, however, represents
common patterns of connectedness among nodes within the community. That is, all
the nodes in a link-pattern community tend to be connected to the same set of
nodes (outside of their community). Here’s a helpful figure taken from their
paper:
The shaded polygons represent the communities;
(a) shows link-density communities while (b) demonstrates link-pattern
communities.
The classical algorithm which the authors
improve upon is as follows: Give each node a unique label, cn. At
each iteration, update each node’s label with the label shared by most of its
neighbors, and break ties randomly. Also, to prevent labels from oscillating,
do not update a node if it has a label that is one of the m most popular
labels. Under the assumption that there are many intra-community edges relative
to inter-community edges, nodes in a community will form a consensus on a label
after a certain number of iterations.
One of the problems the authors address is the
order with which to update the nodes. Always updating in a fixed order favors
labels from nodes which are updated first. Randomizing the order each time,
however, compromises the robustness of the algorithm and the stability of the identified
communities. The compromise the authors reach involves fixing a random update
ordering of the nodes, but then assigning weights to each node with higher
weights given to nodes which are updated later. Thus far, this algorithm is
designed only to detect link-density communities. To achieve the authors’ goal
of detecting both types of communities with this algorithm, they modify the
update procedure to include not just neighbors of nodes but distance two nodes
(neighbors of neighbors) as well, weighed by an additional delta factor between
0 and 1. How such a modification aids in the detection of link-pattern
communities is left as an exercise for the reader.
The authors proceed to test their algorithm on
a number of synthetic and real-world networks, including a network representing
social interactions for members of a karate club and another network
representing class dependencies in a piece of software. Trying to design a
general purpose community finding algorithm irrespective of context is an
ambitious task. Individual network may have a more specific unique structure (i.e. bipartite-ness) that can be taken advantage of when designing such algorithms.
Sources:
No comments:
Post a Comment