Complexity Crumblenaut — 2

Following is another overview of the most interesting papers, links, talks, events, etc. in the network science, SNA, and complex systems research, which I have come across last two weeks.

Dynamic Communities Identification

In the present time, we analyze the dynamic communities using so-called offline approach, which means that we identify communities in each time-slice of the analyzed network and then we track these communities by matching them using some similarity measure, e.g. Jaccard coefficient. One drawback of this approach is that the notion of community identity of is not very clear — consider the case when the community is continuously influenced by other communities or newcomers until the point, where there will be nobody from the members from the first time slice. The question if, whether this community is still the same or not? Should we refer to it by the same name, or should we rather introduce a new name? For instance, in case of family the name should persist, whereas in case of a music band it should be changed.

This question actually first appeared in the ancient Greece as Ship of Theseus paradox, as it was remarked by Tanya-Berger Wolf et al. in their recent publication, in which they review several existing strategies for dynamic communities identification. On top of that, they define the problem more rigorously, which enabled them to come up with a solution which considers all time slices at once. That is to say, communities identified by their algorithm are stable across time slices. What I like on that algorithm is its grounding in observation of real-world communities: as the best community structure is considered that one, which minimize certain costs of agents involved. They assume that our motivation to be a part of the community is a result of interplay of the costs of switching community, visiting community, and absenting in the community. For example, if it is very costly to switch a community while the cost to visit a foreign community is lower and an agent is from time to time observed within another community, it will still be a part of its home community because community switch is simply too expensive, while in the opposite case it will just switch its community affiliation.

However, that algorithm is not the only one which takes into account possibly all time-slices at once: Mucha et al. generalized modularity to the form, which allows detection of communities in time-dependent networks as well.

Efficient Computation of Group Betweenness in JUNG

Certain groups in a social network may be more important than others. In the world of science, the usual way how to determine the success or impact of a researcher is citation analysis, namely using of impact factor. The success of a research group or domain can be then assessed by the impact factor of publications supported by the grant, or number of patents, etc. The problem of IF is that it doesn’t take time seriously, i.e. it’s static, and that sometimes the impact cannot be identified by counting citations, but by a position in the network. Quite silly, but still useful, example may be: How many people cite Newton or Leibnitz today? I think that not too much, but their work on calculus is simply essential to really a wast majority of sciences. These connecting elements in a network are usually revealed by inspecting their betweenness centrality. But in case of a group of nodes, we need a generalization of this measure. It has been actually proposed by Everett and Borgatti. The single-node betweenness in weighted graphs with n nodes and m edges can be computed efficiently in O(nm+n2logn) time by algorithm proposed by Brandes. He also published a modification which computes a group betweenness in the same time. The problem is that this modification needs this time for every group, so in case one needs to compute group betweenness for several groups, it is better to use another algorithm, which after O(n3) pre-processing steps computes a betweenness of a group with k members in O(k3). Unfortunately, there is not any public implementation of this algorithm in Java, so last week, I have been working on one for JUNG framework. For now, it seems to me that the code is reasonably stable, but it still needs a little bit of testing. Some optimizations are also possible, but sticking with the rule that premature optimization is the source of large portion of bugs, I postponed it until it will be necessary. I will offer it to the JUNG maintainers, so hopefully it will become a part of some future version. For the time being, if you’re interested in it, drop me an e-mail (I cannot publish it here as WordPress doesn’t allow uploading ZIP files — I’ll try to find some solution for that).

2 Comments

  1. Posted October 31, 2010 at 11:40 am | Permalink | Reply

    “How many people cite Newton or Leibnitz today? I think that not too much, but their work on calculus is simply essential to really a wast majority of sciences.”

    Calculus is essential. But maybe not Newton’s or Leibnitz’. They have far less influence, in part because they are far less often read. When they are read, it is for history of science, not for learning calculus. Except at places like St. John’s, where it may be for both.

    You might think about the difference between *context-setting* and influence.

  2. Posted October 31, 2010 at 8:04 pm | Permalink | Reply

    Thanks for that comment, Jodi! As I wrote, it’s a silly example – the message was that even a scientist with few citations may be very important. And actually, the revolutionary work may become very quickly a part of standard textbooks and then there is no point to cite the original work for a lot of people. This may be the case of the calculus itself. If I understand correctly, the context-setting may be exactly this – instead of direct influence, you just make a lot of people think about the same thing in a similar way, is it?

Leave a comment