I have moved this blog to dedicated hosting: http://belak.net/blog.
Group Betweenness Centrality in JUNG (Java)
As a part of my work on cross-community analysis, I needed to compute the centrality of whole clusters. In particular, I was interested in their betweenness centrality as defined by Borgatti & Everett. Computing this score with the classic algorithm by Brandes can be quite expensive when there are many groups in the network, but Puzis et al. proposed a faster alternative, which cleverly precomputes certain data. Unfortunately, the reference implementation of this algorithm in Python does not work with weighted graphs and didn't fit into my analytical toolkit either. Therefore I decided to implement it in Java on top of JUNG. You can find the patch here. It works with JUNG 2.0.1, but likely also with 2.0.0.
Setting-up LaTeX Environment on Mac OS X
I recently switched from Linux to Mac OS X, and one of the first things I needed to set up was a LaTeX environment. On Ubuntu, I used to use gEdit with its LaTeX plugin, the TeX Live distribution, and the GNOME PDF viewer, so the minimal requirements for my new environment were:
- syntax highlighting
- intelligent (i.e. ignoring LaTeX commands) spell-checking
- code completion and snippets
- integrated build system (i.e. building a PDF by a keyboard shortcut)
- PDF viewer auto-reloading the file after each recompile
- free
The first thing I installed was a distribution of TeX Live for Mac — MacTeX. The installation was pretty straightforward, so there's no point in elaborating on it. This distribution comes with an editor called TeXShop. I cannot say I don't like it, but it is rather simple, so the next one I tried was TeXMaker, which I had already tried a couple of years ago and hadn't used because of its (for me) unintuitive user interface. I had hoped the new version would be better. It is, in terms of user friendliness, but the document preview (compilation and opening of a PDF file) is unbelievably slow. Then I started to think about buying some editor, because it seemed there was no good free one. So I tried Latexian. I opened a file in it which compiled fine in TeXMaker, and it failed to compile. Another unusable editor. Finally, I overcame my years-long resistance to Emacs (I had always been a Vim user:-)) and installed Aquamacs.
After opening the first TeX file (a Beamer presentation), I was really impressed by the quality of the syntax highlighting. For instance, sections have larger fonts, italics is really italics, bold is really bold. Auto-indentation and document re/formatting also work pretty well. Aquamacs is nice because it supports modern keyboard shortcuts, so you don't have to press ctrl+foo, ctrl+foo, ctrl+bar, ctrl+foo to open a file:-). Instead, cmd+o works, as is usual on a Mac. Aquamacs comes with the Emacs package for LaTeX editing — AUCTeX. It is very powerful and, besides the aforementioned code formatting, highlighting, and compiling, it also supports various macros for inserting code snippets and in-line previews of figures, e.g. mathematical formulae. This feature is indeed very useful, but unfortunately it doesn't work for me, so if you know how to fix it, I would be really grateful for any hint. The last thing I needed to change was the PDF viewer, as the standard Preview doesn't work very well with files which are being recompiled (it crashes from time to time and takes a long time to reload the file). I tried the classic Acrobat Reader, but it doesn't work well either. What really works smoothly is Skim, and Aquamacs provides very good integration with it. The only two things I had to change were: in Finder, the association of PDF files so that they open in Skim by default; and in Skim's preferences, in the Sync section, I ticked Check for file changes and chose Aquamacs as the Preset. If you then include the package preview in your document and recompile, you can jump directly from the point where your cursor is in Aquamacs to the corresponding place in the compiled PDF by choosing Command -> Jump to PDF in the menu. It works even the other way around! In Skim, try cmd+shift+left click on any place and you will jump directly into the source!
With this combination, I have an even better working environment than gEdit + GNOME PDF viewer, even though that combination was really good as well. Aquamacs + Skim is better as it provides PDF sync (jumping between source and PDF), and Skim renders fonts at higher quality (but this is probably a feature of the Mac rather than of Skim alone).
Winter School of Network Theory and Applications
At the very beginning of this year, from 5th to 8th January, I attended the Winter School of Network Theory and Applications at the University of Warwick, UK. It was organized by the complexity research centres of the universities of Oxford and Warwick, and in contrast to other workshops on networks, this school was less focused on social networks in particular and more concerned with networks in general, their statistics, modelling, and dynamics.
The majority of the programme consisted of blocks of typically two 1.5-hour lectures on broader topics like ‘network statistics’ or ‘dynamics of neural networks’. Besides these major blocks, there were a couple of 45-minute short talks on concrete topics like community detection or biologically inspired network dynamics. There were also three tutorial sessions, where attendees worked in small teams on some elementary network analysis tasks in Matlab, like generating Erdős–Rényi graphs and investigating their criticality, generating small-world networks and inspecting their diameter, etc.
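In case anybody wants to replay those exercises outside Matlab, here is a minimal sketch in plain Java (no JUNG; all class and method names are my own) that generates a G(n, p) Erdős–Rényi graph and reports the size of its largest connected component, which is the quantity whose sudden jump around mean degree 1 signals the criticality mentioned above.

```java
import java.util.*;

/** Minimal Erdos-Renyi G(n, p) sketch (plain Java, no JUNG): generate a random
 *  graph and report the size of its largest connected component. */
public class ErdosRenyiDemo {

    /** Adjacency list of an undirected G(n, p) random graph. */
    static List<List<Integer>> randomGraph(int n, double p, Random rnd) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int u = 0; u < n; u++)
            for (int v = u + 1; v < n; v++)
                if (rnd.nextDouble() < p) { adj.get(u).add(v); adj.get(v).add(u); }
        return adj;
    }

    /** Size of the largest connected component, found by BFS. */
    static int largestComponent(List<List<Integer>> adj) {
        boolean[] seen = new boolean[adj.size()];
        int best = 0;
        for (int s = 0; s < adj.size(); s++) {
            if (seen[s]) continue;
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(s);
            seen[s] = true;
            int size = 0;
            while (!queue.isEmpty()) {
                int u = queue.poll();
                size++;
                for (int v : adj.get(u)) if (!seen[v]) { seen[v] = true; queue.add(v); }
            }
            best = Math.max(best, size);
        }
        return best;
    }

    public static void main(String[] args) {
        int n = 1000;
        Random rnd = new Random(42);
        // Sweep the mean degree around the critical value 1 and watch the giant component emerge.
        for (double meanDegree : new double[]{0.5, 1.0, 1.5, 2.0}) {
            int giant = largestComponent(randomGraph(n, meanDegree / n, rnd));
            System.out.printf("mean degree %.1f -> largest component %d of %d%n", meanDegree, giant, n);
        }
    }
}
```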
As network science is not completely new to me, certain lectures were rather a repetition. However, it was indeed useful to have all this previous knowledge framed in a unified perspective, which only experts in the field can offer. As I understood it, the school should be organized again next year, so I would recommend it to anybody interested in incorporating network science/analysis into their own research toolkit — especially at the beginning of their research.
Complexity Crumblenaut — 2
The following is another overview of the most interesting papers, links, talks, events, etc. in network science, SNA, and complex systems research which I have come across in the last two weeks.
Dynamic Communities Identification
At present, we analyze dynamic communities using the so-called offline approach, which means that we identify communities in each time slice of the analyzed network and then track these communities by matching them using some similarity measure, e.g. the Jaccard coefficient. One drawback of this approach is that the notion of community identity is not very clear — consider the case when a community is continuously influenced by other communities or newcomers up to the point where nobody from the members of the first time slice remains. The question is whether this community is still the same or not. Should we refer to it by the same name, or should we rather introduce a new one? For instance, in the case of a family the name should persist, whereas in the case of a music band it should be changed.
This question actually first appeared in ancient Greece as the Ship of Theseus paradox, as remarked by Tanya Berger-Wolf et al. in their recent publication, in which they review several existing strategies for dynamic community identification. On top of that, they define the problem more rigorously, which enabled them to come up with a solution that considers all time slices at once. That is to say, communities identified by their algorithm are stable across time slices. What I like about that algorithm is its grounding in observations of real-world communities: the best community structure is considered to be the one which minimizes certain costs of the agents involved. They assume that our motivation to be a part of a community is a result of the interplay between the costs of switching communities, visiting a community, and being absent from a community. For example, if it is very costly to switch communities while the cost of visiting a foreign community is lower, an agent which is from time to time observed within another community will still be a part of its home community, because a community switch is simply too expensive; in the opposite case it will just switch its community affiliation.
However, that algorithm is not the only one which can take all time slices into account at once: Mucha et al. generalized modularity to a form which allows the detection of communities in time-dependent networks as well.
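To make the offline matching mentioned at the beginning of this note concrete, here is a minimal Java sketch, not tied to any particular toolkit and with names of my own invention, that matches each community of one time slice to its most similar community in the next slice by the Jaccard coefficient; a match below a chosen threshold would then be treated as a community death (or, in the other direction, a birth).

```java
import java.util.*;

/** Offline dynamic-community tracking sketch: match each community of time
 *  slice t to its most similar community in slice t+1 by Jaccard overlap. */
public class CommunityMatcher {

    /** Jaccard coefficient |A n B| / |A u B| of two member sets. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return (double) inter.size() / (a.size() + b.size() - inter.size());
    }

    /** For each community index in 'previous', the index of its best match
     *  in 'current', or -1 if no candidate reaches the threshold. */
    static int[] match(List<Set<String>> previous, List<Set<String>> current, double threshold) {
        int[] best = new int[previous.size()];
        for (int i = 0; i < previous.size(); i++) {
            best[i] = -1;
            double bestScore = threshold;
            for (int j = 0; j < current.size(); j++) {
                double s = jaccard(previous.get(i), current.get(j));
                if (s >= bestScore) { bestScore = s; best[i] = j; }
            }
        }
        return best;
    }
}
```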
Efficient Computation of Group Betweenness in JUNG
Certain groups in a social network may be more important than others. In the world of science, the usual way to determine the success or impact of a researcher is citation analysis, namely the impact factor. The success of a research group or domain can then be assessed by the impact factor of the publications supported by a grant, the number of patents, etc. The problem with the IF is that it doesn't take time seriously, i.e. it's static, and that sometimes the impact cannot be identified by counting citations but rather by a position in the network. A quite silly, but still useful, example: how many people cite Newton or Leibniz today? Not too many, I think, but their work on calculus is simply essential to a vast majority of the sciences. These connecting elements in a network are usually revealed by inspecting their betweenness centrality, but in the case of a group of nodes we need a generalization of this measure, which has been proposed by Everett and Borgatti. The single-node betweenness in weighted graphs with n nodes and m edges can be computed efficiently in O(nm + n² log n) time by the algorithm proposed by Brandes. He also published a modification which computes group betweenness in the same time. The problem is that this modification needs this time for every group, so if one needs to compute the group betweenness of several groups, it is better to use another algorithm, which after an O(n³) pre-processing step computes the betweenness of a group with k members in O(k³). Unfortunately, there is no public implementation of this algorithm in Java, so last week I was working on one for the JUNG framework. For now, it seems to me that the code is reasonably stable, but it still needs a bit of testing. Some optimizations are also possible, but sticking with the rule that premature optimization is the source of a large portion of bugs, I have postponed them until they become necessary. I will offer it to the JUNG maintainers, so hopefully it will become a part of some future version. For the time being, if you're interested in it, drop me an e-mail (I cannot publish it here as WordPress doesn't allow uploading ZIP files — I'll try to find some solution for that).
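Since I cannot attach the actual code yet, here is at least a small, hedged illustration of what the measure computes. It is neither the fast algorithm of Puzis et al. nor my JUNG patch, just a brute-force restatement of the Everett–Borgatti definition for unweighted graphs (all names are mine); something like this can serve as a test oracle for an efficient implementation on small graphs.

```java
import java.util.*;

/** Brute-force Everett-Borgatti group betweenness for small, unweighted,
 *  undirected graphs: the fraction of shortest s-t paths (s, t outside the
 *  group) that pass through at least one group member. Meant only as a
 *  test oracle, not as a replacement for the efficient algorithm. */
public class NaiveGroupBetweenness {

    /** BFS shortest-path counts from s, restricted to the 'allowed' nodes.
     *  dist is -1 for unreachable nodes. */
    static void bfsCounts(Map<Integer, Set<Integer>> adj, int s, Set<Integer> allowed,
                          Map<Integer, Integer> dist, Map<Integer, Long> sigma) {
        for (int v : allowed) { dist.put(v, -1); sigma.put(v, 0L); }
        dist.put(s, 0); sigma.put(s, 1L);
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(s);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int v : adj.getOrDefault(u, Collections.emptySet())) {
                if (!allowed.contains(v)) continue;
                if (dist.get(v) == -1) { dist.put(v, dist.get(u) + 1); queue.add(v); }
                if (dist.get(v) == dist.get(u) + 1) sigma.put(v, sigma.get(v) + sigma.get(u));
            }
        }
    }

    static double groupBetweenness(Map<Integer, Set<Integer>> adj, Set<Integer> group) {
        Set<Integer> all = adj.keySet();
        Set<Integer> outside = new HashSet<>(all);
        outside.removeAll(group);
        double score = 0.0;
        for (int s : outside) {
            Map<Integer, Integer> dist = new HashMap<>(), distAvoid = new HashMap<>();
            Map<Integer, Long> sigma = new HashMap<>(), sigmaAvoid = new HashMap<>();
            bfsCounts(adj, s, all, dist, sigma);               // paths in the full graph
            bfsCounts(adj, s, outside, distAvoid, sigmaAvoid); // paths avoiding the group
            for (int t : outside) {
                if (t == s || dist.get(t) <= 0) continue;      // skip s itself and unreachable pairs
                long total = sigma.get(t);
                // Paths avoiding the group only count if they are still shortest overall.
                long avoiding = distAvoid.get(t).equals(dist.get(t)) ? sigmaAvoid.get(t) : 0L;
                score += (total - avoiding) / (double) total;
            }
        }
        return score / 2.0; // each unordered pair {s, t} was counted twice
    }
}
```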
Complexity Crumblenaut — 1
Even though I tweet the most interesting information during the week, I decided to also summarize the most important papers, ideas, talks, etc. I've come across. Hence I started this crumblenaut series, which will more-or-less regularly (hopefully weekly) explore recent fragments of complexity. It may be particularly interesting for people who do not use Twitter and who want to keep an eye on what happens in the domain.
Link Communities: a Paradigm Shift in Community Detection?
Community detection is one of the hot topics in network science. In many networks, and particularly in social ones, real-world communities overlap. That is to say, a person is usually a member of more than one community, e.g. family, company, school, sports club, etc. Node partitioning methods forcing each person to be a part of exactly one community thus provide quite a constrained picture of the community structure, and several algorithms for the detection of overlapping communities have therefore been proposed. One of the first was a modification of the classic Girvan-Newman algorithm, which itself is based on the idea that suitable splitting points are edges with very high betweenness. The modification of the GN algorithm for the detection of overlapping communities is based on the notion of split betweenness, which is a measure of the betweenness of a hypothetical edge connecting the two parts of a split node. If this value is high, the original node is probably a part of two communities and it is better to include that node in both of them, i.e. to split it. In spite of this clever trick, there is IMO a more elegant way to identify overlapping communities: clustering edges instead of nodes. Whereas a node may be a member of many different communities, this is much less likely for an edge. For instance, I may be a part of my family, my working group, etc., but it is unlikely that the relations connecting me to my family are the same as the links relating me to my colleagues. Lehmann et al. published an algorithm relying on this idea, in which they defined a similarity measure between edges, which can then be clustered. Evans and Lambiotte recently described another way: they first transform the original graph into a link graph, in which the nodes represent the edges of the original graph, while the edges of the link graph represent the adjacency of the edges of the original graph. What is brilliant about this approach is that you can use any non-overlapping community detection algorithm and, by a reverse transformation, obtain the final overlapping communities of people!
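To illustrate the Evans–Lambiotte transformation, here is a minimal Java sketch (names are mine, not from any library) that builds the line graph of an undirected graph: every edge becomes a node, and two such nodes are connected whenever the original edges share an endpoint. Any non-overlapping partitioning algorithm run on this line graph yields edge clusters, which map back to overlapping node communities.

```java
import java.util.*;

/** Line-graph transformation sketch: nodes of the line graph are the edges of
 *  the original graph; two of them are adjacent iff the original edges share
 *  an endpoint. */
public class LineGraph {

    /** Canonical label for the undirected edge {u, v}, so "u-v" == "v-u". */
    static String edge(int u, int v) {
        return Math.min(u, v) + "-" + Math.max(u, v);
    }

    /** Adjacency list of the line graph, keyed by edge labels. */
    static Map<String, Set<String>> lineGraph(Map<Integer, Set<Integer>> adj) {
        Map<String, Set<String>> line = new HashMap<>();
        for (Map.Entry<Integer, Set<Integer>> entry : adj.entrySet()) {
            int u = entry.getKey();
            List<String> incident = new ArrayList<>();
            for (int v : entry.getValue()) incident.add(edge(u, v));
            // Every pair of edges meeting at u becomes an adjacency in the line graph.
            for (String x : incident) {
                line.computeIfAbsent(x, k -> new HashSet<>());
                for (String y : incident)
                    if (!x.equals(y)) line.get(x).add(y);
            }
        }
        return line;
    }
}
```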
Hierarchical Version of Infomap Community Detection Method
One of the non-overlapping community detection algorithms you may consider is Infomap by Rosvall and Bergstrom, which was evaluated as the best of a set of currently popular methods. This week they came up with an extension of Infomap which also reveals the hierarchical structure of communities. An interesting feature of their approach to community detection is that they model an information flow in the network, and thus the resulting communities are elements which very likely interact more with each other than with elements outside of their community. In my experience, Infomap is able to detect communities even if they are quite small and the data set is very noisy.
Cheating the Impact Factor
One of the motivations for our analysis of cross-community phenomena in science is the observation that the assessment of research relies too much on static citation measures like the impact factor. The impact of a research community may, however, be very different when inspected from the dynamical point of view. Another good argument for the unsuitability of the impact factor is gracefully described by Arnold and Fowler in their recent paper Nefarious Numbers, which reveals the cheating of a couple of scientists in applied math, whose citations were mostly self-referential; as they also sat on the committees of some journals, they gained a lot of citations from papers submitted to those journals. As for me, I was thinking about how the dynamic analysis of a bibliographic network could help in the automatic detection of unfair behaviour like that. What is the pattern? I would expect the cheating authors to be a part of a relatively disconnected community, because they just cite each other, but nobody else cites them so intensively.
Visualization is Beautiful
Visualizations of complex systems can be gorgeous. And best of all, there is a gallery of such visualizations!
Has Preferential Attachment Been Superseded?
The currently widely accepted mechanism for the emergence of complex networks is preferential attachment. This paper, however, argues that it is not the degree of a node but the similarity between nodes that is the key mechanism driving the linkage of new nodes to existing ones. Consider the social network of your friends: how do you create new links? I would say it depends to a great extent on the network structure itself (a friend of my friend may become my friend as well), but also on similarity (a friend of my friend who is interested in completely different things, or has a contradictory world-view, will probably not become my friend). Hence I think it is necessary to consider both: the network structure and the nodes' attributes (i.e., similarity). Things become even more complicated when you consider common social phenomena like suggestibility, because then you deal with feedback loops and the whole system becomes intrinsically dynamic. On the other hand, such a model would be closer to the real world …
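Just to make the "structure plus similarity" idea concrete, here is a tiny, purely illustrative Java sketch of my own toy model (not the model from the paper): a newcomer attaches to an existing node with probability proportional to the product of that node's degree and its attribute similarity to the newcomer.

```java
import java.util.*;

/** Toy growth model mixing preferential attachment with node similarity:
 *  a newcomer links to an existing node with probability proportional to
 *  degree(node) * similarity(newcomer, node). Purely illustrative. */
public class MixedAttachment {

    static int chooseTarget(int[] degree, double[] similarityToNewcomer, Random rnd) {
        double[] weight = new double[degree.length];
        double total = 0.0;
        for (int i = 0; i < degree.length; i++) {
            weight[i] = degree[i] * similarityToNewcomer[i];
            total += weight[i];
        }
        // Roulette-wheel selection over the combined weights.
        double r = rnd.nextDouble() * total;
        for (int i = 0; i < weight.length; i++) {
            r -= weight[i];
            if (r <= 0) return i;
        }
        return weight.length - 1; // numerical fallback
    }
}
```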
Do We Want Smart Cities or Smart People?
In Portugal, a new city is being constructed. Well, nothing special about that. But the thing is that the architects tried to design a sustainable and green city, where diverse resources like water, sewerage, or electricity are managed by a computer via many different sensors, so that the city will, in a certain sense, have a central brain. I like the idea, but what I really miss in projects like this one is an answer to where the factories producing those “clean technologies” will be located and how they fit into the sustainability paradigm. I think it is quite hypocritical to build and inhabit a green city and move all the dirty production to China or Vietnam. More than brains for cities, we need to start using our own brains in a way they were not designed to work — we need to understand the planetary scale and accept that what happens beyond our natural horizon (e.g. in China) affects us and vice versa.
Who Wants to be a Billionaire?
And who feels how on Twitter? And who analyzed how these two questions are related? Bollen et al. analyzed Twitter streams in 2008 and showed that a sentiment analysis of those feeds can significantly improve the accuracy of predicting the Dow Jones index three days in advance. So they didn't relate the two questions directly, but who knows — maybe they are actually working on making money out of this great piece of work:-). What I found particularly interesting was that general sentiment (positive/negative) was not very helpful, whereas when they split sentiment into six different dimensions like calmness or happiness, they found that the former is really helpful.
WebScience: The Next Big Thing or a Buzzword?
Let me start this post with a question each graduate in computer or information science is supposed to be able to answer: What is the Web? Besides the completely right but in our context useless Solomonic answer that it's a spider's tool to catch insects, we really need to answer this question if we want to study the Web. Let's accept, just for the purpose of this post, that the Web is a global communication space using the Internet as a medium. Note that this does not directly exclude non-HTTP communication; I will get back to this at the end of the post. Well, we have it defined and now we may study it! Why? Because it has penetrated our lives to an extent where it is advisable to know more precisely:
- What new types of interactions and behaviours of people has it brought?
- What is the relation between large-scale communication structures like free software movement, social networks, Wikipedia, etc. and individual motivations and actions from which these structures emerge?
- What is the economic impact of the Web? Does it affect the ways we perceive/create wealth? Has it brought some new types of utility that have never occurred before?
- Are there any differences in social norms and stereotypes between the off-line and on-line worlds? Are there two different notions of privacy, friendship, … in those two worlds?
- What are the proper scientific methods for studying the Web?
Now connecting 28.7% of the Earth's population and still rapidly growing, the Web is becoming a ubiquitous part of our culture. Therefore, the study of such questions is inevitable if we want to catch up in understanding it. The Science of the Web, or WebScience, has thus been pushed forward as an independent research stream by figures like Tim Berners-Lee, Nigel Shadbolt, or Dame Wendy Hall. At the end of September, two public events related to WebScience were organized by the Royal Society.
The first one took place at the Royal Society centre in London over two days, with about two hundred participants there, I would guess. The speakers were mostly core figures of their respective disciplines, which are understood as influential, inspiring, and fundamental elements of what WebScience is supposed to be. Namely, network science (Albert-László Barabási, Jon Kleinberg), mathematics (Jennifer Chayes, Robert May), sociology (Manuel Castells), computer science and engineering (Tim Berners-Lee, Luis von Ahn), communication science (Noshir Contractor), and law (Jonathan Zittrain). There were more speakers, as you can see in the programme, but those listed here particularly arrested my attention and somehow remained in my mind. Being a computer scientist working on graph mining techniques, I was particularly amazed by Jon Kleinberg's presentation on the state of the art in link prediction, structural balance in graphs, and other things which I surely do not remember completely, so I am looking forward to the video streams recorded on site. Another great talk was given by Luis von Ahn. An excellent presenter with smart ideas that help the world digitize books or translate Wikipedia by employing millions of people (very likely even you!) without them necessarily being aware of it! Jennifer Chayes presented some advancements in the mathematical apparatus for handling network data – in particular proper sampling of networks and game-theoretic approaches for modelling dynamics on social networks. Having some elementary background and being interested in political economics, I particularly enjoyed Bob May's talk on how models of the spread of diseases are similar to models of financial crises. I also liked his side comments on the current neo-libertarian political doctrine and its influence on the current mess, which were only seemingly of marginal importance – in fact, I would say they were quite essential for the whole talk. I waited the whole two days for some presentation about the semantic web – and finally, with the presentation of Tim Berners-Lee, I lived to see it. He mainly talked about the current Linked Data project and the bottlenecks of the present semantic web – namely the lack of higher-level libraries/components for working with RDF data. It was nice to hear that, because it means that there are RDF data out there already and now it's time to consume them! In fact, Tim's talk was not the first one about the semantic web – David Karger showed us an interesting way to produce and visualize RDF data in the browser using Exhibit. I really loved that talk, because it was a nice introduction into the rich possibilities structured on-line data give us, without mentioning words like triple, logic, ontology, RDF, etc. And the whole platform seems really useful for creating rich on-line presentations of mid-size data sets. All the aforementioned speakers presented in person – except Jonathan Zittrain, whose speech was transmitted on-line. His presentation had a provocative title: Will the web break? He spoke about various legislative problems related to services like web archives, which operate on the edge of the law (or even illegally) because of obsolete copyright law. A quite interesting remark was also made about URL shorteners like bit.ly, which can simply cause parts of the web to break: if they stop operating, part of the hyperlink structure will become dead.
Regarding the .ly domain, Tim Berners-Lee recently tweeted about the potential infringements of online human rights by the Libyan government, under whose jurisdiction this domain belongs, so it is really worth thinking about which one to use.
The second satellite meeting was in a certain sense a continuation of the big discussion event in London. It was held in a lovely countryside house near Milton Keynes and was organized as a series of short presentations and follow-up workshops focused on several defining aspects of WebScience like network science, linked (government) data, crowdsourcing, etc. There was much more space to discuss things and people made use of it. On Wednesday evening, there was also a poster session, where I presented one about our work on cross-community analysis. As there were only 9 posters altogether, it was a great opportunity to get feedback. I think I may say that our work was quite well received there:-). All posters are accessible either as a list or as a map. What I was missing there was a dedicated block on the methodology of Web Science. At the end of this two-day event, there was a short workshop in which one group was working on methodology-related topics, but this was IMO insufficient. I think that if Web Science is supposed to be a real scientific discipline and not just a label for a bunch of loosely related topics from different disciplines with the Web as a common denominator, we really do need a common language and methodological toolkit – a common paradigm. I am aware that the whole discipline is just in its infancy and that this may be overcome in the future, but I think it is important to keep this in mind as a number-one priority, because otherwise Web Science itself may become just a buzzword and a missed opportunity.
Now I am getting back to the beginning of this post, where I postponed the question of which Internet services we may consider part of the Web and which we may not. I think it is quite unfortunate to call this endeavour Web Science without properly making a distinction between the World Wide Web as a service relying on HTTP and the global communication space in a more general sense. If we constrain WebScience only to communication realized via HTTP, we are shooting ourselves in the foot, because we are putting aside many other interesting parts of cyberspace: IRC, World of Warcraft, Usenet, e-mail, FTP, BitTorrent, … Without any doubt, the World Wide Web is the most important service of the Internet when it comes to communication, but it is not the only one. Things become even more complicated with some people pushing forward the term Internet Science. What is the relation between the two: Web and Internet Science? I have always assumed that the Internet is a set of low-level protocols, wires, routers and other hardware, whose only purpose is to transmit packets from point A to point B. So in that interpretation there is no space to investigate the actual communication processes between humans and the actual impact of these processes on people's behaviour. And that is what I find most interesting about Web Science.
Applications of Social Network Analysis: ASNA 2010
By some strange coincidence, I managed to visit Switzerland again this year – even twice in the same month!;-) I visited Zurich between 15 and 17 September, where the ASNA 2010 conference took place. This year's topic was the dynamics of social networks, which pretty much resonates with our current work on the mutual effects of bibliographic communities, so I presented a full paper about it there (see the slides below).
The majority of talks were given by social scientists – namely sociologists and political scientists. There were a couple of computer scientists as well. Tanya Berger-Wolf's talk particularly arrested my attention, as she presented their work on the social network analysis of zebras. One particular feature of zebra communities is that individual members visit other communities for some period of time. Therefore, they developed an algorithm to detect communities stable in time, which also allows an individual to be treated as a part of its “base” community while it may occasionally be visiting another community. They introduced an economically motivated notion of community affiliation, which seemed very interesting to me, as it brings to community detection methodology a well-argued notion of what it means to be a part of a community and what a community itself is.
Another interesting talk I enjoyed was by Thomas Valente. He has done a lot of work on social network intervention programs, e.g. whom to influence and how in order to prevent drug abuse among adolescents. I was quite surprised that the methods he presented were quite simple. It's not rocket science! For example, one may specify that the desired goal is to raise the cohesiveness of the network, and then try to add various links between nodes and identify the marginal growth of the objective function, e.g. cohesiveness. I was immediately thinking that in our work on co-citation networks we may look at a similar process, that is to say, inspect which scientists should be connected together in order to maximize their impact on the network, the growth of their community, etc. I assume that a necessary step to allow this would be to calibrate a multi-agent model describing the behaviour of the scientific communities we have analyzed. The catch is probably how to construct such a model: what should the parameters be? How should new links be added among the scientists? Should the existing links decay? What influences the formation of a citation link between scientists? Certainly topical similarity. But what else? Spatial similarity? Position in the existing network? Anything else? I think the recent work of Leskovec and Myunghwan may be a good starting point for such a model.
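A hedged sketch of what such an intervention loop might look like in Java follows; it is my own simplification, not Valente's actual method. As a stand-in for cohesiveness I use the number of triangles a candidate edge would close, and the procedure greedily picks the missing edge with the largest marginal gain.

```java
import java.util.*;

/** Network-intervention sketch: among all currently missing edges, find the one
 *  whose addition yields the largest marginal gain of a simple cohesiveness
 *  proxy (the number of triangles the new edge would close). My own
 *  simplification for illustration only. */
public class GreedyIntervention {

    /** Marginal gain of adding edge {u, v}: the number of common neighbours,
     *  i.e. how many triangles the new edge closes. */
    static int marginalGain(Map<Integer, Set<Integer>> adj, int u, int v) {
        Set<Integer> common = new HashSet<>(adj.getOrDefault(u, Collections.emptySet()));
        common.retainAll(adj.getOrDefault(v, Collections.emptySet()));
        return common.size();
    }

    /** Returns the best missing edge as {u, v}, or null if the graph is complete. */
    static int[] bestEdgeToAdd(Map<Integer, Set<Integer>> adj) {
        List<Integer> nodes = new ArrayList<>(adj.keySet());
        int[] best = null;
        int bestGain = -1;
        for (int i = 0; i < nodes.size(); i++) {
            for (int j = i + 1; j < nodes.size(); j++) {
                int u = nodes.get(i), v = nodes.get(j);
                if (adj.get(u).contains(v)) continue; // already connected
                int gain = marginalGain(adj, u, v);
                if (gain > bestGain) { bestGain = gain; best = new int[]{u, v}; }
            }
        }
        return best;
    }
}
```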
One of the take-away messages I brought from ASNA is that social and computer scientists have very different notions of scale. When they talk about “large-scale”, they usually mean a hundred nodes or more. When we talk about big networks, we usually mean tens of thousands or even millions. One of the reasons, I guess, is the methodology of obtaining the studied networks: whereas we usually scrape, mine, and integrate data in order to obtain those networks, they interview people, which is of course much more time-consuming.
The conference itself was organized by the University of Zurich and was held at their campus, which is near the city centre. The whole city is beautiful, clean, well organized, and pleasant to stay in. The architecture and overall look are quite different from Geneva or Lausanne, though. In general, the French-speaking parts differ from the German-speaking ones, which is quite surprising, I would say, as Switzerland is really a small country. As the conference ended on Friday evening, I booked my flight for Saturday afternoon. Since I had to check out of my hotel in the morning, I had some time to look around and do some quick sightseeing. But being astonished by the beauty of the city's architecture, I suddenly realized that I really had no time left and had to run to the railway station. Without all this famous Swiss punctuality, I would have been pretty doomed, because I caught the last train to the airport and checked in just as the desk was about to close. Good luck:-).
Second International Conference on eParticipation: ePart 2010
Between 29 August and 2 September 2010, the second ePart conference was held in the beautiful city of Lausanne, Switzerland. eParticipation is a discipline studying the engagement and participation of citizens in public policy using modern communication technologies, particularly the Internet. The conference was co-located with the more famous and established conference on electronic government – eGov – from which it split off. I presented a paper there based on my Master's thesis on ontology-driven self-organization of politically engaged social groups. See the slides below.
I found two presentations particularly interesting. The first was the announcement of a new EU FP7 project called Padgets, which stands for “participation gadgets“. The goal of the project is to develop a platform which will allow policy makers to communicate directly with citizens through well-established social web sites. I consider this approach the only real possibility to truly engage citizens in participation, as it is highly unlikely that citizens will start to use a dedicated system just for eParticipation purposes.
The second talk was given by Pietro Speroni di Fenizio with the title Don't vote – Evolve! He presented an interesting evolutionary approach to large-scale collaborative decision-making. The key characteristic of his method is that every opinion in the system is taken into account and considered, thus elegantly avoiding the tyranny of the majority.
The conference was held in a completely new building of the Swiss Graduate School of Public Administration. The campus of the university is really well located – perfectly accessible by metro on the one hand, and surrounded by trees, meadows with cows, and Lake Geneva on the other. Lausanne itself is also a very nice place: walks on the shores of Lake Geneva, perfect public transport, friendly people, good food, … And of course: art! It seems to me that Swiss people really have good taste in art. I had this impression in Basel three years ago, and my visit to Lausanne only confirmed it. I have never seen so many beautiful sculptures, fountains, graffiti, and buildings as in Switzerland. While in Lausanne, I went to the famous Art Brut gallery and was really amazed. I was thinking that for such an exceptional collection the establishment should provide wheelchairs, because after two hours of walking and staring at paintings your legs really start to hurt:). I hope I will return to that place soon.
Knitting the Dublin Clique
I spent the last two weeks of August in Dublin visiting the UCD Clique group. Even though modern technologies, and the Internet in particular, allow us to communicate freely and efficiently, face-to-face encounters are still irreplaceable, which I realized right after the first brainstorming session on what I should work on during my two-week visit. It turned out that there is a significant intersection between what I would like to work on in the near future and what the guys in Dublin plan to research.
My work on cross-community dynamics relies on an arbitrary community detection algorithm. However, I had not used any overlapping community detection method so far. The Dublin group works on two such methods (GCE and MOSES), so I applied them and made a preliminary assessment of the quality of the resulting communities. It turned out that these communities are probably less topically cohesive than communities mined with the Infomap or Louvain methods, which, however, does not necessarily imply worse performance of the Dublin methods. I would rather say that the structure of overlapping communities is simply more “open”. I'm looking forward to seeing what cross-community effects we will be able to identify among these overlapping communities. I'm thinking of at least one special case not possible with non-overlapping communities: transdisciplinary, or “intermediary”, communities. Such communities, which are formed by parts of other, more sharply defined communities (in terms of their topic), should themselves be identified by an overlapping community detection algorithm. Therefore, it should be easy to just look at communities whose majority consists of parts of other communities.
Daniel Archambault, together with Derek Greene, developed a very useful tool for the analysis of dynamic communities: TextLUAS. It's an application that visualizes the dynamic life-cycles of communities, but not only that: it also visualizes tags associated with each community or, in general, with a sub-part of a community's life-cycle. One can then very easily inspect how the topics of one community disseminate to other communities it interacted with. Together with Daniel, I worked on tweaking this software for our purposes of cross-community analysis. As a result, we are now able to use it with an arbitrary community detection algorithm. In the future, we plan to develop life-cycle clustering support, so that one will be able to inspect only a certain type of life-cycle, e.g. “communities which emerged from two other communities and then grew”. This will be particularly useful when analyzing many dynamic communities, as the complete visualization of their life-cycles then starts to become really unclear and messy.