Accidentally in Code

Cliques

Cate

December 8, 2009

If you’re not a computer scientist, probably you think of cliques like in the movie Mean Girls. A bunch of people who talk the same and look the same and act the same…

To computer scientists (well, not all of them – I’m more inclined to think about Mean Girls to be honest) cliques are a part of graph theory. Essentially, it’s a group of nodes within a graph that are all connected to each other (Wikipedia).

So what? Big who cares, right? Graph theory is super boring!

I’m not big on the maths side of computer science, been there, got the t-shirt, and now I want to make stuff! But recently I’ve been working on applying graph theory to the Twitter Conversation Networks I make in order to produce my graphs. Why? Because this will allow me to pull out the sub-networks that you’re a part of, and the cliques that you’re connected to. Maybe if you know this you’ll find other people you’re potentially interested in following or (even better!) talking to.

For instance, here are my cliques of size 4 (the maximum size of the cliques found) within my 2 degree network:

[1,2,10,24] – tgrevatt, kittenthebad (me), childspeter, isfalk

[1,2,20,21] – tgrevatt, kittenthebad (me), mazamengr, krusk

[1,2,21,24] – tgrevatt, kittenthebad (me), krusk, isfalk

[1,21,24,74] – tgrevatt, krusk, isfalk, erinblaskie

[1,21,24,85] – tgrevatt, krusk, isfalk, julien

[1,20,21,74] – tgrevatt, mazamengr, krusk, erinblaskie

[2,21,24,36] – kittenthebad (me), krusk, isfalk, foursquare

There are more for size 3, but you get the picture.

This is a step towards making simpler graphs, where I will graph only the cliques in someone’s network – this will make it easier to see your important connections, I think. For now, if you want one, you can have a mapping and the result of clique finding for a minimum size of 3 or more.

How it’s done

I use the GraphML generated by my Twitter datamining library, that can also be used to produce visualizations.
A Python script generates a mapping and a list of connections
I implemented clique finding using Haskell – this means it runs really quickly and I don’t have to worry about memory because of lazy evaluation.
Minor optimization in the Haskell code, but significant due to the sparse nature of the graphs: I only try and find connections for nodes that have the minimum number of connections. Should be able to further optimize so that the same cliques aren’t found multiple times.