Exploring a Conference Hashtag

My supervisor had the idea of grabbing a conference dataset by hashtag: specifically, the Eclipse Conference 2010 (hashtag #ese), which took place in Ludwigsburg, Germany, from November 2nd to 4th.

You can get an idea of what people were talking about from the Wordle below (applet here):

ESE All Tweets

Apparently there were a lot of RTs. We’ll explore that later…

I started with HTML files that he had grabbed for me, extracted all the tweet IDs (regular expressions ftw), and then downloaded the full information for each tweet from the API (rate-limiting is the new compiling). The result was a spreadsheet of 640 tweets (only one couldn’t be retrieved) from 181 different users.
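For anyone wanting to repeat the ID-extraction step, here is a minimal sketch of how a regular expression can pull tweet IDs out of saved HTML. It assumes the pages link to each tweet with a status URL of the form twitter.com/<user>/status/<id> (optionally with the old #!/ prefix); the actual markup of the files used here isn't known, so the pattern and the helper name extract_tweet_ids are illustrative.

```python
import re

# Assumption: each tweet is linked as http://twitter.com/<screen_name>/status/<id>
# (or the 2010-era http://twitter.com/#!/<screen_name>/status/<id>).
# Adjust the pattern if the saved HTML uses different markup.
STATUS_URL = re.compile(r"twitter\.com/(?:#!/)?\w+/status(?:es)?/(\d+)")


def extract_tweet_ids(html: str) -> list[str]:
    """Return unique tweet IDs in order of first appearance (kept as strings)."""
    seen = {}
    for match in STATUS_URL.finditer(html):
        seen.setdefault(match.group(1), None)  # dicts preserve insertion order
    return list(seen)


sample = (
    '<a href="http://twitter.com/alice/status/29512345678">1</a>'
    '<a href="http://twitter.com/#!/bob/status/29512345679">2</a>'
    '<a href="http://twitter.com/alice/status/29512345678">again</a>'
)
print(extract_tweet_ids(sample))
```

Keeping the IDs as strings avoids any trouble with large integers in tools (such as spreadsheets or JavaScript) that use floating-point numbers.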

One user has 26 tweets in the dataset, but the majority tweeted the hashtag just once. The frequency distribution is shown in the chart below.

tweet count frequency
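The frequency distribution above can be computed with two passes of a counter: first tweets per user, then users per tweet count. This is a sketch that assumes each tweet is a dict shaped like the Twitter v1 API JSON, with the author under user.screen_name.

```python
from collections import Counter


def tweet_count_frequency(tweets):
    """Map 'number of tweets posted' -> 'number of users who posted that many'."""
    per_user = Counter(t["user"]["screen_name"] for t in tweets)
    return dict(sorted(Counter(per_user.values()).items()))


tweets = [
    {"user": {"screen_name": "alice"}},
    {"user": {"screen_name": "alice"}},
    {"user": {"screen_name": "bob"}},
]
print(tweet_count_frequency(tweets))  # one user with 1 tweet, one user with 2
```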

The web and TweetDeck were by far the most popular clients, as the chart below shows. Of course, these counts can be skewed by a few users who post a lot.

Twitter Clients

To reduce this effect, I removed duplicate user/source combinations, so that each client is counted at most once per user, giving the chart below:

Client Usage (User Duplicates Removed)

TweetDeck now seems slightly less popular! Given how tech-savvy these users are (Eclipse is an IDE, amongst other things, and is also open source), it’s interesting that the web is so prevalent and Android less so. Although Twitdroid and Twitter for Android both appear, they are dominated by Twitter for iPhone.
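The difference between the two client charts comes down to whether you count tweets or distinct user/client pairs. A minimal sketch, assuming v1-style tweet dicts where source is either "web" or an HTML link whose text is the client name; the helper names are hypothetical.

```python
import re
from collections import Counter


def client_name(source):
    # The v1 API "source" is "web" or a link such as
    # '<a href="..." rel="nofollow">TweetDeck</a>'; keep only the link text.
    return re.sub(r"<[^>]+>", "", source).strip()


def client_counts(tweets, dedupe_users=False):
    pairs = [(t["user"]["screen_name"], client_name(t["source"])) for t in tweets]
    if dedupe_users:
        pairs = set(pairs)  # count each user/client combination only once
    return Counter(client for _, client in pairs)


tweetdeck = '<a href="http://www.tweetdeck.com" rel="nofollow">TweetDeck</a>'
tweets = [
    {"user": {"screen_name": "alice"}, "source": "web"},
    {"user": {"screen_name": "alice"}, "source": "web"},
    {"user": {"screen_name": "alice"}, "source": tweetdeck},
    {"user": {"screen_name": "bob"}, "source": "web"},
]
print(client_counts(tweets))                     # per tweet
print(client_counts(tweets, dedupe_users=True))  # per user/client pair
```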

Just 38 of the 181 users use multiple clients, although one user uses five (!).

Client Usage (User Duplicates Removed)
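Counting how many distinct clients each user has is a small grouping step. This sketch assumes the input is a list of (screen_name, client) pairs already extracted from the tweets; the function name is illustrative.

```python
from collections import defaultdict


def clients_per_user(pairs):
    """pairs: iterable of (screen_name, client) tuples -> {screen_name: n_clients}."""
    clients = defaultdict(set)
    for user, client in pairs:
        clients[user].add(client)
    return {user: len(c) for user, c in clients.items()}


pairs = [("alice", "web"), ("alice", "TweetDeck"), ("alice", "web"), ("bob", "web")]
counts = clients_per_user(pairs)
multi_client_users = sum(1 for n in counts.values() if n > 1)
print(counts, multi_client_users, max(counts.values()))
```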

Below is a heat map of the locations of the users behind the tweets in the dataset. The conference took place in Europe, so many of the participants were from that area, but we also see users from North America.

[Heat map of user locations, all tweets: http://www.openheatmap.com/embed.html?map=PheromonesMotherboardNightstick]

Only 8 of the 640 tweets (1.25%) had geolocation data, and just 75 (11.7%) were replies. Of the 181 user accounts, 55 (30.4%) are geo-enabled.
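The percentages above are simple ratios; the sketch below shows one way to compute them. It assumes v1-style tweet dicts and treats a tweet as geotagged when geo is non-null and as a reply when in_reply_to_status_id is set. The post doesn't say exactly which fields were used, so those definitions are assumptions.

```python
def share(count, total):
    """Percentage of total, rounded to two decimal places."""
    return round(100 * count / total, 2)


def geo_and_reply_stats(tweets):
    n = len(tweets)
    geotagged = sum(1 for t in tweets if t.get("geo") is not None)
    replies = sum(1 for t in tweets if t.get("in_reply_to_status_id") is not None)
    users = {t["user"]["id"]: t["user"] for t in tweets}  # one entry per account
    geo_users = sum(1 for u in users.values() if u.get("geo_enabled"))
    return {
        "geotagged": (geotagged, share(geotagged, n)),
        "replies": (replies, share(replies, n)),
        "geo_enabled_users": (geo_users, share(geo_users, len(users))),
    }


# The figures reported in the post:
print(share(8, 640), share(75, 640), share(55, 181))
```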

I filtered the dataset to keep just one tweet per user (the last one they posted with the conference hashtag).
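Keeping only each user's most recent hashtag tweet can be done by parsing the timestamps and retaining the latest one per user. This sketch assumes the Twitter v1 created_at format (e.g. "Wed Nov 03 15:30:00 +0000 2010"); the function name is hypothetical.

```python
from datetime import datetime

# Twitter v1 timestamp format, e.g. "Wed Nov 03 15:30:00 +0000 2010"
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def last_tweet_per_user(tweets):
    latest = {}  # screen_name -> (timestamp, tweet)
    for t in tweets:
        user = t["user"]["screen_name"]
        when = datetime.strptime(t["created_at"], TWITTER_TIME)
        if user not in latest or when > latest[user][0]:
            latest[user] = (when, t)
    return [t for _, t in latest.values()]


tweets = [
    {"id": 1, "created_at": "Tue Nov 02 09:00:00 +0000 2010", "user": {"screen_name": "alice"}},
    {"id": 2, "created_at": "Wed Nov 03 15:30:00 +0000 2010", "user": {"screen_name": "alice"}},
    {"id": 3, "created_at": "Thu Nov 04 11:00:00 +0000 2010", "user": {"screen_name": "bob"}},
]
print([t["id"] for t in last_tweet_per_user(tweets)])
```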

The location heatmap with the reduced dataset:

[Heat map of user locations, one tweet per user: http://www.openheatmap.com/embed.html?map=HypercriticallyThesaurussStruts]

Despite the worldwide locations, the vast majority of users have their language set to English:


How do people at the Eclipse Conference describe themselves? Wordles have limitations in terms of statistical significance, but I find them useful for picking out specific themes. The Wordle of users’ bios is below (applet here); “Eclipse”, “software”, “Java” and “Developer” feature prominently.

Bio Wordle
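A word cloud is essentially a word-frequency count with common words filtered out. The sketch below produces the underlying counts from a list of bios; the tiny stop-word list is an illustrative assumption and not the list Wordle itself uses.

```python
import re
from collections import Counter

# Illustrative stop-word list only (an assumption, not Wordle's own list).
STOP_WORDS = {"a", "an", "and", "at", "for", "i", "in", "is", "my", "of", "on", "the", "to", "with"}


def bio_word_counts(bios, top=10):
    words = Counter()
    for bio in bios:
        for word in re.findall(r"[a-z0-9#@+']+", bio.lower()):
            if word not in STOP_WORDS:
                words[word] += 1
    return words.most_common(top)


bios = [
    "Java developer and Eclipse committer",
    "Eclipse RCP developer",
    "Software engineer at an Eclipse company",
]
print(bio_word_counts(bios, top=2))
```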

The earliest user joined in December 2006, but some joined relatively recently. In the chart below, we see a spike around February/March 2009, which makes sense given Twitter’s astounding growth at that time.

Joined Since
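A join-date chart like this one can be built by bucketing each account's creation date by month. This sketch assumes user dicts carrying the Twitter v1 created_at string; the helper name is illustrative.

```python
from collections import Counter
from datetime import datetime

# Twitter v1 timestamp format, e.g. "Sun Mar 01 10:00:00 +0000 2009"
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def joins_per_month(users):
    months = Counter()
    for u in users:
        joined = datetime.strptime(u["created_at"], TWITTER_TIME)
        months[joined.strftime("%Y-%m")] += 1
    return dict(sorted(months.items()))  # chronological order for plotting


users = [
    {"created_at": "Tue Dec 12 09:00:00 +0000 2006"},
    {"created_at": "Mon Feb 16 10:00:00 +0000 2009"},
    {"created_at": "Sun Mar 01 10:00:00 +0000 2009"},
    {"created_at": "Wed Mar 18 10:00:00 +0000 2009"},
]
print(joins_per_month(users))
```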

Personally, I use my favourites to collect things I mean to read, so I had a look at how these users were favouriting too. Users had between 0 and 2366 favourites. A mean of 43.9, a median of 3 and a mode of 0 suggest that many of these users don’t use favourites at all. The standard deviation was, unsurprisingly, large: 204.23.
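These summary statistics are all available in Python's statistics module. The sketch assumes a plain list of per-user favourite counts and uses the sample standard deviation (stdev); the post doesn't say whether sample or population deviation was reported, so swap in pstdev if needed. The numbers below are made up to show how one heavy user pulls the mean far above the median.

```python
import statistics


def favourite_stats(counts):
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": round(statistics.mean(counts), 1),
        "median": statistics.median(counts),
        "mode": statistics.mode(counts),
        "stdev": round(statistics.stdev(counts), 2),  # sample standard deviation
    }


counts = [0, 0, 0, 3, 5, 10, 2366]  # illustrative data, not the real dataset
print(favourite_stats(counts))
```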

Using Many Eyes, I graphed followers against following, with point size proportional to the number of lists.



Finally, URLs. I was surprised that 54 (29.8%) of the users did not have a URL in their profile. Three, shockingly, have a Facebook URL (one of which is not a vanity URL). Blogspot (22 users) is more popular than WordPress (5 users).
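Grouping profile URLs by host is one way to get counts like these. This is a sketch under stated assumptions: it takes each profile's url field (None or empty if absent) and only recognises hosted *.blogspot.com and *.wordpress.com blogs, so self-hosted WordPress sites would land in "other". The post doesn't describe how the author classified URLs.

```python
from collections import Counter
from urllib.parse import urlparse


def url_host_counts(profile_urls):
    """Classify profile URLs by host; None or '' counts as no URL."""
    counts = Counter()
    for url in profile_urls:
        if not url:
            counts["(none)"] += 1
            continue
        host = urlparse(url).hostname or ""
        if host == "facebook.com" or host.endswith(".facebook.com"):
            counts["facebook"] += 1
        elif host.endswith(".blogspot.com"):
            counts["blogspot"] += 1
        elif host == "wordpress.com" or host.endswith(".wordpress.com"):
            counts["wordpress"] += 1
        else:
            counts["other"] += 1
    return counts


urls = [None, "", "http://foo.blogspot.com", "http://www.facebook.com/bar",
        "http://baz.wordpress.com/", "http://example.org"]
print(url_host_counts(urls))
```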

Next I’ll be looking at temporal rhythms and mapping @ mentions.

13 thoughts on “Exploring a Conference Hashtag”

  1. Good work. When you work on @mentions, be sure to track the number of distinct mentioners. That is the best metric for influence. What I do with the Twitter API is record the source (user_id of tweet author) and the target (user_id of the user @mentioned in the tweet) for every tweet using a specific tag. Then I can easily use SQL to get counts. Number of unique mentioners is much less easily gamed than just straight mentions.

    1. Thanks Adam! Definitely a good plan. What I’ll do is create network graphs – much harder to game your place in the network!

  2. This is cool. I would be really interested to see some similar examination of #chi2011 since I’ve gained a lot by watching that tag. It’s amazing how much more easily I can figure out the culture of the community thanks to Twitter.

    1. It’s a plan. I’ve been collecting data through the search API and yes… there will be #CHI2011 analysis coming soon! Thanks for the suggestion!
