My supervisor had the idea of grabbing a conference dataset by hashtag: specifically Eclipse Summit Europe 2010 (hashtag #ese), which took place in Ludwigsburg, Germany, from November 2nd to November 4th.
You can get an idea of what people were talking about from the Wordle below (applet is here):
Apparently there were a lot of RTs. We’ll explore that later…
I started off with HTML files that he had grabbed for me, extracted all the tweet IDs (regular expressions ftw), and then downloaded the information for each tweet from the API (rate-limiting is the new compiling). In the end I had a spreadsheet of 640 tweets (only one couldn’t be retrieved) from 181 different users.
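The ID-extraction step might look something like this. It is a minimal sketch, not the script actually used: it assumes the saved HTML pages link each tweet through a permalink of the form `http://twitter.com/<user>/status/<id>` (or `/statuses/<id>`), and the function name `extract_tweet_ids` and the sample IDs are hypothetical.

```python
import re

# Assumption: tweets appear in the saved HTML as permalinks like
# http://twitter.com/<user>/status/<id> or http://twitter.com/<user>/statuses/<id>.
STATUS_ID = re.compile(r"/status(?:es)?/(\d+)")

def extract_tweet_ids(html: str) -> list[str]:
    """Return tweet IDs in order of first appearance, without duplicates."""
    seen = set()
    ids = []
    for tweet_id in STATUS_ID.findall(html):
        if tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids

# Hypothetical page fragment; the same tweet is linked twice.
sample = '''
<a href="http://twitter.com/alice/status/1001">permalink</a>
<a href="http://twitter.com/bob/statuses/1002">permalink</a>
<a href="http://twitter.com/alice/status/1001">permalink</a>
'''
print(extract_tweet_ids(sample))
```

IDs are kept as strings so they can be passed straight back to an API call or written to a spreadsheet without numeric conversion.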
One user has 26 tweets in the dataset; however, the majority tweeted the hashtag only once. The frequency distribution is shown in the chart below.
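A frequency distribution like this is two counts stacked on top of each other: first tweets per user, then users per tweet count. The sketch below assumes a hypothetical list `tweet_authors` holding one screen name per tweet.

```python
from collections import Counter

# Hypothetical input: one screen name per tweet in the dataset.
tweet_authors = ["ann", "ann", "ann", "ben", "cai", "cai", "dee"]

tweets_per_user = Counter(tweet_authors)          # user -> number of tweets
distribution = Counter(tweets_per_user.values())  # tweet count -> number of users

for n_tweets in sorted(distribution):
    print(f"{n_tweets} tweet(s): {distribution[n_tweets]} user(s)")
```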
The web and TweetDeck were by far the most popular clients, as the chart below shows. Of course, this can be skewed by users who post more.
To reduce this effect, I eliminated duplicate user/source combinations to create the chart below:
TweetDeck now seems slightly less popular! Given how tech-savvy these users are (Eclipse is an IDE, amongst other things, and is also open source), it’s interesting that the web is so prevalent and Android less so. Twitdroid and Twitter for Android both appear, but both are outnumbered by Twitter for iPhone.
Just 38 of the 181 users use more than one client, although one user uses five (!).
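The client counts, the user/source deduplication and the multiple-client check can all come from the same (user, client) pairs. This is a sketch under the assumption that each tweet has been reduced to a hypothetical `(user, client)` tuple; the data is made up.

```python
from collections import Counter, defaultdict

# Hypothetical (user, client) pairs, one per tweet.
tweets = [
    ("ann", "web"), ("ann", "web"), ("ann", "TweetDeck"),
    ("ben", "TweetDeck"), ("ben", "TweetDeck"), ("ben", "TweetDeck"),
    ("cai", "Twitter for iPhone"),
]

# Per-tweet counts: prolific users inflate the numbers for their client.
per_tweet = Counter(client for _, client in tweets)

# Deduplicate user/client combinations so each user counts once per client.
pairs = set(tweets)
per_user = Counter(client for _, client in pairs)

# Group clients by user to find people who post from more than one.
clients_by_user = defaultdict(set)
for user, client in pairs:
    clients_by_user[user].add(client)
multi_client_users = sorted(u for u, c in clients_by_user.items() if len(c) > 1)

print(per_tweet.most_common())
print(per_user.most_common())
print(multi_client_users)
```

Here ben’s three TweetDeck tweets count three times in `per_tweet` but only once in `per_user`, which is exactly the skew the deduplicated chart removes.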
Below is a heat map of user locations for the tweets in the dataset. The conference took place in Europe, so many of the participants were from there, but we also see users from North America.
[Heat map of user locations, all tweets: http://www.openheatmap.com/embed.html?map=PheromonesMotherboardNightstick]
Only 8 of the 640 tweets (1.25%) had geolocation data, and just 75 (11.7%) were replies. 55 of the 181 user accounts (30.4%) are geo-enabled.
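These proportions can be computed directly from the stored tweets. The sketch assumes each tweet was saved as the JSON object returned by Twitter’s (now retired) v1 REST API, where a geotagged tweet has a non-null `geo`, a reply has a non-null `in_reply_to_status_id`, and the embedded `user` object carries an `id` and a `geo_enabled` flag. The helper names `pct` and `summarise` are hypothetical.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimal places."""
    return round(100 * part / whole, 2)

def summarise(tweets: list[dict]) -> dict:
    geotagged = sum(1 for t in tweets if t.get("geo") is not None)
    replies = sum(1 for t in tweets if t.get("in_reply_to_status_id") is not None)
    # Count each account once, keyed by its numeric user id.
    users = {t["user"]["id"]: t["user"] for t in tweets}
    geo_enabled = sum(1 for u in users.values() if u.get("geo_enabled"))
    return {
        "geotagged": (geotagged, pct(geotagged, len(tweets))),
        "replies": (replies, pct(replies, len(tweets))),
        "geo_enabled_users": (geo_enabled, pct(geo_enabled, len(users))),
    }

# Hypothetical tweets with placeholder coordinates.
sample = [
    {"geo": None, "in_reply_to_status_id": None, "user": {"id": 1, "geo_enabled": True}},
    {"geo": {"type": "Point", "coordinates": [0.0, 0.0]}, "in_reply_to_status_id": 42,
     "user": {"id": 1, "geo_enabled": True}},
    {"geo": None, "in_reply_to_status_id": None, "user": {"id": 2, "geo_enabled": False}},
    {"geo": None, "in_reply_to_status_id": 7, "user": {"id": 3, "geo_enabled": False}},
]
print(summarise(sample))
```

Plugging in the counts from the post, `pct(8, 640)`, `pct(75, 640)` and `pct(55, 181)` give 1.25, 11.72 and 30.39, matching the rounded figures above.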
I filtered the dataset to keep just one tweet per user (the last one they posted with the conference hashtag).
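One way to do this filtering is a single pass that remembers each user’s latest tweet. This is a sketch, assuming each tweet is a dict with hypothetical `user` and `created_at` keys, the latter an ISO 8601 timestamp.

```python
from datetime import datetime

def last_tweet_per_user(tweets: list[dict]) -> dict[str, dict]:
    """Map each user to their most recent tweet in the dataset."""
    latest: dict[str, dict] = {}
    for t in tweets:
        when = datetime.fromisoformat(t["created_at"])
        current = latest.get(t["user"])
        if current is None or when > datetime.fromisoformat(current["created_at"]):
            latest[t["user"]] = t
    return latest

# Hypothetical tweets.
tweets = [
    {"user": "ann", "created_at": "2010-11-02T09:00:00", "text": "first"},
    {"user": "ann", "created_at": "2010-11-04T16:30:00", "text": "last"},
    {"user": "ben", "created_at": "2010-11-03T11:45:00", "text": "only"},
]
latest = last_tweet_per_user(tweets)
print({user: t["text"] for user, t in latest.items()})
```

Comparing parsed timestamps rather than input order means the result does not depend on how the rows happen to be sorted in the spreadsheet.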
The location heatmap with the reduced dataset:
[Heat map of user locations, one tweet per user: http://www.openheatmap.com/embed.html?map=HypercriticallyThesaurussStruts]
Despite the worldwide locations, the vast majority of users have their language set to English:
How do people at the conference describe themselves? Wordles have limitations in terms of statistical significance, but I find them useful for picking out specific themes. The Wordle of users’ bios is below (applet here); “Eclipse”, “software”, “Java” and “Developer” feature prominently.
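A word cloud sizes words roughly by how often they occur, so the underlying data is just a word count with common words removed. The sketch below is not how Wordle itself works internally; the tokenising regex, the tiny `STOP_WORDS` set and the sample bios are all assumptions for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "at", "for", "i", "in", "of", "on", "the", "to", "with"}

def bio_word_counts(bios: list[str]) -> Counter:
    """Count lower-cased words across all bios, skipping stop words."""
    counts = Counter()
    for bio in bios:
        # Keep letters, digits, '+' and '#' so terms like "c++" and "c#" survive.
        for word in re.findall(r"[a-z0-9+#]+", bio.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts

# Hypothetical bios.
bios = [
    "Java developer and Eclipse committer",
    "Software developer. Eclipse fan.",
    "I love Java and software",
]
print(bio_word_counts(bios).most_common(4))
```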
The earliest user joined in December 2006, but some joined relatively recently. In the chart below, we see a spike around February/March 2009, which makes sense given the astounding growth of Twitter at that time.
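The join-date chart is a histogram of account creation dates bucketed by month. The sketch assumes the user objects’ `created_at` strings use the format Twitter’s v1 API returned, such as `Wed Nov 03 09:15:00 +0000 2010`; the helper name and the sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime

# created_at format used by Twitter's v1 API, e.g. "Wed Nov 03 09:15:00 +0000 2010".
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"

def joins_per_month(created_at_values: list[str]) -> Counter:
    """Count account creations per (year, month)."""
    months = Counter()
    for value in created_at_values:
        joined = datetime.strptime(value, TWITTER_TIME)
        months[(joined.year, joined.month)] += 1
    return months

# Hypothetical account creation times.
sample = [
    "Fri Dec 15 10:00:00 +0000 2006",
    "Tue Feb 17 12:00:00 +0000 2009",
    "Mon Mar 02 08:30:00 +0000 2009",
    "Wed Mar 11 19:45:00 +0000 2009",
]
for (year, month), n in sorted(joins_per_month(sample).items()):
    print(f"{year}-{month:02d}: {n}")
```

Note that `%a` and `%b` are locale-dependent in `strptime`; under the default C locale they match these English names.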
Personally, I use my favorites to collect things I mean to read, so I looked at how these users were favoriting too. Users had between 0 and 2366 favorites. A mean of 43.9, a median of 3 and a mode of 0 suggest that many of these users don’t use favorites at all. Unsurprisingly, the standard deviation was large: 204.23.
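The summary statistics above can be reproduced with Python’s standard `statistics` module. This sketch uses made-up favorite counts, and since the post does not say whether the sample or population standard deviation was used, it reports both.

```python
import statistics

def favourite_stats(counts: list[int]) -> dict:
    """Summary statistics for per-user favorite counts."""
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "mode": statistics.mode(counts),
        "stdev": statistics.stdev(counts),    # sample standard deviation
        "pstdev": statistics.pstdev(counts),  # population standard deviation
    }

# Hypothetical counts: mostly zero, with one heavy user.
counts = [0, 0, 0, 3, 5, 50, 2366]
print(favourite_stats(counts))
```

A single heavy user drags the mean far above the median, which is the same pattern as the real data (mean 43.9 against a median of 3).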
Using Many Eyes, I graphed followers against following, with each point’s size proportional to the user’s number of lists.
Finally, URLs. I was surprised that 54 (29.8%) of the users did not have a URL in their profile. Three, shockingly, have a Facebook URL (one of which is not a vanity URL). Blogspot (22 users) is more popular than WordPress (5 users).
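Classifying profile URLs by host is one way to get counts like these. This is a sketch, not the method used for the numbers above: it only recognises hosted `*.blogspot.com` and `*.wordpress.com` blogs (self-hosted WordPress sites would land in "other"), and it treats Facebook links of the form `/profile.php?id=...` as non-vanity. The function names and sample URLs are hypothetical.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlparse

def on_domain(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)

def url_host_category(url: Optional[str]) -> str:
    if not url:
        return "none"
    host = (urlparse(url).hostname or "").lower()
    for domain, label in [("blogspot.com", "blogspot"),
                          ("wordpress.com", "wordpress"),
                          ("facebook.com", "facebook")]:
        if on_domain(host, domain):
            return label
    return "other"

def is_vanity_facebook_url(url: str) -> bool:
    """Non-vanity Facebook profile links use /profile.php?id=<number>."""
    return urlparse(url).path != "/profile.php"

# Hypothetical profile URLs (None means the profile has no URL).
profile_urls = [
    "http://someone.blogspot.com/",
    None,
    "http://someone.wordpress.com/",
    "http://www.facebook.com/profile.php?id=1",
    "http://example.org/",
]
print(Counter(url_host_category(u) for u in profile_urls))
```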
Next I’ll be looking at temporal rhythms and mapping @mentions.
13 replies on “Exploring a Conference Hashtag”
[…] Very interesting post by my master’s student Cate Huston on Exploring a Conference Hashtag. […]
Good work. When you work on @mentions, be sure to track the number of distinct mentioners. That is the best metric for influence. What I do with the Twitter API is record the source (user_id of tweet author) and the target (user_id of the user @mentioned in the tweet) for every tweet using a specific tag. Then I can easily use SQL to get counts. Number of unique mentioners is much less easily gamed than just straight mentions.
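The source/target approach described in this comment could be sketched with SQLite as follows. The table name `mentions`, its columns and the data are hypothetical; the point is the difference between `COUNT(*)` and `COUNT(DISTINCT source_id)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per @mention: who wrote the tweet (source) and who was mentioned (target).
conn.execute("CREATE TABLE mentions (source_id INTEGER, target_id INTEGER)")
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 10), (2, 10),  # user 10: 4 mentions from 2 people
     (3, 20), (4, 20), (5, 20)],          # user 20: 3 mentions from 3 people
)
rows = conn.execute(
    """
    SELECT target_id,
           COUNT(*) AS total_mentions,
           COUNT(DISTINCT source_id) AS distinct_mentioners
    FROM mentions
    GROUP BY target_id
    ORDER BY distinct_mentioners DESC
    """
).fetchall()
print(rows)
```

User 10 has more raw mentions, but user 20 ranks higher by distinct mentioners, which is why repeated mentions from one account are harder to game with this metric.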
Thanks Adam! Definitely a good plan. What I’ll do is create network graphs – much harder to game your place in the network!
This is cool. I would be really interested to see some similar examination of #chi2011 since I’ve gained a lot by watching that tag. It’s amazing how much more easily I can figure out the culture of the community thanks to Twitter.
It’s a plan. I’ve been collecting data through the search API and yes… there will be #CHI2011 analysis coming soon! Thanks for the suggestion!
[…] This post was mentioned on Twitter by Ian Skerrett, Mike Milinkovich and Kim Moir, Cate Huston. Cate Huston said: My latest stuff on exploring a conference hashtag (specifically #ese) is up – http://dld.bz/78qh supervisor's take here – http://dld.bz/78qp […]
Thanks for doing this analysis. Interesting to see the results from the conference. Not sure if you saw this previous analysis from a past EclipseCon. http://konigsberg.blogspot.com/2009/03/twitter-at-eclipsecon.html
No! But good to have 🙂 thanks. Looks like people tweeted more last year. I wonder if he still has the data. Thanks!
btw, we have a collection of technology called Eclipse RT, so some of those RTs might not be retweets.
Good to know, I’ll need to filter that 🙂 thanks!
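One possible filter, sketched under an assumption not taken from the post: treat “RT” as a retweet marker only when it is directly followed by an @-mention (as in “RT @user: …”), after first removing any occurrence of “Eclipse RT”. The pattern and function names are hypothetical, and a heuristic like this will misclassify some edge cases.

```python
import re

# Heuristic: a retweet marker is "RT" followed by an @-mention, e.g. "RT @user: ...".
RETWEET = re.compile(r"\bRT\s*@\w+")
# "Eclipse RT" is the name of an Eclipse technology collection, not a retweet.
ECLIPSE_RT = re.compile(r"\bEclipse RT\b", re.IGNORECASE)

def looks_like_retweet(text: str) -> bool:
    cleaned = ECLIPSE_RT.sub("", text)
    return bool(RETWEET.search(cleaned))

for text in [
    "RT @alice: great keynote #ese",
    "Slides from the Eclipse RT talk are up #ese",
    "Eclipse RT @bob session starts now #ese",
]:
    print(looks_like_retweet(text), text)
```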
[…] Usage at ESE Cate Huston has done an analysis of the Eclipse Summit Europe twitter hashtag (#ese). It is interesting to see the visualization of what people were talking about at ESE. I […]