Tag: Visualization

  • More Experiments Around RGB Averaging

    I decided to repeat the experiments that show or hide the dominant hue in an image, this time showing or hiding colors around the average RGB value instead.

    I created a class called RGBColor (it just holds red, green, and blue values), similar to the HSBColor class I created earlier. I could have used the java.awt.Color class, but its float constructor expects values in the range 0-1, and I wanted to avoid conversions between that and the 0-255 range that Processing uses by default. Better still, my own class lets me set the colorMode to a range of 0-100, so that my tolerances are percentages – I found this made it easier to pick good values.

    Images Showing Colors Around “Average” RGB

    Images Hiding Colors Around “Average” RGB

    Source Code

    package ui;
    
    import model.RGBColor;
    import color.ColorHelper;
    import processing.core.PApplet;
    import processing.core.PImage;
    
    @SuppressWarnings("serial")
    public class AverageRGBImageViewApplet extends PApplet {
    
    	PImage img;
    	static final int rgbTolerance = 50;  // Adjust this.
    
    	public void setup() {
    		size(640,480);
    		background(0);
    		img = loadImage("" /* Your image file here */);
    		colorMode(RGB, 100);
    		processImage();
    	}
    
    	public void draw() {
    		image(img, 0, 0, 640, 480);
    	}
    
    	private void processImage() {
    		RGBColor color = ColorHelper.rgbColorFromImage(img, this);
    
    		for (int i = 0; i < img.pixels.length; i++) {
    			int pixel = img.pixels[i];
    			RGBColor pxColor =
    				new RGBColor(red(pixel), green(pixel), blue(pixel));
    			// Adjust this conditional to show/hide around average rgb.
    			if (!rgbInRange(color, pxColor, rgbTolerance)) {
    				float brightness = brightness(pixel);
    				img.pixels[i] = color(brightness);
    			}
    		}
    		// Tell Processing that the pixel array has been modified.
    		img.updatePixels();
    	}
    
    	private boolean rgbInRange(RGBColor colorA, RGBColor colorB, int tolerance) {
    		return Math.abs(colorA.r - colorB.r) < tolerance &&
    			Math.abs(colorA.g - colorB.g) < tolerance &&
    			Math.abs(colorA.b - colorB.b) < tolerance;
    	}
    }

    ColorHelper.java

    package color;
    
    import processing.core.PApplet;
    import processing.core.PImage;
    import model.RGBColor;
    
    public class ColorHelper {
    
    	public static RGBColor rgbColorFromImage(PImage img, PApplet applet) {
    		img.loadPixels();
    		int numberOfPixels = img.pixels.length;
    		float totalRed = 0f;
    		float totalGreen = 0f;
    		float totalBlue = 0f;
    
    		for (int i = 0; i < numberOfPixels; i++) {
    			int pixel = img.pixels[i];
    			totalRed += applet.red(pixel);
    			totalGreen += applet.green(pixel);
    			totalBlue += applet.blue(pixel);
    		}
    
    		// Calculate final rgb values.
    		float r = totalRed / numberOfPixels;
    		float g = totalGreen / numberOfPixels;
    		float b = totalBlue / numberOfPixels;
    		return new RGBColor(r, g, b);
    	}
    }
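
    The RGBColor class itself isn’t listed above. A minimal sketch of what it might look like, assuming only what the applet code relies on (a three-float constructor and readable r, g, and b fields); anything beyond that is a guess:

```java
// Hypothetical reconstruction of the RGBColor holder; the original class is not shown.
// In the project above it would be declared public and live in the "model" package.
final class RGBColor {
    // Channel values in whatever range the current colorMode uses (0-100 in the applet above).
    final float r;
    final float g;
    final float b;

    RGBColor(float r, float g, float b) {
        this.r = r;
        this.g = g;
        this.b = b;
    }
}
```

    Keeping the fields as plain floats means the values can be compared directly against a percentage tolerance, with no conversion step.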
  • Eliminating the Dominant Hue from an Image

    I thought it would be interesting to invert the idea of showing only the dominant hue, and show everything but that instead. I used the exact same code, but inverted the if statement so:

    if (!hueInRange(hue, lower, upper))

    became

    if (hueInRange(hue, lower, upper))

    The effect is shown below. As with most of these experiments, my favourite result is the painting – does it work better because it is a more studied use of color? I’m going to make something that compares and contrasts the effects, and lets me loop through pictures so I can look for interesting results.

  • Showing Only the Dominant Hue In an Image

    Having extracted the dominant hue from the images, we can manipulate each image so that pixels that are not at (or close to) the dominant hue are made grayscale instead.

    I converted to grayscale using the brightness component of each pixel in HSB. This worked really nicely.

    From my earlier experiments I decided on a hue range of 320 (320 buckets).

    I varied the tolerance (mostly from 1 to 20) for how far from the dominant hue a pixel could be and still be shown, using the same four images, with results that varied in how aesthetically pleasing they were. How dominant the dominant color is in the image really changes the effect. One of the images pretty much didn’t work at all until I hit a tolerance of about 55 – at which point roughly a third of the spectrum is being shown.

    import processing.core.PApplet;
    import processing.core.PImage;
    
    @SuppressWarnings("serial")
    public class DominantHueImageViewApplet extends PApplet {
    
    	PImage img;
    	static final int hueRange = 320; 
    	static final int hueTolerance = 10;  // Adjust this.
    
    	public void setup() {
    		size(640,480);
    		background(0);
    		img = loadImage("" /* Your image goes here */);
    		colorMode(HSB, (hueRange - 1));
    		processImage();
    	}
    
    	public void draw() {
    		image(img, 0, 0, 640, 480);
    	}
    
    	private void processImage() {
    		img.loadPixels();
    		int numberOfPixels = img.pixels.length;
    		int[] hues = new int[hueRange];
    		float[] saturations = new float[hueRange];
    		float[] brightnesses = new float[hueRange];
    
    		for (int i = 0; i < numberOfPixels; i++) {
    			int pixel = img.pixels[i];
    			int hue = Math.round(hue(pixel));
    			float saturation = saturation(pixel);
    			float brightness = brightness(pixel);
    			hues[hue]++;
    			saturations[hue] += saturation;
    			brightnesses[hue] += brightness;
    		}
    
    		// Find the most common hue.
    		int hueCount = hues[0];
    		int dominantHue = 0;
    		for (int i = 1; i < hues.length; i++) {
     			if (hues[i] > hueCount) {
    				hueCount = hues[i];
    				dominantHue = i;
    			}
    		}
    
    		// Manipulate photo, grayscale any pixel that isn't close to that hue.
    		int lower = dominantHue - hueTolerance;
    		int upper = dominantHue + hueTolerance;
    		print("dominantHue: " + dominantHue);
    		for (int i = 0; i < numberOfPixels; i++) {
    			int pixel = img.pixels[i];
    			float hue = hue(pixel);
    			if (!hueInRange(hue, lower, upper)) {
    				float brightness = brightness(pixel);
    				img.pixels[i] = color(brightness);
    			}
    		}
    		// Tell Processing that the pixel array has been modified.
    		img.updatePixels();
    	}
    
    	private static boolean hueInRange(float hue, int lower, int upper) {
    		// TODO: hue is circular, so this should wrap around at the ends of the range.
    		return hue < upper && hue > lower;
    	}
    }
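
    The comment in hueInRange notes that the check should compensate for hue being circular. A minimal sketch of one way to do that, measuring the shortest distance around the circle. The class name, the signature, and the hueMax parameter are assumptions for illustration; hueMax is assumed to be whatever maximum was passed to colorMode (hueRange - 1 in the applet above):

```java
// Sketch of a wrap-around hue comparison; not the code used for the images in this post.
final class CircularHue {

    /**
     * Returns true when hue lies within tolerance of target on a circular scale.
     * hueMax is the length of the full hue circle (assumed: the colorMode maximum).
     */
    static boolean hueInRange(float hue, float target, float tolerance, float hueMax) {
        float diff = Math.abs(hue - target) % hueMax;
        // Going the other way around the circle may be shorter.
        float distance = Math.min(diff, hueMax - diff);
        return distance < tolerance;
    }
}
```

    With this version, a dominant hue of 5 and a tolerance of 20 would also keep a pixel with hue 315 (on a circle of length 319), which the lower/upper comparison rejects.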
  • Exploring a Conference Hashtag: Part 3

    I wanted to explore temporal rhythms around the conference, for example to pick out more popular or particularly tweet-able sessions.

    Temporal Rhythms for #ESE

    Color scheme:

    1. Is directed at someone by starting with an @
    2. Contains a mention (@) of someone else
    3. Contains a link

    Disappointingly, no patterns are immediately obvious here – it seems like tweeting was reasonably consistent throughout. I removed directed points to see if that made a difference.

    Temporal Rhythms for #ESE – no directed

    Not really! Although now there are some small gaps – session breaks, perhaps?

    Code-wise, I reused this.

  • Part 8: Who’s Talking About The Future of Newspapers?

    I’m working on a paper on topical communities, and as part of that I’ve come back to this dataset to explore the social network that emerges through @ mentions.

    To start with, I looked at the social network that emerges when we look at the people on the list.

    Future of the News Network

    This network is pretty densely connected, with the exception of two users on the list. You can see their nodes floating away in the image below:

    Future of the News Network – Outliers

    The network graph that emerges from all the tweets is really busy, but may show who the most engaged users are.

    Future of the News Network – Full

    There’s just too much information here, so I started filtering it by eliminating nodes that had fewer than a specified minimum number of connections. Because of the dataset available, non-news-influencer nodes cannot be connected to each other. Thus, I was specifying how many influencers needed to mention a user for them to make it into the graph.
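
    The filtering step can be sketched roughly as follows. This is an illustration rather than the code used for the graphs: the class and method names are made up, and it assumes the data has already been reduced to a map from each influencer to the set of users they mentioned.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch: keep only users who were mentioned by at least `minimum` distinct influencers.
final class MentionFilter {

    static Set<String> usersMentionedByAtLeast(
            Map<String, Set<String>> mentionsByInfluencer, int minimum) {
        // Count how many distinct influencers mention each user.
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> mentioned : mentionsByInfluencer.values()) {
            for (String user : mentioned) {
                counts.merge(user, 1, Integer::sum);
            }
        }
        // Keep the users that reach the threshold (sorted for stable output).
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minimum) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

    Because each influencer’s mentions are a set, a user mentioned many times by the same influencer still only counts once towards the minimum.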

    Future of the News Network – Minimum 2

    Setting the minimum to two dramatically reduces the size of the graph. Many of the nodes remaining are also well known, for example @jack and @alyssa_milano.

    Future of the News Network – Minimum 3

    We can also see popular websites, like @techcrunch and @boingboing, as well as @google (not surprising given how often google showed up in the earlier visualizations of tweet content).

    Future of the News Network – Minimum 4
    Future of the News Network – Minimum 5
    Future of the News Network – Minimum 6
    Future of the News Network – Minimum 7
    Future of the News Network – Minimum 8
    Future of the News Network – Minimum 9
    Future of the News Network – Minimum 10

    I find the graphs for minimum 8+ fascinating – I think they start to show who influences the influencers.

    Future of the News Network – Minimum 11

    Eventually, of course, we get back to our original graph.

     

  • Sunflower Layout in Processing

    For a while, I’ve been wanting to make something that explores color in photo sets. Reading Beautiful Visualization (Amazon), I came across the perfect way to arrange the elements. It imitates the layout of the sunflower seeds, “the most efficient and visually mesmerizing way of packing small elements into a large circle”.

    I decided to try the layout in Processing and to keep it simple by following an easy rule for creating color: start with black (the absence of color) and add red, then when red is saturated add blue, and finally add green until we reach white. Because there is no yellow in the pattern, I used yellow for the background.

    I’m not entirely happy with the very centre (I think this is a consequence of having to convert between double and float) but it’s pretty cool:

    Using a different strategy for changing color (essentially generating every even numbered RGB value) with smaller radius and spacing, I made the image below.

    I love this layout!

    I use Java in Eclipse with the core.jar library. You can use this code in the Processing Editor with some small modifications.

    Code

    import processing.core.PApplet;
    
    
    public class SunflowerSeeds extends PApplet {
    	
    	private static final int radius = 10;
    	private static final int scale = 7;
    	
    	private static final double goldenangle = Math.PI * (3 - Math.sqrt(5));
    	
    	private static final int wh = 400;
    	
    	public void setup() {
    		size(wh, wh);
    		background(255, 255, 0);
    		noLoop();
    	}
    
    	public void draw() {
    		int r = 0;
    		int g = 0;
    		int b = 0;
    		
    		int n = 0;
    		
    		double a = 0;
    	
    		while (g < 255) {
    			if (r < 255) {
    				r++;
    			}
    			else if (b < 255) {
    				b++;
    			}
    			else {
    				g++;
    			}
    			
    			double h = Math.sqrt(n)*scale;
    			double x = wh/2 + Math.sin(a) * h;
    			double y = wh/2 + Math.cos(a) * h;
    			
    			stroke(0);
    			fill(r, g, b);
    			ellipse((float) x, (float) y, radius, radius);
    			
    			a+=goldenangle;
    			n++;
    		}
    	}
    }
    

    Code for Animated Version

    import processing.core.PApplet;
    
    public class SunflowerAnimated extends PApplet{
    
    	private static final int radius = 10;
    	private static final int scale = 7;
    
    	private static final double goldenangle = Math.PI * (3 - Math.sqrt(5));
    
    	private static final int wh = 400;
    
    	private int r = 0;
    	private int g = 0;
    	private int b = 0;
    
    	private int n = 0;
    	private double a = 0;
    
    	public void setup() {
    		size(wh, wh);
    		background(255, 255, 0);
    	}
    
    	public void draw() {
    		if (g >= 255) {
    			noLoop();
    			return;
    		}
    		else if (r < 255) {
    			r++;
    		}
    		else if (b < 255) {
    			b++;
    		}
    		else {
    			g++;
    		}
    
    		double h = Math.sqrt(n)*scale;
    		double x = wh/2 + Math.sin(a) * h;
    		double y = wh/2 + Math.cos(a) * h;
    		a+=goldenangle;
    		stroke(0);
    		fill(r, g, b);
    		ellipse((float) x, (float) y, radius, radius);
    
    		n++;
    	}
    }
  • Following the Conversation: A More Meaningful Measure of Engagement

    Unfortunately I can’t post the actual paper for a year, but hopefully the talk is going to be more interesting anyway! I used Google Docs to create my slides and you can find the deck here.

    Twitter: An Overview

    Who doesn’t use Twitter? Who doesn’t use Twitter because they think it’s pointless?

    Before I start, I want to give a really brief overview of Twitter and how it works so that what follows makes sense.

    This is the main page that I see on Twitter.com. What I’ve marked as the “stream” is the tweets from the people I “follow” – they’re marked on the right. I follow about 220 people. I’ve also marked out where it shows the people who follow me. I don’t follow all of them back, but it’s important to note that this doesn’t stop them mentioning me, or retweeting me.

    On this page, I can see who’s mentioned me or engaged in conversation with me recently (by which I mean, a tweet starting with @catehstn). By selecting that tweet, I can see what (if any) tweet of mine they responded to.

    Here I can see the tweets of mine that have been retweeted using Twitter’s relatively recent “retweet” button. Clicking on one shows me the users who retweeted it.

    Finally, these are people I’ve retweeted. Like the last screen, clicking on a tweet shows me who else retweeted it.

    Credit: Geek and Poke

    People who don’t use Twitter often tell me that Twitter is in fact boring people going on about their tedious lives…

    Credit: Geek and Poke

    … and in particular sharing with the world what they had for lunch.

    It’s funny, because that’s actually why I stopped reading the Facebook news feed.

    Anyway, personally I get a lot of value out of Twitter, and I think it has proved, and continues to prove, its value as a medium – with the breaking news about the plane in the Hudson, the Iran election, and every day for businesses as a customer service medium.

    Credit: Geek and Poke

    It’s hard at first, though, and Twitter can seem a bit like talking to oneself in public. However, it really is what users make of it, which is why I find it particularly interesting. When I showed you the page showing my directed messages, you can see that I can ask a question and people give me sensible answers. And invite me to go skiing. That’s pretty awesome.

    People Are Weird

    Credit: thisischris.com

    danah boyd does amazing research into how people – particularly teens – use social networking services. She posted some examples from her recent field work on her blog, and there were a couple of techniques that teens use to manage their presence on Facebook that are really interesting. The first is deleting everything: every wall post, every message, is deleted after it’s read and responded to. Status updates are left briefly, then removed as well. The second is deactivating the account at every logout, so that interactions can only take place when the teen is online to manage them.

    These behaviours are extreme, but to me illustrate why studying people’s behavior in the micro is interesting. At one point, if you sampled an “average” Twitter user, they would have no followers and have never tweeted anything. People use Twitter in really different ways, and the purpose of what we’re doing is to try and capture some patterns that we can pick out to characterize types of users.

    Who uses a social networking service – any – in a way that is “weird”? I, for example, read my whole stream. One of my friends tells me I’m completely mad to do that, and I think she gets frustrated because she’ll start telling me something and I’ll say, “oh I saw that tweet”.

    Credit: Michael Weiss

    We created this diagram to capture the different levels of interaction users have with one another. At the centre, the direct message, is the most intimate and private form of communication on Twitter. We can’t measure these. Then there are two kinds of active engagement – commenting on content, or conversing (messages that start with an @) and retweeting – when a user shares something they’ve seen with all their followers too. Then there is “listening” – this is reading the tweet, maybe clicking on the link, but not commenting. And finally we have ignoring.

    We all move between the different types of interaction, and our interaction patterns with different users will likely be very different. I definitely converse with people I never retweet, for example! Spammers, though, are always at the outside – they are interested only in pushing their content, not consuming that of other people.

    Using Visualization

    Credit: geograph.org.uk

    So, why do we use visualization? Because we don’t really know what’s going on, exactly, and it’s helpful in looking for patterns.

    It’s been shown on Twitter as well as on Facebook that whilst the “declared” set of friends produces a dense graph, the subset of people the user actually interacts with is much smaller and produces a much sparser overall graph. Whilst the number of people we are “friends” with can continue to increase, the number that are interacted with plateaus.

    So we started by graphing users’ conversation networks, which is all very well if the network is fairly small like my friend Jen’s…

    @jliyi

    But gets harder as it gets bigger…

    @kittenthebad

    And eventually all we can really say is that they have a really huge network…

    @krusk

    … and it’s really very densely connected…

    @anitaborg_org

    I want to call out this last one, because this is a person tweeting on behalf of, and as part of, an organization – and she’s produced this crazily connected graph. I see it in my stream, because I see her engaging with people I know, and with myself, but this graph really shows how far reaching it is. I think Twitter’s ability to allow an entity, an organization, to build a community this way is actually quite unique, and really different from Facebook fan pages, for example.

    Too Many LINES! What’s Going On?

    So it’s hard to draw conclusions once the graphs reach any real complexity. It’s really easy to pick out spammers, because they have a lot of outgoing messages and no incoming messages, or just no interaction at all, but beyond that it’s really a question of light/moderate/heavy user characterization.

    However the networks have a lot of singly connected nodes, and what we really want to see is the most densely connected core of the graph. We do this using clique finding.

    I’m not going to go into the algorithm here; it’s very standard. We use a small optimization to remove nodes with fewer connections than our minimum clique size minus one (a node in a clique of size k needs at least k - 1 connections), and that’s it. It’s coded in Haskell, which is fast enough and has enough optimizations that going beyond that actually slowed the running time.
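
    To make the pruning step concrete, here is a rough sketch in Java (the real implementation was in Haskell and is not shown here, so the names and the adjacency-map representation are assumptions). It removes every node whose degree is below the minimum clique size minus one, since such a node cannot belong to a clique of that size.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the degree-based pruning applied before clique finding.
final class CliquePruning {

    /** graph maps each node to its set of neighbours (assumed undirected). */
    static Map<String, Set<String>> prune(Map<String, Set<String>> graph, int minCliqueSize) {
        int minDegree = minCliqueSize - 1;

        // A node in a clique of size k has at least k - 1 neighbours.
        Set<String> keep = new HashSet<>();
        for (Map.Entry<String, Set<String>> entry : graph.entrySet()) {
            if (entry.getValue().size() >= minDegree) {
                keep.add(entry.getKey());
            }
        }

        // Rebuild the adjacency map using only the surviving nodes.
        Map<String, Set<String>> pruned = new HashMap<>();
        for (String node : keep) {
            Set<String> neighbours = new HashSet<>(graph.get(node));
            neighbours.retainAll(keep);
            pruned.put(node, neighbours);
        }
        return pruned;
    }
}
```

    This is a single pass, matching the description above; repeating it until nothing changes would prune further, but that goes beyond what is described here.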

    @jliyi cliques size 3+
    @jliyi cliques size 4+

    With Jen, from the cliques and the conversation graph, we can see that she only talks to a few people who also talk to each other. It suggests to me that she mostly uses Twitter to talk to people she knows.

    @kittenthebad cliques size 3+
    @kittenthebad cliques size 4+
    @kittenthebad cliques size 5+

    This is me, before I changed my Twitter handle. These images are just snapshots – I know my network now would look very different than it did when I created this, over a year ago. My network is bigger, and I can pull out two key communities that I was involved in, my friends, and the tech community in Ottawa.

    @krusk cliques size 3+
    @krusk cliques size 4+
    @krusk cliques size 5+
    @krusk cliques size 6+

    Kelly is super-connected, especially within Ottawa. And you see this in her graph, I think, that she’s strongly connected to other people who are very connected. She’s a local influencer, so if you want a message to spread in Ottawa she’s someone who’s very capable of making that happen.

    @anitaborg_org cliques size 3+

    I love this graph. I see all these disjoint communities relevant to women in technology, that this account is connected to.

    @anitaborg_org cliques size 4+
    @anitaborg_org cliques size 5+
    @anitaborg_org cliques size 6+
    @anitaborg_org cliques size 7+

    In these graphs we can really see the strength and connectivity of the @anitaborg_org network. Why is this important? Because it’s an organization, not an individual. Facebook would have you make a fan page (weird), or “friend” a brand (creepy), but on Twitter the brand can just be part of the conversation. Especially for something like @anitaborg_org, which is about connecting women in tech to each other and driving those opportunities, this is really something that is much harder to do on Facebook or via blogs, if it’s possible at all. Something important to consider is that the person who manages the account could change, but as long as the new person continued in a similar vein, the community would continue.

    Influence

    Credit: Geek and Poke

    There’s this idea that to be influential, you need a lot of followers. And I really think we’ve moved past that, and most people now know that’s completely meaningless. Influence is about a user’s ability to get people to act. Klout tries to capture that with a number.

    They have all these metrics, and people you’re influenced by (and an influencer of), and the topics you’re influential on, but what does the number really mean? I’m less influential than Clay Shirky but more influential than some of my friends?

    I think this misses some context. There are people you can influence to, say, go for dinner (an action in the offline world), there are people you can influence to start a conversation, and there are people you can influence to retweet your content. These are all different, and very likely they are around different topics as well.

    Let’s talk about Mommy bloggers. Hugely influential – amongst each other. But are they influential to non-mommy bloggers? Are they influential on non-mommy-blogger topics? Can we capture the more nuanced aspects of influence when we just use numbers?

    What’s Next?

    Future of the News

    This is some work that I did with a friend working in Communications – she’s doing a discourse analysis on the future of the news and collected a dataset from Twitter – two months of tweets from a number of users who were deemed influential in this debate. Here I’ve really been chopping up the data in different ways to see if I can help her draw some conclusions from it.

    This graph is just a summary of how many tweets and of what kind there are from each user in the dataset.

    The key is as follows:

    1. Is directed at someone by starting with an @
    2. Contains a mention (@) of someone else
    3. Contains a link

    This is my favorite visualization, because you can see the rhythms of someone’s day. Pale grey tweets are tweets that don’t fall into any of the three categories above, so typical “me-forming” tweets will be grey, as will short opinions. You expect to see some grey tweets, but in particular the user below has a lot:

    We also see very few mentions of other users, suggesting that they are not as interactive.

    With these graphs, you can see the gap which is nighttime for that user (and so when they are asleep) – but Dave Winer’s is my absolute favourite, because you can see that he pretty much doesn’t sleep!

    Wordles are not statistically accurate, however I think in the context of this – where really, we’re just looking for things to look for – they’re helpful. We can pick out key topics like “google”, “ipad”, in the one above and below:

    We can also see hints of certain behaviors: looking at the wordle below, you can see that this guy probably retweets people who mention him a lot!

    And this guy tweets the same website a lot.

    I used some visualizations from Many Eyes for different ways of exploring the text:

    For example, we can see what phrases follow a certain word, like “news”.

    And the other visualization shows the relationship between words.

    Exploring a Conference Hashtag

    Again, we use a wordle to get a sense of what is being discussed. Eclipse and ESE are the big ones, and whilst we might expect a lot of retweets given the size of “RT”, this is skewed by the announcement of a product called “Eclipse RT”. On the left we can also see some influential users in this community – @IanSkerrett for example.

    By graphing the frequency of users tweeting X number of times, we can see that the majority of users who participated in tweeting about the conference tweeted just once (with the hashtag), thus a minority of users tweeting up to 26 times with the hashtag are likely to be the ones driving any conversation around the hashtag.

    Next, we look at client usage (counted once per user per client, so users with a lot of tweets do not skew the distribution, but use of multiple clients is counted). Despite the open source nature of Eclipse we see a significant number of users on Blackberry, iPhone, and iPad. The web Twitter client is most popular, by some margin.

    Here we look at how many clients users used. Note, the users who tweeted only once will of course cause a spike for one, but it was interesting to see that some users use up to five clients.

    We can see here that users of the #ese hashtag came from all over the world – the conference was held in Germany.

    However the vast majority have their language set to English.

    Creating a wordle of users’ bios gives us a sense of how they describe themselves – Software, Java, Developer, and Eclipse stand out.

    I find this one of the most interesting graphs, because it shows that the peak for this group of users joining Twitter (not representative of all users, as they are more technically savvy, with lots of programmers/developers etc.) happened in early 2009. This is not at all like the trend graph we can get for users searching “twitter” on Google.

    Finally, I created networks of the mentions between users – this time there are just two colours, directed (starts with an @) and within, for example a retweet, or a “Great talk by @user on …”.

    Lots of the users in the dataset are not connected to this network, but we do see a densely connected core. It seems likely that these people are the ones tweeting more, and are really driving the conversation around the conference.

    Summary

    • People use Twitter in a myriad of different ways.
    • Visualization allows us to explore patterns and characterize usage.
    • Clique finding extracts the densely connected network that matters.
    • How can we use visualization to explore communities on Twitter?
  • Exploring a Conference Hashtag: Part 2

    Continued on from this post. Also see my supervisor’s comments.

    Interestingly, it turns out there is an Eclipse product called “Eclipse RT”, so perhaps there weren’t as many retweets as the wordle might have suggested…

    Still – 413 of the 640 tweets in the data set contained an @ mention of some kind. Of these, 72 were “directed” (by which I mean, start with an @).

    I created network graphs using Prefuse. There are two kinds of links – included in a directed tweet (yellow), and included in a non-directed tweet (purple).

    This split is because to me there seems to be a clear distinction between a conversational directed tweet and a mention. At a conference, if someone’s speaking, a tweet might consist of <quote> + <credit> – this shares the person’s ideas with the audience of your Twitter stream, but will not normally be denoted by retweet notation.
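
    The directed/mention distinction is easy to express in code. A minimal sketch, not the code behind the graphs: the class name and the regular expressions are assumptions, and the mention pattern is deliberately naive (it would also match the domain part of an email address, for instance).

```java
import java.util.regex.Pattern;

// Sketch: classify a tweet's text as directed, containing a mention, and/or containing a link.
final class TweetKind {

    // Assumed patterns: an @ followed by word characters, and an http(s) URL.
    private static final Pattern MENTION = Pattern.compile("@\\w+");
    private static final Pattern LINK = Pattern.compile("https?://\\S+");

    /** Directed: the tweet starts with an @. */
    static boolean isDirected(String text) {
        return text.startsWith("@");
    }

    /** Contains an @ mention of some kind (directed tweets are a subset of these). */
    static boolean hasMention(String text) {
        return MENTION.matcher(text).find();
    }

    /** Contains a link. */
    static boolean hasLink(String text) {
        return LINK.matcher(text).find();
    }
}
```

    A tweet like “Great talk by @user on …” has a mention but is not directed, which is the purple kind of link in the graphs; a tweet starting with “@user” is the yellow kind.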

    As yet, I haven’t created and uploaded an applet – for now, you can see the screenshots below. Some observations:

    • There’s a dense graph, which the vast majority of users are not connected to.
    • There are a few two-person connections separate from the graph.
    • The bulk of the activity takes place in this core.

    I think it’ll be interesting to apply some clique finding to this. I’m also interested in applying this distinction (directed vs. mention) to temporal rhythms – thinking along the lines of what I created here.

  • Exploring a Conference Hashtag

    My supervisor had the idea of grabbing a conference dataset by hashtag, specifically the Eclipse Conference 2010 (hashtag #ese) which took place in Ludwigsburg, Germany, November 2nd to November 4th.

    You can get an idea of what people were talking about in the wordle, below (applet is here):

    ESE All Tweets

    Apparently there were a lot of RTs. We’ll explore that later…

    I started off with HTML files that he had grabbed for me, extracted all the tweet IDs (regular expressions ftw), and then downloaded all the information for each tweet from the API (rate-limiting is the new compiling). Finally I had a spreadsheet with a total of 640 tweets (only one couldn’t be retrieved) from 181 different users.

    One user has a total of 26 tweets in the dataset, but the majority tweeted the hashtag just once. The frequency distribution is shown in the chart below.

    tweet count frequency
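
    The frequency distribution is a count of counts. A minimal sketch, assuming the spreadsheet has been reduced to a list with one author name per tweet (the class and method names are made up for illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: for each tweet count, how many users tweeted exactly that many times?
final class TweetFrequency {

    static Map<Integer, Integer> usersPerTweetCount(List<String> tweetAuthors) {
        // First pass: tweets per user.
        Map<String, Integer> tweetsPerUser = new HashMap<>();
        for (String author : tweetAuthors) {
            tweetsPerUser.merge(author, 1, Integer::sum);
        }
        // Second pass: users per tweet count, sorted by count for charting.
        Map<Integer, Integer> frequency = new TreeMap<>();
        for (int count : tweetsPerUser.values()) {
            frequency.merge(count, 1, Integer::sum);
        }
        return frequency;
    }
}
```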

    The web and Tweetdeck were by far the most popular clients, as per the chart below. Of course, this can be skewed by users posting more.

    Twitter Clients

    To reduce this, I eliminated duplicates of user/source combinations to create the chart below:

    Client Usage (User Duplicates Removed)
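
    Removing the duplicates amounts to counting each user/client pair once. A rough sketch, assuming each row is a two-element array of user name and client name (the row layout and the names are assumptions, not the spreadsheet’s actual format):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch: count each client once per user, so prolific tweeters don't skew the chart.
final class ClientUsage {

    /** Each row is {user, client}. Returns the number of distinct users per client. */
    static Map<String, Integer> usersPerClient(List<String[]> rows) {
        Set<String> seenPairs = new HashSet<>();
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] row : rows) {
            String user = row[0];
            String client = row[1];
            // The separator character keeps "ab"+"c" distinct from "a"+"bc".
            if (seenPairs.add(user + '\u0000' + client)) {
                counts.merge(client, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

    A user who tweets from several clients still contributes to each of them, which is what makes it possible to count how many clients each user has.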

    TweetDeck now seems slightly less popular! It’s interesting, given how tech-savvy these users are (Eclipse is an IDE, amongst other things, and is also Open Source), that the web is so prevalent and Android less so. Although Twitdroid and Twitter for Android are there, they are both dominated by Twitter for iPhone.

    Just 38 of the 181 users use multiple clients, although one user uses 5 (!)

    Client Usage (User Duplicates Removed)

    Below is a heat map of the locations of the users for the tweets in the dataset. The conference took place in Europe, so many of the participants were from that area but we also see users from North America.

    [Embedded heat map: http://www.openheatmap.com/embed.html?map=PheromonesMotherboardNightstick]

    Only 8 tweets (out of the 640 in the dataset), or 1.25%, had geo-location data, and just 75, or 11.7%, were replies. 55 of the user accounts (out of 181), or 30.4%, are geo-enabled.

    I filtered the dataset to keep just one tweet per user (the last one they posted with the conference hashtag).

    The location heatmap with the reduced dataset:

    [Embedded heat map: http://www.openheatmap.com/embed.html?map=HypercriticallyThesaurussStruts]

    Despite the worldwide locations, the vast majority of users have their language set to English:

    Languages

    How do people at the Eclipse Conference describe themselves? Wordles have limitations in terms of statistical significance, but I find them useful for picking out specific themes. The wordle for users’ bios is below (applet here); “Eclipse”, “software”, “Java” and “Developer” feature prominently.

    Bio Wordle

    The earliest user joined in December 2006, but some joined relatively recently – in the chart below, we see a spike around February/March 2009 (this makes sense, given the astounding growth of Twitter at that time).

    Joined Since

    Personally, I use my favorites to collect things I mean to read, so I had a look at how these users were favoriting too. Users had between 0 and 2366 favorites. A mean of 43.9, median of 3, and mode of 0 suggest that many of these users don’t use favourites at all. The standard deviation was, unsurprisingly, large – 204.23.
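
    For reference, the summary statistics above can be computed along these lines. This is a sketch with made-up names; it uses the population standard deviation, which is an assumption, since the post doesn’t say which variant produced the figure above.

```java
import java.util.Arrays;

// Sketch: mean, median, mode, and (population) standard deviation of favorite counts.
final class FavoriteStats {

    static double mean(int[] values) {
        double total = 0;
        for (int v : values) {
            total += v;
        }
        return total / values.length;
    }

    static double median(int[] values) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        // Average the two middle values when the length is even.
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Most frequent value; ties go to the smallest value. */
    static int mode(int[] values) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int best = sorted[0];
        int bestCount = 0;
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            while (j < sorted.length && sorted[j] == sorted[i]) {
                j++;
            }
            if (j - i > bestCount) {
                bestCount = j - i;
                best = sorted[i];
            }
            i = j;
        }
        return best;
    }

    static double standardDeviation(int[] values) {
        double mean = mean(values);
        double sumOfSquares = 0;
        for (int v : values) {
            sumOfSquares += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumOfSquares / values.length);
    }
}
```

    A mean far above the median, as in the figures above, is what a handful of very heavy users will do to an otherwise near-zero distribution.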

    I graphed follower/following with size proportional to number of lists using Many Eyes.

    Finally – URLs. I was surprised that 54 (29.8%) of users did not have a URL in their profile. 3, shockingly, have a Facebook URL (one of which does not have the vanity URL). Blogspot (22 users) is more popular than WordPress (5 users).

    Next I’ll be looking at temporal rhythms and mapping @ mentions.