Unfortunately I can’t post the actual paper for a year, but hopefully the talk is going to be more interesting anyway! I used Google Docs to create my slides and you can find the deck here.
Twitter: An Overview
Who doesn’t use Twitter? Who doesn’t use Twitter because they think it’s pointless?
Before I start, I want to give a really brief overview of Twitter and how it works so that what follows makes sense.
This is the main page that I see on Twitter.com. What I’ve marked as the “stream” are the tweets from the people I “follow”, who are listed on the right. I follow about 220 people. I’ve also marked out where it shows the people who follow me. I don’t follow all of them back, but it’s important to note that this doesn’t stop them from mentioning me or retweeting me.
On this page, I can see who’s mentioned me or engaged in conversation with me recently (by which I mean sent a tweet starting with @catehstn). By selecting one of those tweets, I can see which tweet of mine, if any, they responded to.
Here I can see the tweets of mine that have been retweeted using Twitter’s relatively recent “retweet” button. Clicking on one shows me the users who retweeted it.
Finally, these are the tweets I’ve retweeted. As on the last screen, clicking on a tweet shows me who else retweeted it.
People who don’t use Twitter often tell me that Twitter is in fact boring people going on about their tedious lives…
… and in particular sharing with the world what they had for lunch.
It’s funny, because that’s actually why I stopped reading the Facebook news feed.
Anyway, I personally get a lot of value out of Twitter, and I think it has proved, and continues to prove, its value as a medium: with the breaking news about the plane in the Hudson and the Iran election, and every day for businesses as a customer service channel.
It’s hard at first, though, and Twitter can seem a bit like talking to oneself in public. However, it really is what users make of it, which is why I find it particularly interesting. On the page showing my directed messages, you could see that I can ask a question and people give me sensible answers. And invite me to go skiing. That’s pretty awesome.
People Are Weird
danah boyd does amazing research into how people – particularly teens – use social networking services. She posted some examples from her recent field work on her blog, and a couple of the techniques that teens use to manage their presence on Facebook are really interesting. The first is deleting everything: every wall post and every message is deleted after it’s read and responded to, and status updates are left up briefly, then removed as well. The second is deactivating the account at every logout, so that interactions can only take place when the teen is online to manage them.
These behaviours are extreme, but to me they illustrate why studying people’s behaviour in the micro is interesting. At one point, if you sampled an “average” Twitter user, they would have no followers and would never have tweeted anything. People use Twitter in really different ways, and the purpose of what we’re doing is to try to capture patterns that we can use to characterize types of users.
Who uses a social networking service – any – in a way that is “weird”? I, for example, read my whole stream. One of my friends tells me I’m completely mad to do that, and I think she gets frustrated because she’ll start telling me something and I’ll say, “Oh, I saw that tweet.”
We created this diagram to capture the different levels of interaction users have with one another. At the centre is the direct message, the most intimate and private form of communication on Twitter; we can’t measure these. Then there are two kinds of active engagement: conversing, or commenting on content (messages that start with an @), and retweeting, when a user shares something they’ve seen with all their followers too. Then there is “listening”: reading the tweet, maybe clicking on the link, but not commenting. And finally we have ignoring.
We all move between the different interaction types, and our interaction patterns with different users will likely be very different. I definitely converse with people I never retweet, for example! Spammers, though, are always at the outside: they are interested only in pushing their content, not consuming that of other people.
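The two measurable kinds of active engagement leave different traces in a tweet’s text, so a rough classifier needs only a couple of regular expressions. This is a minimal sketch, assuming old-style manual retweets of the form “RT @user …”; it does not try to detect retweets made with the native button. `engagement_type` is a hypothetical helper, not code from the paper.

```python
import re
from typing import Optional, Tuple

RETWEET = re.compile(r"^RT @(\w+)")  # old-style manual retweet: "RT @user ..."
REPLY = re.compile(r"^@(\w+)")       # conversation: the tweet opens with @user


def engagement_type(text: str) -> Optional[Tuple[str, str]]:
    """Return ("retweet" | "conversation", other_user), or None for a broadcast."""
    text = text.strip()
    m = RETWEET.match(text)
    if m:
        return ("retweet", m.group(1))
    m = REPLY.match(text)
    if m:
        return ("conversation", m.group(1))
    return None  # no measurable engagement with another user


print(engagement_type("@catehstn want to go skiing?"))         # ('conversation', 'catehstn')
print(engagement_type("RT @anitaborg_org: new scholarships"))  # ('retweet', 'anitaborg_org')
print(engagement_type("Soup for lunch"))                       # None
```

“Listening” and ignoring leave no trace in the text at all, which is why they can’t be counted this way.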
So, why do we use visualization? Because we don’t really know what’s going on, exactly, and it’s helpful in looking for patterns.
It’s been shown on Twitter as well as on Facebook that whilst the “declared” set of friends produces a dense graph, the subset of people each user actually interacts with is much smaller and produces a much sparser overall graph. Whilst the number of people we are “friends” with can keep increasing, the number we actually interact with plateaus.
So we started by graphing users’ conversation networks, which is all very well if the network is fairly small, like my friend Jen’s…
But it gets harder as it gets bigger…
And eventually all we can really say is that they have a really huge network…
… and it’s really very densely connected…
I want to call out this last one, because this is a person tweeting on behalf of, and as part of, an organization, and she’s produced this crazily connected graph. I see it in my stream, because I see her engaging with people I know, and with me, but this graph really shows how far-reaching it is. I think Twitter’s ability to let an entity, an organization, build a community this way is quite unique, and really different from Facebook fan pages, for example.
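Conversation networks like these can be built from a list of (author, text) pairs. A minimal sketch, assuming that any tweet opening with @someone counts as one conversational link and that the graph is undirected; `conversation_graph` and the sample handles are hypothetical.

```python
import re
from collections import defaultdict

REPLY = re.compile(r"^@(\w+)")


def conversation_graph(tweets):
    """Build an undirected adjacency map {user: set_of_users} from (author, text) pairs.

    A tweet that starts with @name adds one link between its author and name.
    Handles are lower-cased so @Jen and @jen become the same node.
    """
    graph = defaultdict(set)
    for author, text in tweets:
        m = REPLY.match(text.strip())
        if m:
            a, b = author.lower(), m.group(1).lower()
            if a != b:  # ignore users replying to themselves
                graph[a].add(b)
                graph[b].add(a)
    return dict(graph)


tweets = [
    ("jen", "@cate lunch tomorrow?"),
    ("cate", "@jen sure!"),
    ("kelly", "@jen see you at the meetup"),
    ("jen", "Off to the gym"),
]
print(sorted(conversation_graph(tweets)["jen"]))  # ['cate', 'kelly']
```

The resulting adjacency map can be handed to any graph drawing tool.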
Too Many LINES! What’s Going On?
So it’s hard to draw conclusions once the graphs become at all complex. It’s really easy to pick out spammers, because they have a lot of outgoing messages and no incoming messages, or just no interaction at all, but beyond that it really comes down to characterizing users as light, moderate, or heavy.
However, the networks have a lot of singly connected nodes, and what we really want to see is the most densely connected core of the graph. We do this using clique finding.
I’m not going to go into the algorithm here; it’s very standard. We use one small optimization, removing nodes with fewer connections than our minimum clique size minus one, and that’s it. It’s coded in Haskell, which is fast enough, and optimizes well enough on its own, that adding further optimizations actually slowed the running time.
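The spammer pattern (plenty of outgoing messages, nothing coming back) is a simple degree check on a directed message graph. A hedged sketch: the `min_out` threshold and the `likely_spammers` name are assumptions for illustration, not values from the study.

```python
from collections import Counter


def likely_spammers(edges, min_out=5):
    """Flag users who send at least `min_out` @-messages but receive none.

    `edges` is an iterable of (sender, recipient) pairs, one per message.
    The default threshold is illustrative, not a tuned value.
    """
    edges = list(edges)  # allow generators, since we iterate twice
    out_deg = Counter(sender for sender, _ in edges)
    in_deg = Counter(recipient for _, recipient in edges)
    return sorted(
        user for user, n in out_deg.items()
        if n >= min_out and in_deg[user] == 0
    )


edges = [("promo_bot", f"user{i}") for i in range(8)]
edges += [("cate", "jen"), ("jen", "cate")]
print(likely_spammers(edges))  # ['promo_bot']
```

Users with no interaction at all never appear in `edges`, so they would have to be found by comparing against the full user list.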
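The algorithm isn’t named here, and the original is in Haskell. As a stand-in, here is a minimal Python sketch of the textbook Bron–Kerbosch algorithm with pivoting, which lists maximal cliques, combined with the degree-based pruning described above. Repeating the pruning until nothing changes is an assumption (a single pass is also valid), and all function names are hypothetical.

```python
def prune(graph, min_size):
    """Repeatedly drop nodes with fewer than min_size - 1 neighbours.

    Every member of a clique of min_size nodes has at least min_size - 1
    neighbours, so removed nodes can never be part of a qualifying clique.
    `graph` is an undirected adjacency map {node: set_of_neighbours}.
    """
    g = {v: set(ns) for v, ns in graph.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        weak = [v for v, ns in g.items() if len(ns) < min_size - 1]
        for v in weak:
            for u in g.pop(v):
                g[u].discard(v)
            changed = True
    return g


def bron_kerbosch(graph, r, p, x, out):
    """Bron–Kerbosch with pivoting: append every maximal clique to `out`."""
    if not p and not x:
        out.append(r)
        return
    # Pick the pivot that leaves the fewest candidates to branch on.
    pivot = max(p | x, key=lambda u: len(graph[u] & p))
    for v in list(p - graph[pivot]):
        bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v], out)
        p.remove(v)
        x.add(v)


def cliques(graph, min_size=3):
    """Maximal cliques with at least `min_size` members, after pruning."""
    g = prune(graph, min_size)
    found = []
    bron_kerbosch(g, set(), set(g), set(), found)
    return [c for c in found if len(c) >= min_size]


graph = {
    "jen": {"cate", "kelly", "ann"},
    "cate": {"jen", "kelly"},
    "kelly": {"jen", "cate"},
    "ann": {"jen"},
}
print([sorted(c) for c in cliques(graph)])  # [['cate', 'jen', 'kelly']]
```

Here "ann" has only one connection, so pruning removes her before the search starts, which is exactly the kind of singly connected node that clutters the raw graphs.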
With Jen, from the cliques and the conversation graph, we can see that she only talks to a few people who also talk to each other. It suggests to me that she mostly uses Twitter to talk to people she knows.
This is me, before I changed my Twitter handle. These images are just snapshots; I know my network now would look very different from how it did when I created this, over a year ago. My network is bigger, and I can pull out two key communities that I was involved in: my friends, and the tech community in Ottawa.
Kelly is super-connected, especially within Ottawa. And you see this in her graph, I think, that she’s strongly connected to other people who are very connected. She’s a local influencer, so if you want a message to spread in Ottawa she’s someone who’s very capable of making that happen.
I love this graph. I can see all the disjoint communities relevant to women in technology that this account is connected to.
In these graphs we can really see the strength and connectivity of the @anitaborg_org network. Why is this important? Because it’s an organization, not an individual. Facebook would have you make a fan page (weird), or “friend” a brand (creepy), but on Twitter the brand can just be part of the conversation. Especially for something like @anitaborg_org, which is about connecting women in tech to each other and driving those opportunities, this is something that is much harder to do on Facebook or via blogs, if it’s possible at all. Something important to consider is that the person who manages the account could change, but as long as the new person continued in a similar vein, the community would continue.
There’s this idea that to be influential, you need a lot of followers. And I really think we’ve moved past that, and most people now know that’s completely meaningless. Influence is about a user’s ability to get people to act. Klout tries to capture that with a number.
They have all these metrics, and people you’re influenced by (and an influencer of), and the topics you’re influential on, but what does the number really mean? I’m less influential than Clay Shirky but more influential than some of my friends?
I think this misses some context. There are people you can influence to, say, go for dinner (an action in the offline world), people you can influence to start a conversation, and people you can influence to retweet your content. These are all different, and very likely they involve different topics as well.
Let’s talk about Mommy bloggers. Hugely influential – amongst each other. But are they influential to non-mommy bloggers? Are they influential on non-mommy-blogger topics? Can we capture the more nuanced aspects of influence when we just use numbers?
Future of the News
This is some work that I did with a friend working in Communications – she’s doing a discourse analysis on the future of the news and collected a dataset from Twitter – two months of tweets from a number of users who were deemed influential in this debate. Here I’ve really been chopping up the data in different ways to see if I can help her draw some conclusions from it.
This graph is just a summary of how many tweets, and of what kind, each user in the dataset posted.
The key is as follows:
- Is directed at someone by starting with an @
- Contains a mention (@) of someone else
- Contains a link
This is my favorite visualization, because you can see the rhythms of someone’s day. Pale grey tweets are tweets that don’t fall into any of the three categories above, so typical “me-forming” tweets will be grey, as will short opinions. You expect to see some grey tweets, but in particular the user below has a lot:
We also see very few mentions of other users, suggesting that they are not as interactive.
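The colour key amounts to a few text checks per tweet. A minimal sketch follows; how the original chart handles a tweet that fits several categories isn’t stated, so this version returns every category that applies, and treating a “mention” as an @name anywhere after the first character is an assumption. `categories` is a hypothetical name.

```python
import re
from collections import Counter

HANDLE = re.compile(r"(?<!\w)@\w+")  # @name, but not the @ inside an email address
LINK = re.compile(r"https?://\S+")


def categories(text):
    """Return the set of key categories that apply to one tweet."""
    text = text.strip()
    cats = set()
    if text.startswith("@"):
        cats.add("directed")  # starts with an @
    if any(m.start() > 0 for m in HANDLE.finditer(text)):
        cats.add("mention")   # an @name after the start (assumption)
    if LINK.search(text):
        cats.add("link")
    return cats or {"plain"}  # plain = the pale grey tweets


timeline = [
    "@jen see you there",
    "Great talk by @user http://example.com",
    "Coffee, then more coffee",
]
tally = Counter(cat for text in timeline for cat in categories(text))
print(sorted(tally.items()))
# [('directed', 1), ('link', 1), ('mention', 1), ('plain', 1)]
```

Tallying these per user gives the bar heights in the summary graph; tallying them per hour gives the daily rhythm view.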
With these graphs, you can see the gap which is nighttime for that user (and so when they are asleep) – but Dave Winer’s is my absolute favourite, because you can see that he pretty much doesn’t sleep!
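The nighttime gap can also be found numerically rather than by eye: count tweets per clock hour and look for the longest run of empty hours, wrapping around midnight. A sketch, assuming the timestamps have already been converted to the user’s local time; `quiet_hours` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime


def quiet_hours(timestamps):
    """Longest run of clock hours (0-23) with no tweets, wrapping past midnight.

    Assumes `timestamps` are datetimes in the user's local time zone;
    for most users the result roughly marks when they sleep.
    """
    active = Counter(ts.hour for ts in timestamps)
    best, run = [], []
    # Scan the 24 hours twice so a gap spanning midnight is found in one piece.
    for hour in list(range(24)) * 2:
        if active[hour] == 0:
            run.append(hour)
            if len(run) > len(best):
                best = list(run)
        else:
            run = []
    return best[:24]  # cap at a full day for accounts with no tweets at all


day = [datetime(2010, 5, 1, h) for h in range(7, 23)]  # tweets from 07:00 to 22:00
print(quiet_hours(day))  # [23, 0, 1, 2, 3, 4, 5, 6]
```

For someone who tweets in every hour of the day, the function returns an empty list.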
Wordles are not statistically accurate; however, I think in this context, where we’re really just looking for leads worth following up, they’re helpful. We can pick out key topics like “google” and “ipad” in the one above and the one below:
We can also see hints of certain behaviors. Looking at the wordle below, you can see that this guy probably retweets people who mention him a lot!
And this guy tweets the same website a lot.
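Under the hood, a wordle is essentially a word-frequency count, with each word drawn larger the more often it appears. A rough sketch of that count, assuming a tiny illustrative stop-word list and dropping URLs first; `word_counts` is hypothetical, and real wordle tools do more normalization.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list for illustration; wordle tools ship longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "i"}


def word_counts(tweets, top=10):
    """Most frequent words across tweets, roughly what a wordle sizes words by."""
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop URLs
        for word in re.findall(r"[a-z0-9@#']+", text):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(top)


tweets = [
    "The iPad and Google news",
    "Google buys news startup",
    "news news http://example.com",
]
print(word_counts(tweets, top=2))  # [('news', 4), ('google', 2)]
```

Keeping `@` and `#` inside tokens means handles and hashtags are counted too, which is how a frequently retweeted handle stands out.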
I used some visualizations from Many Eyes for different ways of exploring the text:
For example, we can see what phrases follow a certain word, like “news”.
And the other visualization shows the relationship between words.
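Many Eyes’ word tree is interactive and far richer, but the basic question it answers (which phrases follow a given word such as “news”) can be approximated by counting fixed-length phrases. A sketch with a hypothetical `following_phrases` helper and made-up tweets:

```python
import re
from collections import Counter


def following_phrases(tweets, word, length=2):
    """Count the `length`-word phrases that come straight after `word`."""
    word = word.lower()
    counts = Counter()
    for text in tweets:
        tokens = re.findall(r"[a-z0-9@#']+", text.lower())
        for i, token in enumerate(tokens):
            phrase = tokens[i + 1:i + 1 + length]
            if token == word and len(phrase) == length:  # skip if the tweet ends too soon
                counts[" ".join(phrase)] += 1
    return counts


tweets = [
    "the news business is changing",
    "news business models are broken",
    "breaking news today",
]
print(following_phrases(tweets, "news", length=1).most_common())
# [('business', 2), ('today', 1)]
```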
Exploring a Conference Hashtag
Again, we use a wordle to get a sense of what is being discussed. Eclipse and ESE are the big ones, and whilst we might expect a lot of retweets given the size of “RT”, this is skewed by the announcement of a product called “Eclipse RT”. On the left we can also see some influential users in this community, @IanSkerrett for example.
By graphing how many users tweeted a given number of times, we can see that the majority of users who tweeted about the conference (with the hashtag) did so just once. The minority who tweeted with the hashtag many times, up to 26, are therefore likely to be the ones driving any conversation around it.
Next, we look at client usage (counted once per user per client, so users with a lot of tweets do not skew the distribution, but use of multiple clients is counted). Despite the open source nature of Eclipse, we see a significant number of users on BlackBerry, iPhone, and iPad. The web Twitter client is the most popular, by some margin.
Here we look at how many clients users used. Note that the users who tweeted only once will of course cause a spike at one client, but it was interesting to see that some users used up to five.
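The graph is a histogram of a histogram: first count tweets per user, then count how many users share each tweet count. A minimal sketch, assuming one author name per hashtag tweet; `tweets_per_user_histogram` and the names are hypothetical.

```python
from collections import Counter


def tweets_per_user_histogram(authors):
    """Map "tweets sent" -> "number of users who sent that many".

    `authors` has one entry per hashtag tweet: the screen name of its author.
    """
    per_user = Counter(authors)
    histogram = Counter(per_user.values())
    return dict(sorted(histogram.items()))


authors = ["ian", "cate", "cate", "jen", "jen", "jen", "bob"]
print(tweets_per_user_histogram(authors))  # {1: 2, 2: 1, 3: 1}
```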
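Both client graphs come from the same trick: reduce the data to distinct (user, client) pairs before counting, so a prolific user counts once per client. A sketch, assuming each tweet has already been reduced to a (user, client) pair; `client_usage` and the client labels are hypothetical.

```python
from collections import Counter


def client_usage(tweets):
    """Summarise client use from (user, client) pairs, one pair per tweet.

    Each distinct (user, client) pair is counted once, so prolific users
    don't skew the totals, but someone using several clients still counts
    towards each of them.
    Returns (users_per_client, histogram of clients used per user).
    """
    pairs = set(tweets)
    users_per_client = Counter(client for _, client in pairs)
    clients_per_user = Counter(user for user, _ in pairs)
    histogram = dict(sorted(Counter(clients_per_user.values()).items()))
    return users_per_client, histogram


tweets = [
    ("ian", "web"), ("ian", "web"), ("ian", "web"),
    ("cate", "web"), ("cate", "iPhone"),
    ("jen", "BlackBerry"),
]
per_client, per_user = client_usage(tweets)
print(per_client.most_common(1))  # [('web', 2)]
print(per_user)  # {1: 2, 2: 1}
```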
We can see here that users of the #ese hashtag came from all over the world – the conference was held in Germany.
However the vast majority have their language set to English.
Creating a wordle of users’ bios gives us a sense of how they describe themselves: Software, Java, Developer, and Eclipse stand out.
I find this one of the most interesting graphs, because it shows that the peak in this group of users joining Twitter happened in early 2009. The group isn’t representative of all users, since it is more technically savvy, with lots of programmers and developers. This is not at all like the trend graph we can get for users searching “twitter” on Google.
Finally, I created networks of the mentions between users. This time there are just two colours: directed (the tweet starts with an @) and within (the mention appears inside the tweet), for example a retweet, or “Great talk by @user on …”.
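The join-date graph is a count of account creation dates per month. A minimal sketch, assuming each account’s creation date is already available as a `datetime.date`; `joins_by_month` and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def joins_by_month(created):
    """Count account creation dates per (year, month), in date order."""
    return sorted(Counter((d.year, d.month) for d in created).items())


created = [date(2008, 11, 3), date(2009, 2, 1), date(2009, 2, 20), date(2009, 3, 5)]
by_month = joins_by_month(created)
print(by_month)  # [((2008, 11), 1), ((2009, 2), 2), ((2009, 3), 1)]
print(max(by_month, key=lambda item: item[1]))  # ((2009, 2), 2)
```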
Lots of the users in the dataset are not connected to this network, but we do see a densely connected core. It seems likely that these people are the ones tweeting more, and are really driving the conversation around the conference.
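The two edge colours can be derived from where a handle sits in the tweet, and the disconnected users are simply those who appear in no edge. A sketch under those assumptions; `mention_edges`, `unconnected`, and the sample tweets are hypothetical.

```python
import re

# An @handle not preceded by a word character, so "me@example.com" is skipped.
HANDLE = re.compile(r"(?<!\w)@(\w+)")


def mention_edges(tweets):
    """Turn (author, text) pairs into (author, mentioned_user, kind) edges.

    kind is "directed" when the handle opens the tweet and "within" for any
    other mention, e.g. "RT @user ..." or "Great talk by @user on ...".
    """
    edges = []
    for author, text in tweets:
        text = text.strip()
        for m in HANDLE.finditer(text):
            kind = "directed" if m.start() == 0 else "within"
            edges.append((author.lower(), m.group(1).lower(), kind))
    return edges


def unconnected(users, edges):
    """Users from the dataset who appear in no mention edge at all."""
    linked = {a for a, _, _ in edges} | {b for _, b, _ in edges}
    return sorted(u.lower() for u in users if u.lower() not in linked)


tweets = [
    ("ian", "@cate thanks for coming!"),
    ("cate", "RT @ian: Eclipse RT announced"),
    ("jen", "Great talk by @Ian on OSGi"),
    ("bob", "Heading to #ese"),
]
edges = mention_edges(tweets)
print(edges)
# [('ian', 'cate', 'directed'), ('cate', 'ian', 'within'), ('jen', 'ian', 'within')]
print(unconnected(["ian", "cate", "jen", "bob"], edges))  # ['bob']
```

Feeding the undirected version of these edges into a clique finder is one way to pull out the densely connected core.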
- People use Twitter in a myriad of different ways.
- Visualization allows us to explore patterns and characterize usage.
- Clique finding extracts the densely connected network that matters.
- How can we use visualization to explore communities on Twitter?