In which we answer the question – what are they saying?
I’ve split the tweets into two types, at-replies and non-at-replies, plus a third set that contains all tweets. I’ve created a wordle of each set for each of the 20 people we were following.
If you haven’t – check out wordle.net. It’s awesome.
There’s debate as to whether wordles are a good way to analyze text. There are definitely better ways (possibly to be explored in a future post), but I think they’re cool and here they have some utility. Note, though, that word sizes are relative to the number of words in that individual’s data set, and the data sets vary in size (see Part 1, Part 2, Part 3).
I don’t want to tread on Caitlin’s analysis (I’m just the data junkie), but some things you can see, aside from topics of discussion:
- People who make a point of thanking others (most likely for retweets or similar)
- People who retweet things that others have said about them
- Where RT is conspicuous by its absence
- Specific websites that get tweeted a lot
My personal favorite is Dave Winer’s all-tweets wordle! Let me know what you think.
Programming-wise, the code is trivial because wordle accepts free text. But, before I realized that the guy who wrote wordle was much smarter than me, I tried to be clever and optimize it by using a LinkedHashSet. I chose this data structure because I wanted O(1) lookup (the hash), since I would find the same words repeated; only one instance of each word (the set); and a nice quick iteration (the linked) so I could output a key, value table at the end. And then I discovered that a LinkedHashSet has no get() or elementAt() method, so there was no way to fetch a word’s entry and update its count, and I stopped trying to be a smart-alec!
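For the curious, here is a minimal sketch of what the counting could have looked like with a map instead of a set. This is not the code I ran (free text did the job); it assumes words are split on whitespace and lower-cased, and the class and method names (WordCounter, countWords) are made up for illustration. A LinkedHashMap gives the same hash lookup and linked iteration, but also stores a value per key, which is the part a set can’t do.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class WordCounter {

    // Counts each word in a block of free text.
    // LinkedHashMap keeps keys in first-seen order, so the output table
    // lists words in the order they first appeared.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Assumption: words are whitespace-separated and case-insensitive.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace produces one empty token
            }
            // Insert 1 for a new word, otherwise add 1 to the existing count.
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("RT thanks for the RT thanks");
        // Print a simple key, value table.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

If the output order doesn’t matter, a plain HashMap works the same way with the same merge() call.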
4 replies on “Part 4: Who’s Talking About The Future Of Newspapers?”
[…] This post was mentioned on Twitter by Kelly Rusk and Caitlin Kealey, Caitlin Kealey. Caitlin Kealey said: Great blog full of my data as pretty viz by @kittenthebad: http://bit.ly/bDdkoJ cc/ @mathewi @davewiner @hermida @codybrown @yelvington […]
Yeah, I would totally just use a HashMap with a conditional creating new Integer entries if the key isn’t in there yet, and then just iterate over entrySet() at the end. You’ll be putting data into the hash much more frequently than iterating through it, anyway. =)
Also, did you manually copy and paste, POST, or do some other funky hack for generating multiple Wordles?
You are so right, I just wanted to use a LinkedHashSet. One day! In the end free text worked best and that’s what I needed for this week’s thing too.
I C+P’d which was a pain. POST would have been better but I didn’t want to save stuff to a gallery because it can’t be deleted. Making public data more public is a privacy issue I’m conscious of :-s
[…] on from Part 4, exploring what they are saying using Word Trees on Many […]