She has a dataset of 12 million tweets containing the word “science” – about a year's worth of data, after filtering out non-English tweets and spam.
She uses UTC to reduce timezone problems, although some remain – mostly date-related problems caused by tools that expect the month first.
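To illustrate the month-first problem, here is a minimal sketch using only the Python standard library. The timestamp string is made up for illustration and is not from the talk's dataset; it simply shows how the same text can be read as two different dates, and how an ISO 8601 string with an explicit UTC offset avoids the ambiguity.

```python
from datetime import datetime, timezone

# Hypothetical timestamp string, not taken from the talk's data.
raw = "03/04/2013 18:30:00"

# The same text parsed month-first (US style) and day-first.
month_first = datetime.strptime(raw, "%m/%d/%Y %H:%M:%S").replace(tzinfo=timezone.utc)
day_first = datetime.strptime(raw, "%d/%m/%Y %H:%M:%S").replace(tzinfo=timezone.utc)

# An ISO 8601 string with an explicit offset is unambiguous and round-trips.
iso = day_first.isoformat()
round_trip = datetime.fromisoformat(iso)

print(month_first.date(), day_first.date(), iso)
```

Storing timestamps in the ISO form (year first, offset included) means downstream tools do not have to guess the field order.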
Found more tweets about science mid-week than at weekends – this matches wider patterns of Twitter use in other research.
- describe() – summary of the object.
- groupby() – reorganize your data-structure to group by some attribute.
- Exports to LaTeX.
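As a rough sketch of the first two calls above, assuming they refer to pandas: the tiny table below and its column names (`user`, `retweets`) are invented for illustration and have nothing to do with the real 12 million tweets.

```python
import pandas as pd

# Toy data with hypothetical column names, for illustration only.
tweets = pd.DataFrame({
    "user": ["a", "b", "a", "c", "b", "a"],
    "retweets": [0, 3, 1, 7, 2, 5],
})

# describe(): count, mean, std, min, quartiles and max of a numeric column.
summary = tweets["retweets"].describe()

# groupby(): regroup rows by an attribute, then aggregate each group.
per_user = tweets.groupby("user")["retweets"].sum()

print(summary)
print(per_user)
```

For the LaTeX export, pandas provides `DataFrame.to_latex()`, which returns the table as a LaTeX `tabular` string; it is left out of the sketch because recent pandas versions need the optional `jinja2` package for it.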
IP[y] : Notebook
- Really cool – make notes about what you are doing, interleaved with code.
- Great for research.
- ? – inline help.
- ?? – inline source.
- %%timeit – times execution, useful for measuring performance.
- %pastebin – sends code to pastebin.
- %save – makes a .py file.
- %run – runs a script.
- Data structures.
- Data analysis.
- Time-based indexing.
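Time-based indexing is the feature that makes questions like "how many tweets per weekday?" short to write. The following is a minimal sketch, assuming pandas and a handful of made-up UTC timestamps; it is not the speaker's code or data.

```python
import pandas as pd

# Made-up UTC timestamps standing in for tweet creation times.
stamps = pd.to_datetime([
    "2013-03-04 09:15",  # Monday
    "2013-03-05 12:00",  # Tuesday
    "2013-03-05 18:45",
    "2013-03-06 08:30",  # Wednesday
    "2013-03-06 13:10",
    "2013-03-06 21:05",
    "2013-03-09 10:00",  # Saturday
], utc=True)

# One row per tweet, indexed by time.
tweets = pd.Series(1, index=stamps).sort_index()

# Partial-string indexing: select everything on a given day.
march_5 = tweets.loc["2013-03-05"]

# Resample into daily buckets; days with no tweets become 0.
per_day = tweets.resample("D").sum()

# Count tweets per weekday name.
by_weekday = tweets.groupby(tweets.index.day_name()).sum()

print(per_day)
print(by_weekday)
```

With a real dataset the same `groupby` on the weekday would be one way to compare mid-week and weekend volume.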
I’m pretty fascinated with the results of this research, which we didn’t see much of as the talk was about the technical setup. I feel like this would have been incredibly handy when I was doing my own research though, and it was good to chat to Brenda at our women’s breakfast and compare notes on other tools like Processing, prefuse, etc.