Tag: data

  • Org Survey Part 1: Questions

    Danbo_vs_Domo_vs_Minato_Mirai_(8839397350).jpg
    Credit: Wikimedia

    I remember when I managed a 6 person team, and I always felt like I had a handle on what was going on (perhaps I’m being overly nostalgic here). Now I manage a ~26 person organisation, and on a good day I feel like I have a general idea of what’s happening, and certain specific things that I’m focusing on in more depth.

    For all the effort I make to be accessible and to build a relationship with individuals on the team, the reality is that a bi-monthly 1:1 and the odd Slack conversation can only go so far. This is managing at a level of indirection.

    A couple of months ago, with some input from existing practices, HR, and my colleague John, I put together an organizational health check. This took the form of two two-part surveys (using Google Forms) that I’ve since refined and helped my peers use for their orgs.

    Survey 1: The Org Survey

    This survey goes out to all ICs on the team. The first set of questions focuses on their overall impression of the org and org leadership.

    1. My org lead communicates division strategy and direction to me in a way that’s clear and enables me to act on it.
    2. My org lead regularly shares relevant info from the company, org, and other relevant parties.
    3. My org lead has articulated to me a clear vision for the future of the org.
    4. I agree with the vision that my lead has articulated for the future of the org.
    5. The org is aligned with the mission of the company.
    6. The mission of the company is the right one for the future.
    7. The org is high performing.
    8. The wider organisation recognises the performance of the org.
    9. My org lead provides a space for me to use my voice, and really listens when I do.
    10. My org lead has helped me develop as an individual or leader on the team.
    11. My org lead has actively supported me with technical or people issues.
    12. Any other comments about the org?

    The second set of questions focuses on their direct manager.

    1. Which team are you on?
    2. My direct manager gives me actionable feedback that helps me improve my performance.
    3. My direct manager does not “micromanage” (i.e. get involved in details that should be handled at other levels).
    4. My direct manager shows consideration for me as a person.
    5. My direct manager provides a space for me to use my voice, and really listens when I do.
    6. My direct manager keeps the team focused on our priority results.
    7. My direct manager has had a meaningful discussion with me about my contributions to this project in the past six months.
    8. My direct manager communicates clear goals for our team.
    9. My direct manager has the relevant expertise to effectively lead me.
    10. I would recommend my direct manager to others.
    11. Any comments?

    Survey 2: The Manager Survey

    This survey goes out to all managers. The first set of questions is the same as in the first survey. The second set is slightly different from, but substantially similar to, the first survey’s second set – except that because it’s just people who report directly to me, I can use my name rather than “direct manager”.

    1. Cate gives me actionable feedback that helps me improve my performance.
    2. Cate supports me in developing my own leadership skills.
    3. Cate shares interesting and helpful resources with me to make me a better manager.
    4. Cate has the relevant expertise to effectively lead me.
    5. Cate does not “micromanage” (i.e. get involved in details that should be handled at other levels).
    6. Cate keeps the team focused on our priority results.
    7. Cate communicates clear goals for our team.
    8. Cate shows consideration for me as a person.
    9. Cate has had a meaningful discussion with me about my contributions to this division in the past six months.
    10. Cate has had a meaningful discussion with me about my career development in the past six months.
    11. I would recommend this org to others.

    These fall under the general categories of Development (1-4) / Priorities (5-7) / Appreciation (8-10) / Recommend (11).
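
    As a rough sketch of how that grouping could be turned into numbers: assuming each statement is answered on a 1-5 agreement scale (the post does not say which scale the forms use) and that each respondent's answers arrive as a list of 11 scores, per-category averages might be computed as below. The names `CATEGORIES` and `category_averages`, and the sample scores, are made up for illustration.

```python
from statistics import mean

# Question ranges for the manager survey, as listed above (1-indexed, inclusive).
CATEGORIES = {
    "Development": (1, 4),
    "Priorities": (5, 7),
    "Appreciation": (8, 10),
    "Recommend": (11, 11),
}


def category_averages(responses):
    """Average score per category across all respondents.

    `responses` is a list of 11-item lists, one per respondent, where each
    item is a score on an assumed 1-5 agreement scale.
    """
    averages = {}
    for name, (first, last) in CATEGORIES.items():
        scores = [
            score
            for response in responses
            for score in response[first - 1:last]
        ]
        averages[name] = mean(scores)
    return averages


if __name__ == "__main__":
    # Hypothetical responses from three managers.
    sample = [
        [4, 5, 3, 4, 5, 4, 4, 5, 4, 3, 5],
        [3, 4, 4, 4, 4, 3, 3, 4, 4, 4, 4],
        [5, 5, 4, 5, 5, 5, 4, 5, 5, 4, 5],
    ]
    for name, value in category_averages(sample).items():
        print(f"{name}: {value:.2f}")
```

    With only a handful of direct reports, a single low score moves an average a lot, so these numbers are better read as prompts for a conversation than as measurements.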

    My hope is that these topics come up in our 1:1s, but there’s something to be said for stepping back and looking at the overall picture as a series of graphs. It’s hard to get feedback as a manager, and it’s hard to trust the feedback you do get – so this can be a helpful checkpoint.

    Now What?

    Well… wait.

    Next week, I’ll share how I analyse the data and make it actionable.

    For now, if you want to use these, I’ve made a shared folder available. Feel free to make a copy and customise them!

    See part 2: Analysis.

  • You Get What You Incentivise

    tulip stair
    Credit: Wikipedia

    It’s about 18 months since my friend Tracy wrote this post pointing out that whilst the tech industry evangelises data for decision making, there is very little available when it comes to diversity numbers. And about 12 months since we started seeing companies release their numbers. Helped along by radical shareholder action from Jesse Jackson Sr.

    This is progress, right? These things didn’t use to be discussed even internally, which is ridiculous because if you’re a woman on a team with more men named “Dave” than women, it’s the kind of thing you notice. Just because you don’t know the global, or local, percentage, doesn’t mean you don’t have a good idea of what is going on.

    These are good developments, but at this point perhaps it’s worth stepping back and considering – how far have we come, actually?

    Firstly, there is no consistent definition of what “engineering roles” means. My understanding is that it ranges from a narrow definition of ENG/UX/PM, through to an “everyone who reports into an engineering cost centre”. The numbers vary accordingly, but not everyone knows this – I’ve spoken to women who were comparing numbers at companies as part of their decision to take a job (or not) thinking that it was a difference of percentages… when it was actually mostly a difference of definitions.

    Secondly, if we’re going to blame the pipeline of women and minorities with CS and related degrees (and by “we” I mean “tech companies disclaiming responsibility for the culture they have created”), it makes sense to tie the numbers to roles where a CS degree might actually be a benefit.

    It’s not like there isn’t precedent for this – the ABI Top Company for Women awards use a standard definition for technical roles. Companies who have participated in this have that data. They have just chosen to release other – better looking – data instead.

    As with all processes and incentives, you get what you incentivise. What concerns me is what is incentivised in this scenario: padding the definition of “engineering role” to make the numbers appear better, and focusing on hiring “diverse” new grads.

    What would we want to incentivise? Perhaps:

    • Hiring under-represented groups at every level.
    • Paying them equitably.
    • Building a culture where everyone is allowed to succeed:
      • Where they have equal opportunity to do equal work.
      • Where promotions aren’t delayed by gendered or racial feedback and expectations (hello, lawsuits).

    What I would love to see is firstly a standard definition of what “engineering role” means.

    The second, more revolutionary thing that I would like to see, is companies reporting not just the percentage of minority groups but the percentage of compensation going to minority groups (e.g. as determined via a standard measure, like taxation).

    This removes the incentive to pad out “engineering” with less prestigious, and less well paid roles to make the numbers look better.

    It makes hiring more senior people from under-represented groups, and paying those people equitably, more important.

    And for people looking at these numbers when evaluating companies, it would be a helpful metric. For myself, I’d prefer a company with 15% women in “engineering” roles receiving 13% of “engineering” compensation than one with 18% women in “engineering” roles receiving 12% of compensation. We know there is going to be a gap – women are better represented at lower levels. But the size of, and comparison of that gap would be very telling.
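
    One way to put a single number on that comparison is to divide the share of compensation by the share of headcount. This ratio is only an illustration, not a standard or something companies report, and the function name `pay_parity_ratio` is made up here. For the two hypothetical companies above it comes out at about 0.87 and about 0.67.

```python
def pay_parity_ratio(headcount_share, compensation_share):
    """Share of compensation divided by share of headcount.

    1.0 would mean the group receives compensation in proportion to its
    numbers; lower values mean a bigger gap.
    """
    if headcount_share <= 0:
        raise ValueError("headcount_share must be positive")
    return compensation_share / headcount_share


if __name__ == "__main__":
    # The two hypothetical companies from the paragraph above.
    companies = {
        "15% of roles, 13% of compensation": (0.15, 0.13),
        "18% of roles, 12% of compensation": (0.18, 0.12),
    }
    for label, (headcount, compensation) in companies.items():
        ratio = pay_parity_ratio(headcount, compensation)
        print(f"{label}: ratio {ratio:.2f}")
```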

    As in all things when it comes to diversity in the tech industry, we know that the data on people of color is even worse, and there is a racial pay gap as well as a gender one, generally.

    I suspect we’ll never see this data. Because yeah we saw some progress, but we saw a lot more PR.

  • Problems of Statistical Significance

    the building blocks of life
    Credit: Wikipedia

    I think one of the biggest problems for diversity, and for accountability of diversity, is one that we never talk about. Statistical significance.

    Imagine there is a company with 1000 engineers, of which 20% are women. The company declares their numbers proudly, saying they are beating the latest US graduation rate for women in Computer Science and therefore doing exceptionally well.

    But let’s suppose only 600 of the engineers graduated from US universities, 90 of them women. So among the engineers who are at all comparable to the US graduation rate, only 15% are women.

    Also this doesn’t factor in the year of graduation, which with numbers declining for the last 20 years is a… well let’s call it an oversight.

    Now there’s the non-US graduating population of 110 women and 290 men. 27.5%! Impressive. But start breaking down by country – 20 from Canada, 10 from Romania, 30 from India, 30 from China… and we end up with some very small numbers that are hard to extrapolate from.
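
    The arithmetic behind those percentages, written out as a small sketch. The figures are the ones from the hypothetical company above (90 women among 600 US graduates, 110 women and 290 men among the rest); the helper name `share` is just for illustration.

```python
def share(part, whole):
    """Return part/whole as a percentage."""
    return 100 * part / whole


# Figures from the hypothetical company described above.
TOTAL_ENGINEERS = 1000
TOTAL_WOMEN = 200  # 20% of 1000
US_GRADS, US_WOMEN = 600, 90
NON_US_WOMEN, NON_US_MEN = 110, 290

if __name__ == "__main__":
    non_us_total = NON_US_WOMEN + NON_US_MEN
    print(f"Overall: {share(TOTAL_WOMEN, TOTAL_ENGINEERS):.1f}% women")
    print(f"US graduates: {share(US_WOMEN, US_GRADS):.1f}% women")
    print(f"Non-US graduates: {share(NON_US_WOMEN, non_us_total):.1f}% women")
```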

    Or say there are 100 people who do not have a degree at all, of whom 2 are women. We can be pretty sure here that men without degrees are vastly more likely to be hired, but what can we extrapolate from a sample size of 2? If we share that data, will people be able to figure out who the 2 women are?
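
    To make "what can we extrapolate?" a bit more concrete, here is a sketch using the Wilson score interval, a standard way of putting a range around a proportion from a small sample. Choosing this interval is an assumption of the sketch, not something the post relies on. For 2 women out of 100 it gives a range of roughly 0.5% to 7%, which is a wide band to hang a conclusion on.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # 2 women among the 100 engineers without a degree (from the example above).
    low, high = wilson_interval(2, 100)
    print(f"2 of 100: {100 * low:.1f}% to {100 * high:.1f}%")
    # For contrast, the headline figure of 200 women among 1000 engineers.
    low, high = wilson_interval(200, 1000)
    print(f"200 of 1000: {100 * low:.1f}% to {100 * high:.1f}%")
```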

    If this company has a typical percentage of black people for a tech company – about 2% – there will be 20.

    International Data

    If we wanted to actually compare the percentage to graduation rates we would consider (at a minimum):

    • Year of graduation.
    • Country (or State) of graduation.

    This would be a pain to compute, but the biggest problem there is that most of this data cannot be found, and the data that is there is not comparable. Sometimes Computer Science is grouped with Engineering, sometimes with IT. Many countries do not share this information, or it is buried away inside PDFs, making it challenging to find. Participation rates are also often not comparable: in the UK, the BCS is curator of this kind of information and they tend to use a broader “IT” designation.

    Accountability

    One thing that I would love to see, that we included on the bingo card, is managers being held accountable for diversity on their teams. For example: tracking when women leave managers, prevalence of reports of problems etc.

    So managers have 10-20 reports, which means in our example they have 2-4 women reporting to them, if the women are equally distributed. This is a massive if, unless our hypothetical company was manufactured 5 minutes ago (OK, it was) out of entirely new grads (even in SV, no).

    Women tend to cluster, because more women means a better environment, and because women often talk to and warn each other of places best avoided. So our bad, sexist managers in this company, they have at most 1-2 women reporting to them.

    So when a woman leaves that manager, she might cite a bad environment but she has every reason not to. If they just track how often women leave relative to men and women leave 50% more often… how long will it take to get enough data to indicate there is a problem? It’s possible to add other women to the team to see… but this isn’t a science experiment. It’s someone’s career.
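
    A back-of-the-envelope sketch of why this takes so long. The attrition rates here are invented purely for illustration (10% a year as a baseline, 15% for "50% more often"), and the model assumes people leave independently of each other, which is a simplification. With two women on a team, the chance that neither leaves looks similar under both rates for years.

```python
def prob_no_departures(team_size, annual_leave_rate, years):
    """Chance that nobody in a group leaves over `years` years.

    Assumes each person independently leaves with the same probability
    each year, which is a simplification.
    """
    return (1 - annual_leave_rate) ** (team_size * years)


if __name__ == "__main__":
    # Hypothetical rates: 10% baseline, and 15% (i.e. 50% more often).
    baseline_rate, elevated_rate = 0.10, 0.15
    women_on_team = 2
    for years in (1, 2, 3, 5):
        baseline = prob_no_departures(women_on_team, baseline_rate, years)
        elevated = prob_no_departures(women_on_team, elevated_rate, years)
        print(f"{years} yr: P(no one leaves) {baseline:.2f} vs {elevated:.2f}")
```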

    So maybe even the loss of one woman triggers the manager being sent to remedial diversity training. They’ll probably figure out why though. And then they will know who to blame.

    The Plural of Anecdote

    There’s this common critique of writing on this topic, which has also been levelled at me, and that is that “the plural of anecdote is not data”. Which is true.

    But when the numbers are this low it is really challenging to get that data for women, and near impossible to get it for other minorities in tech.

    So if you insist that change can’t happen until we have enough data, we’ll be waiting a long time.

    But if the plural of anecdote cannot be data, can it be trend?

    I see some alarming trends.

    As individuals, we are all flawed. If we look at the example of women being called abrasive (something for which we do have some data), for an individual you can look for reasons to justify it (“she said …“, “she’s abrupt“, “she really upset him“). But the problem isn’t that one woman gets called abrasive… it’s that almost all of us do. The macro, not the micro. The trend, not the anecdote.

    Assumptions Made

    There are some big assumptions made on diversity data. Just two:

    • The US graduation rate (in Computer Science) is a good benchmark.
    • Women will leave as soon as they are unhappy.

    I think we have evidence to suggest that both these assumptions are deeply flawed… but maybe we don’t have the data to suggest what to put in place instead.

    I don’t have good suggestions here. Other than: observe the trends.

  • A Little Bit Of Data is a Dangerous Amount

    Data Center - NCC
    Credit: Flickr / Beraldo Leal

    When you have no data, everyone agrees: need more data.

    When you have a lot of data, what is happening is pretty clear.

    When you have a little bit of data, people can extrapolate. “It might show X”, “It might show Y”. Often declared without the caveats. Because “we don’t really know” is a much less compelling story, even if it is more accurate.

    But… we don’t really know.

    If you’re measuring the performance of a layout on your menu bar, with some actions exposed and some hidden away in a submenu, and you know that people more often tap the exposed options, you might declare success.

    But. A little more data might show people cancelling those actions disproportionately more.

    So now what do you know? That people aren’t always finding what they are looking for first try, that those options are not necessarily the ones that should be exposed.

    The answer is logging everything and (I would hope this is obvious) to the same place.
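
    A minimal sketch of what having taps and cancels in the same place buys you. The event format (pairs of action name and event type) and the function name `cancel_rates` are assumptions made for the example, not a description of any particular logging system.

```python
from collections import Counter


def cancel_rates(events):
    """Return {action: (taps, cancels, cancel_rate)} from one event log.

    `events` is an iterable of (action, event_type) pairs, where
    event_type is "tap" or "cancel". Both names are illustrative.
    """
    taps, cancels = Counter(), Counter()
    for action, event_type in events:
        if event_type == "tap":
            taps[action] += 1
        elif event_type == "cancel":
            cancels[action] += 1
    return {
        action: (count, cancels[action], cancels[action] / count)
        for action, count in taps.items()
    }


if __name__ == "__main__":
    # Hypothetical log: "share" is exposed and tapped often, but often cancelled.
    log = [
        ("share", "tap"), ("share", "cancel"),
        ("share", "tap"), ("share", "cancel"),
        ("share", "tap"),
        ("save", "tap"),
    ]
    for action, (tap_count, cancel_count, rate) in cancel_rates(log).items():
        print(f"{action}: {tap_count} taps, {cancel_count} cancels ({rate:.0%})")
```

    If taps and cancels were logged to different places, the tap counts alone would make the exposed option look like a success.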

    And, when you think data has backed up a conclusion… think about whether you have all the data to really be sure about that.

  • Part 8: Who’s Talking About The Future of Newspapers?

    I’m working on a paper on topical communities, and as part of that I’ve come back to this dataset to explore the social network that emerges through @ mentions.

    To start with, I looked at the social network that emerges when we look at the people on the list.

    Future of the News Network

    This network is pretty densely connected, with the exception of two users on the list. You can see their nodes floating away in the image below:

    Future of the News Network – Outliers

    The network graph that emerges from all the tweets connected is really busy, but may show who the most engaged users are.

    Future of the News Network – Full

    There’s just too much information here, so I started filtering it by eliminating nodes that had fewer than a specified minimum number of connections. Because of the dataset available, non-news-influencer nodes cannot be connected to each other. Thus, I was specifying how many influencers needed to mention a user for them to make it into the graph.
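
    The filtering step could be sketched roughly as follows. This is not the code used to produce the graphs; it assumes the data is held as a mapping from each influencer on the list to the set of accounts they @-mentioned, and the function name and handles are hypothetical. Influencers themselves are always kept, so that a high enough minimum leaves only the original list.

```python
def filter_by_min_mentions(mentions, minimum):
    """Keep only users mentioned by at least `minimum` distinct influencers.

    `mentions` maps each influencer on the list to the set of users they
    @-mentioned. Influencers are always kept as nodes. Returns the same
    shape with under-mentioned users removed from every set.
    """
    counts = {}
    for mentioned_users in mentions.values():
        for user in mentioned_users:
            counts[user] = counts.get(user, 0) + 1
    keep = set(mentions) | {
        user for user, count in counts.items() if count >= minimum
    }
    return {
        influencer: mentioned & keep
        for influencer, mentioned in mentions.items()
    }


if __name__ == "__main__":
    # Hypothetical handles, not taken from the dataset.
    mentions = {
        "influencer_a": {"influencer_b", "site_x", "person_y"},
        "influencer_b": {"site_x"},
        "influencer_c": {"site_x", "person_y", "person_z"},
    }
    for minimum in (1, 2, 3):
        filtered = filter_by_min_mentions(mentions, minimum)
        nodes = set(filtered) | set().union(*filtered.values())
        print(f"minimum {minimum}: {len(nodes)} nodes")
```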

    Future of the News Network – Minimum 2

    Setting the minimum to two dramatically reduces the size of the graph. Many of the nodes remaining are also well known, for example @jack and @alyssa_milano.

    Future of the News Network – Minimum 3

    We can also see popular websites, like @techcrunch and @boingboing, as well as @google (not surprising given how often Google showed up in the earlier visualizations of tweet content).

    Future of the News Network – Minimum 4
    Future of the News Network – Minimum 5
    Future of the News Network – Minimum 6
    Future of the News Network – Minimum 7
    Future of the News Network – Minimum 8
    Future of the News Network – Minimum 9
    Future of the News Network – Minimum 10

    I find the graphs for minimum 8+ fascinating – I think they start to show who influences the influencers.

    Future of the News Network – Minimum 11

    Eventually, of course, we get back to our original graph.

  • Part 7: Who’s Talking About The Future of Newspapers?

    I used the Classifier4J Summarizer to summarize the tweets and pick out the 5 tweet summary of the period, and the 1 tweet summary of @ replies and non-directed tweets.

    Only letters and spaces were kept for the summary (thus each tweet was treated as one sentence). The summarizer transforms everything to lower case and removes some things, like the identifier on bit.ly and other short links.
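
    The cleaning step described above might look something like this. It is a sketch of the preprocessing only, not of Classifier4J itself; it assumes "letters" means ASCII letters, and it folds the lower-casing (which the summarizer does on its own) and some whitespace tidying into the same hypothetical `clean_tweet` function for convenience.

```python
import re


def clean_tweet(text):
    """Keep only ASCII letters and spaces, then tidy whitespace and lowercase."""
    letters_and_spaces = re.sub(r"[^A-Za-z ]", "", text)
    collapsed = re.sub(r" +", " ", letters_and_spaces).strip()
    return collapsed.lower()


if __name__ == "__main__":
    # With the punctuation gone, the tweet has no sentence boundaries left,
    # which is why each tweet ends up treated as a single sentence.
    print(clean_tweet("RT @user: Hello, World! 123"))
```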

    For this one, I’m having a little fun, really. Essentially I’ve tried to summarize 2 months of Twitter for each user on a slide, hence the inclusion of their pic (as of now, unfortunately not at the time period covered).


  • Part 6: Who’s Talking About The Future of Newspapers?

    Continued on from Part 5, exploring what they are saying using the Phrase Net visualization from Many Eyes.

    Each image is a link to the applet where you can explore the text and interact with it. Change the linking word on the left – I’ve used space, but “and” or “is” in particular could be enlightening.

    I like this visualization because it shows what goes together. The fact that “globe” and “mail” are linked by “and” is perhaps not unexpected, but what does “Google” link to? News? Facebook? Buzz? What do these link to in turn – privacy? Social networking?

    Let me know what you find!

    Alex Howard
    Alfred Hermida
    Andrew Keen
    Cody Brown
    Dan Gillmor
    Dave Winer
    David Cohn
    David Eaves
    Dr. Mark Drapeau
    Howard Weaver
    Jay Rosen
    JD Lasica
    Jeff Jarvis
    Jennifer Preston
    Kirk LaPointe
    Mark Glaser
    Matthew Ingram
    Steve Buttry
    Steve Outing
    Steve Yelvington

  • Part 5: Who’s Talking About The Future Of Newspapers?

    Continued on from Part 4, exploring what they are saying using Word Trees on Many Eyes.

    Each image is a link to the applet where you can explore the text and interact with it. Change the word in the top left corner to change the root of the tree.

    Alex Howard
    Alfred Hermida
    Andrew Keen
    Cody Brown
    Dan Gillmor
    Dave Winer
    David Cohn
    David Eaves
    Dr. Mark Drapeau
    Howard Weaver
    Jay Rosen
    JD Lasica
    Jeff Jarvis
    Jennifer Preston
    Kirk LaPointe
    Mark Glaser
    Matthew Ingram
    Steve Buttry
    Steve Outing
    Steve Yelvington

  • Part 4: Who’s Talking About The Future Of Newspapers?

    In which we answer the question – what are they saying?

    I’ve split the tweets up into two types, @ replies and everything else, plus a third set which contains all tweets. I’ve created wordles of each one, for each of the 20 people we were following.

    If you haven’t – check out wordle.net. It’s awesome.

    There’s debate as to whether wordles are good ways to analyze text – definitely there are better ways (possibly to be explored in a future post). However, I think they’re cool, and here they have some utility. Note, though, that the size of each word is relative to the number of words in the data set for that individual, and those data sets are of varying size (see Part 1, Part 2, Part 3).

    I don’t want to tread on Caitlin’s analysis (I’m just the data junkie), but some things you can see, aside from topics of discussion:

    • People who make a point of thanking others (most likely for retweets or similar)
    • People who retweet things that others have said about them
    • Where RT is conspicuous by its absence
    • Specific websites that get tweeted a lot

    My personal favorite is Dave Winer’s all tweets! Let me know what you think.

    Programming-wise, the code is trivial because wordle accepts free text. But, before I realized that the guy who wrote wordle was much smarter than me, I tried to be clever and optimize it by using a LinkedHashSet. I chose this data structure on the basis that I wanted O(1) random access (the hash) because I would find the same words repeated, only one instance of each word (the set) and a nice quick iteration (the linked) so I could output a key, value table at the end. And then I discovered that there was no get() or elementAt() method – and stopped trying to be a smart-alec!
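
    The combination described there (hash lookup, one entry per word, ordered iteration, and a value to go with each key) is closer to an insertion-ordered map than to a set. As an illustration only, and in Python rather than the Java used for the original attempt, a plain `dict` behaves that way, since it preserves insertion order in Python 3.7 and later. In Java, `LinkedHashMap` would be the analogous structure. The function name `count_words` is made up for the sketch.

```python
def count_words(text):
    """Count word occurrences, preserving first-seen order."""
    counts = {}
    for word in text.lower().split():
        # Hash lookup by word; each word is stored only once as a key.
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    table = count_words("the future of news is the future of the web")
    # Iteration follows insertion order, giving a key, value table.
    for word, count in table.items():
        print(word, count)
```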