Tag: data

Org Survey Part 1: Questions
Credit: Wikimedia

I remember when I managed a 6-person team and always felt like I had a handle on what was going on (perhaps I’m being overly nostalgic here). Now I manage a ~26-person organisation, and on a good day I have a general idea of what’s happening, plus a few specific things that I’m focusing on in more depth.

For all the effort I make to be accessible and to build a relationship with individuals on the team, the reality is that a bi-monthly 1:1 and the odd Slack conversation can only go so far. This is managing at a level of indirection.

A couple of months ago, with some input from existing practices, HR, and my colleague John, I put together an organisational health check. This took the form of two two-part surveys (using Google Forms), which I’ve since refined and helped my peers use for their orgs.

Survey 1: The Org Survey

This survey goes out to all ICs on the team. The first set of questions focuses on their overall impression of the org and org leadership.
1. My org lead communicates division strategy and direction to me in a way that’s clear and enables me to act on it.
2. My org lead regularly shares relevant info from the company, org, and other relevant parties.
3. My org lead has articulated to me a clear vision for the future of the org.
4. I agree with the vision that my lead has articulated for the future of the org.
5. The org is aligned with the mission of the company.
6. The mission of the company is the right one for the future.
7. The org is high performing.
8. The wider organisation recognises the performance of the org.
9. My org lead provides a space for me to use my voice, and really listens when I do.
10. My org lead has helped me develop as an individual or leader on the team.
11. My org lead has actively supported me with technical or people issues.
12. Any other comments about the org?
The second set of questions focuses on their direct manager.
1. Which team are you on?
2. My direct manager gives me actionable feedback that helps me improve my performance.
3. My direct manager does not “micromanage” (i.e. get involved in details that should be handled at other levels).
4. My direct manager shows consideration for me as a person.
5. My direct manager provides a space for me to use my voice, and really listens when I do.
6. My direct manager keeps the team focused on our priority results.
7. My direct manager has had a meaningful discussion with me about my contributions to this project in the past six months.
8. My direct manager communicates clear goals for our team.
9. My direct manager has the relevant expertise to effectively lead me.
10. I would recommend my direct manager to others.
11. Any comments?
Survey 2: The Manager Survey

This survey goes out to all managers. The first set of questions is the same as in the first survey. The second set is slightly different but substantially similar, except that because it goes only to people who report directly to me, I can use my name rather than “direct manager”.
1. Cate gives me actionable feedback that helps me improve my performance.
2. Cate supports me in developing my own leadership skills.
3. Cate shares interesting and helpful resources with me to make me a better manager.
4. Cate has the relevant expertise to effectively lead me.
5. Cate does not “micromanage” (i.e. get involved in details that should be handled at other levels).
6. Cate keeps the team focused on our priority results.
7. Cate communicates clear goals for our team.
8. Cate shows consideration for me as a person.
9. Cate has had a meaningful discussion with me about my contributions to this division in the past six months.
10. Cate has had a meaningful discussion with me about my career development in the past six months.
11. I would recommend this org to others.
These fall under the general categories of Development (1-4) / Priorities (5-7) / Appreciation (8-10) / Recommend (11).

My hope is that these topics come up in our 1:1s, but there’s something to be said for stepping back and looking at the overall picture as a series of graphs. It’s hard to get feedback as a manager, and it’s hard to trust the feedback you do get – so this can be a helpful checkpoint.

Now What?

Well… wait.

Next week, I’ll share how I analyse the data and make it actionable.

For now, if you want to use these, I’ve made a shared folder available. Feel free to make a copy and customise them!

See part 2: Analysis.
July 27, 2017
You Get What You Incentivise
Credit: Wikipedia

It’s about 18 months since my friend Tracy wrote this post pointing out that, whilst the tech industry evangelises data for decision making, there is very little available when it comes to diversity numbers. And it’s about 12 months since we started seeing companies release their numbers, helped along by radical shareholder action from Jesse Jackson Sr.

This is progress, right? These things didn’t use to be discussed even internally, which is ridiculous, because if you’re a woman on a team with more men named “Dave” than women, it’s the kind of thing you notice. Just because you don’t know the global, or local, percentage doesn’t mean you don’t have a good idea of what is going on.

These are good developments, but at this point perhaps it’s worth stepping back and considering – how far have we come, actually?

Firstly, there is no consistent definition of what “engineering roles” means. My understanding is that it ranges from a narrow definition of ENG/UX/PM through to “everyone who reports into an engineering cost centre”. The numbers vary accordingly, but not everyone knows this. I’ve spoken to women who were comparing numbers at companies as part of their decision to take a job (or not), thinking that it was a difference in percentages… when it was actually mostly a difference in definitions.

Secondly, if we’re going to blame the pipeline of women and minorities with CS and related degrees (and by “we” I mean “tech companies disclaiming responsibility for the culture they have created”), it makes sense to tie the numbers to roles where a CS degree might actually be a benefit.

It’s not like there isn’t precedent for this – the ABI Top Company for Women awards use a standard definition for technical roles. Companies who have participated in this have that data. They have just chosen to release other – better looking – data instead.

As with all processes and incentives, you get what you incentivise. What concerns me is what is incentivised in this scenario: padding the definition of “engineering role” to make the numbers appear better, and focusing on hiring “diverse” new grads.

What would we want to incentivise? Perhaps:
- Hiring under-represented groups at every level.
- Paying them equitably.
- Building a culture where everyone is allowed to succeed:
  - Where they have equal opportunity to do equal work.
  - Where promotions aren’t delayed by gendered or racial feedback and expectations (hello, lawsuits).
The first thing I would love to see is a standard definition of what “engineering role” means.

The second, more revolutionary thing I would like to see is companies reporting not just the percentage of employees from minority groups but the percentage of compensation going to minority groups (e.g. as determined via a standard measure, like taxation).

This removes the incentive to pad out “engineering” with less prestigious, and less well paid roles to make the numbers look better.

It makes hiring more senior people from under-represented groups, and paying those people equitably, more important.

And for people looking at these numbers when evaluating companies, it would be a helpful metric. For myself, I’d prefer a company with 15% women in “engineering” roles receiving 13% of “engineering” compensation to one with 18% women in “engineering” roles receiving 12% of compensation. We know there is going to be a gap, because women are better represented at lower levels. But the size of that gap, and how it compares across companies, would be very telling.
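One way to make that comparison concrete is a ratio: share of compensation divided by share of headcount, where 1.0 would mean parity. The sketch below applies it to the two hypothetical companies in the paragraph above; the function name and the ratio itself are illustrative, not a standard reporting definition.

```python
def pay_to_headcount_ratio(headcount_share: float, compensation_share: float) -> float:
    """Compensation share divided by headcount share (1.0 means parity)."""
    if headcount_share <= 0:
        raise ValueError("headcount_share must be positive")
    return compensation_share / headcount_share


# The two hypothetical companies from the text, in percentages.
company_a = pay_to_headcount_ratio(15, 13)  # 15% of roles, 13% of compensation
company_b = pay_to_headcount_ratio(18, 12)  # 18% of roles, 12% of compensation

print(f"A: {company_a:.2f}, B: {company_b:.2f}")  # prints "A: 0.87, B: 0.67"
```

Both ratios fall below 1.0, but the company with fewer women in “engineering” roles is noticeably closer to parity, which a headcount percentage alone would hide.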

As in all things when it comes to diversity in the tech industry, we know that the data on people of color is even worse, and there is a racial pay gap as well as a gender one, generally.

I suspect we’ll never see this data. Because yeah, we saw some progress, but we saw a lot more PR.
March 25, 2015
Problems of Statistical Significance
Credit: Wikipedia

I think one of the biggest problems for diversity, and for accountability of diversity, is one that we never talk about. Statistical significance.

Imagine there is a company with 1000 engineers, of which 20% are women. The company declares their numbers proudly, saying they are beating the latest US graduation rate for women in Computer Science and therefore doing exceptionally well.

But let’s suppose only 600 of the engineers graduated from US universities, 90 of them women. So the percentage that is actually comparable to the US graduation rate is only 15%.

Also, this doesn’t factor in the year of graduation, which, with numbers declining for the last 20 years, is a… well, let’s call it an oversight.

Now there’s the non-US graduating population of 110 women and 290 men. 27.5%! Impressive. But start breaking it down by country (20 from Canada, 10 from Romania, 30 from India, 30 from China…) and we end up with some very small numbers that are hard to extrapolate from.

Or say there are 100 people who do not have a degree at all, of whom 2 are women. We can be pretty sure here that men without degrees are vastly more likely to be hired, but what can we extrapolate from a sample size of 2? If we share that data, will people be able to figure out who the 2 women are?

If this company has a typical percentage of black people for a tech company – about 2% – there will be 20.
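To show how much uncertainty hides behind a single percentage, the sketch below attaches an approximate 95% confidence interval to each group in the hypothetical company above. It uses the Wilson score interval, one common choice for small counts; that choice is an assumption for illustration, not something the post prescribes.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the proportion successes/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half_width, centre + half_width


# Groups from the hypothetical 1000-engineer company in the text.
groups = {
    "US graduates": (90, 600),
    "non-US graduates": (110, 400),
    "no degree": (2, 100),
}

for name, (women, total) in groups.items():
    low, high = wilson_interval(women, total)
    print(f"{name}: {women / total:.1%} women (95% CI {low:.1%} to {high:.1%})")
```

For the 2-in-100 group the interval runs from roughly 0.6% to 7%, more than a tenfold spread, so the headline 2% says very little on its own.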

International Data

If we wanted to actually compare the percentage to graduation rates we would consider (at a minimum):
- Year of graduation.
- Country (or State) of graduation.
This would be a pain to compute, but the bigger problem is that most of this data cannot be found, and the data that can be found is not comparable. Sometimes Computer Science is grouped with Engineering, sometimes with IT. Many countries do not share this information, or it is buried away inside PDFs, making it challenging to find. Participation rates are also often not comparable: in the UK, the BCS is the curator of this kind of information, and it tends to use a broader “IT” designation.
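If the data did exist, the comparison itself would be a headcount-weighted average: for each graduation cohort (country and year), multiply the number of engineers from that cohort by the share of women graduating in it. A minimal sketch; the `Cohort` type and every rate in it are made-up placeholders, not real graduation statistics.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    country: str            # country (or state) of graduation
    year: int               # year of graduation
    engineers: int          # engineers at the company from this cohort
    women_grad_rate: float  # share of CS graduates who were women (HYPOTHETICAL)


def expected_share_of_women(cohorts: list[Cohort]) -> float:
    """Share of women we would expect if hiring exactly mirrored each cohort."""
    total = sum(c.engineers for c in cohorts)
    if total == 0:
        raise ValueError("no engineers in cohorts")
    return sum(c.engineers * c.women_grad_rate for c in cohorts) / total


# Placeholder cohorts with invented rates, for illustration only.
cohorts = [
    Cohort("Country A", 1995, 200, 0.25),
    Cohort("Country A", 2010, 400, 0.15),
    Cohort("Country B", 2010, 30, 0.40),
]
print(f"{expected_share_of_women(cohorts):.1%}")
```

The arithmetic is trivial; the hard part is filling in `women_grad_rate` with numbers that exist and mean the same thing across countries.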

Accountability

One thing that I would love to see, that we included on the bingo card, is managers being held accountable for diversity on their teams. For example: tracking when women leave managers, prevalence of reports of problems etc.

Say managers have 10-20 reports, which means in our example they have 2-4 women reporting to them, if the women are equally distributed. This is a massive if, unless our hypothetical company was manufactured 5 minutes ago (OK, it was) out of entirely new grads (even in SV, no).

Women tend to cluster, because more women means a better environment, and because women often talk to and warn each other about places best avoided. So the bad, sexist managers in this company have at most 1-2 women reporting to them.

So when a woman leaves that manager, she might cite a bad environment, but she has every reason not to. If the company just tracks how often women leave relative to men, and women leave 50% more often… how long will it take to get enough data to indicate there is a problem? It’s possible to add other women to the team to see… but this isn’t a science experiment. It’s someone’s career.

So maybe even the loss of one woman triggers the manager being sent to remedial diversity training. They’ll probably figure out why, though. And then they will know who to blame.
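To put a number on “how long will it take”, consider a single hypothetical team of 2 women and 14 men in which 2 men leave in a year. The sketch below computes a one-sided Fisher’s exact test p-value (one standard test for small 2×2 tables, chosen here for illustration) for the chance of seeing at least that many women among the leavers if leaving had nothing to do with gender. The team sizes are invented.

```python
from math import comb


def women_leaving_p_value(women_left: int, women_total: int,
                          men_left: int, men_total: int) -> float:
    """One-sided Fisher's exact test: probability, if leaving were unrelated
    to gender, that at least `women_left` of the leavers are women."""
    n = women_total + men_total
    leavers = women_left + men_left
    denom = comb(n, leavers)
    # Hypergeometric upper tail: k leavers drawn from the women,
    # the remaining leavers drawn from the men.
    return sum(
        comb(women_total, k) * comb(men_total, leavers - k)
        for k in range(women_left, min(leavers, women_total) + 1)
    ) / denom


# Hypothetical team: 2 women, 14 men, and 2 of the men leave this year.
print(women_leaving_p_value(1, 2, 2, 14))  # one woman also leaves
print(women_leaving_p_value(2, 2, 2, 14))  # both women leave
```

One woman leaving gives a p-value of about 0.35, which is no evidence at all; even both women leaving only reaches 0.05, and by then the team has no women left.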

The Plural of Anecdote

There’s a common critique of writing on this topic, which has also been levelled at me: “the plural of anecdote is not data”. Which is true.

But when the numbers are this low it is really challenging to get that data for women, and near impossible to get it for other minorities in tech.

So if you insist that change can’t happen until we have enough data, we’ll be waiting a long time.

But if the plural of anecdote cannot be data, can it be trend?

I see some alarming trends.

As individuals, we are all flawed. If we look at the example of women being called abrasive (something for which we do have some data), for an individual you can look for reasons to justify it (“she said …”, “she’s abrupt”, “she really upset him”). But the problem isn’t that one woman gets called abrasive… it’s that almost all of us do. The macro, not the micro. The trend, not the anecdote.

Assumptions Made

There are some big assumptions made about diversity data. Here are just two:
- The US graduation rate (in Computer Science) is a good benchmark.
- Women will leave as soon as they are unhappy.
I think we have evidence to suggest that both these assumptions are deeply flawed… but maybe we don’t have the data to suggest what to put in place instead.

I don’t have good suggestions here. Other than: observe the trends.
January 14, 2015
A Little Bit Of Data is a Dangerous Amount

Credit: Flickr / Beraldo Leal

When you have no data, everyone agrees: need more data.

When you have a lot of data, what is happening is pretty clear.

When you have a little bit of data, people can extrapolate. “It might show X”, “It might show Y”. Often declared without the caveats. Because “we don’t really know” is a much less compelling story, even if it is more accurate.

But… we don’t really know.

If you’re measuring the performance of a layout on your menu bar, with some actions exposed and some hidden away in a submenu, and you know that people more often tap the exposed options, you might declare success.

But. A little more data might show people cancelling those actions disproportionately more.

So now what do you know? That people aren’t always finding what they are looking for first try, that those options are not necessarily the ones that should be exposed.

The answer is logging everything, and (I would hope this is obvious) logging it to the same place.

And, when you think data has backed up a conclusion… think about whether you have all the data to really be sure about that.
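As a rough illustration of why one shared log matters for the menu example, the sketch below counts taps and cancels from a single event stream and reports, per action, the fraction of taps that were not cancelled. The event fields and action names are hypothetical, and matching cancels to taps by action name alone (rather than by session) is a simplifying assumption.

```python
from collections import Counter

# Hypothetical unified event log: taps and cancels land in the same place.
events = [
    {"action": "share", "type": "tap"},
    {"action": "share", "type": "cancel"},
    {"action": "delete", "type": "tap"},
    {"action": "archive", "type": "tap"},
]


def completion_rates(log):
    """Per action: fraction of taps that were not followed by a cancel."""
    taps = Counter(e["action"] for e in log if e["type"] == "tap")
    cancels = Counter(e["action"] for e in log if e["type"] == "cancel")
    return {action: (taps[action] - cancels[action]) / taps[action] for action in taps}


print(completion_rates(events))
```

Counted on taps alone, “share” looks as popular as the others; with cancels in the same log, it completes none of the time.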

September 10, 2014
Part 8: Who’s Talking About The Future of Newspapers?

I’m working on a paper on topical communities, and as part of that I’ve come back to this dataset to explore the social network that emerges through @ mentions.

To start with, I looked at the social network that emerges among the people on the list.

Future of the News Network

This network is pretty densely connected, with the exception of two users on the list. You can see their nodes floating away in the image below:

Future of the News Network – Outliers

The network graph that emerges from all the tweets is really busy, but it may show who the most engaged users are.

Future of the News Network – Full

There’s just too much information here, so I started filtering it by eliminating nodes that had fewer than a specified minimum number of connections. Because the dataset only contains tweets from the news influencers on the list, nodes outside the list cannot be connected to each other. Thus, I was specifying how many influencers needed to mention a user for that user to make it into the graph.
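The filtering rule above can be sketched in a few lines: keep every influencer, keep any other user mentioned by at least a minimum number of distinct influencers, and drop edges whose endpoints did not survive. This is an illustrative reconstruction of the rule, not the code used for the graphs; the handles and the `filter_mention_graph` name are hypothetical.

```python
from collections import defaultdict


def filter_mention_graph(mentions, influencers, minimum):
    """Keep every influencer, plus any other user mentioned by at least
    `minimum` distinct influencers; return the kept nodes and edges."""
    mentioned_by = defaultdict(set)
    for source, target in mentions:
        if source in influencers:
            mentioned_by[target].add(source)

    nodes = set(influencers) | {
        user
        for user, sources in mentioned_by.items()
        if user not in influencers and len(sources) >= minimum
    }
    # Only keep edges whose endpoints both survived the filter.
    edges = {(s, t) for s, t in mentions if s in nodes and t in nodes}
    return nodes, edges


# Hypothetical handles: three influencers and two outside users.
influencers = {"@a", "@b", "@c"}
mentions = [("@a", "@b"), ("@a", "@x"), ("@b", "@x"), ("@c", "@y")]
nodes, edges = filter_mention_graph(mentions, influencers, minimum=2)
print(sorted(nodes))
```

Raising `minimum` high enough leaves only the influencers themselves, which is why the sequence of graphs eventually returns to the original one.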

Future of the News Network – Minimum 2

Setting the minimum to two dramatically reduces the size of the graph. Many of the nodes remaining are also well known, for example @jack and @alyssa_milano.

Future of the News Network – Minimum 3

We can also see popular websites, like @techcrunch and @boingboing, as well as @google (not surprising given how often google showed up in the earlier visualizations of tweet content).

Future of the News Network – Minimum 4

Future of the News Network – Minimum 5

Future of the News Network – Minimum 6

Future of the News Network – Minimum 7

Future of the News Network – Minimum 8

Future of the News Network – Minimum 9

Future of the News Network – Minimum 10

I find the graphs for minimum 8+ fascinating – I think they start to show who influences the influencers.

Future of the News Network – Minimum 11

Eventually, of course, we get back to our original graph.

May 9, 2011
Part 7: Who’s Talking About The Future of Newspapers?

I used the Classifier4J Summarizer to summarize the tweets and pick out the 5-tweet summary of the period, and the 1-tweet summary of @ replies and non-directed tweets.

Only letters and spaces were kept for the summary (so each tweet was treated as one sentence). The summarizer also transforms everything to lower case and removes some things, like the identifier on bit.ly and other short links.
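For readers without Classifier4J, here is a sketch of the same general idea: clean each tweet as described above (drop links, keep only letters and spaces, lower-case), then pick the tweets whose words are most frequent across the whole set. This is a simple frequency-based extractive summarizer, not Classifier4J’s algorithm or API; the stopword list and function names are assumptions.

```python
import re
from collections import Counter

# Small, made-up stopword list for illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "on", "for", "was", "it"}


def clean(tweet: str) -> str:
    """Drop links, keep only letters and spaces, and lower-case."""
    tweet = re.sub(r"https?://\S+", " ", tweet)
    tweet = re.sub(r"[^A-Za-z ]+", " ", tweet)
    return " ".join(tweet.lower().split())


def summarize(tweets: list[str], k: int) -> list[str]:
    """Pick the k cleaned tweets whose words are most frequent overall,
    returned in their original order."""
    cleaned = [clean(t) for t in tweets]
    freq = Counter(w for t in cleaned for w in t.split() if w not in STOPWORDS)
    scores = [sum(freq[w] for w in set(t.split()) if w not in STOPWORDS) for t in cleaned]
    top = sorted(range(len(cleaned)), key=lambda i: scores[i], reverse=True)[:k]
    return [cleaned[i] for i in sorted(top)]


tweets = [
    "Newspapers must adapt to the web http://bit.ly/abc123",
    "The future of newspapers is the web",
    "Lunch was great",
]
print(summarize(tweets, 1))
```

Summing raw frequencies favours longer tweets; a real summarizer would normally correct for that, so treat this as a toy.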

For this one, I’m having a little fun, really. Essentially I’ve tried to summarize 2 months of Twitter for each user on a slide, hence the inclusion of their picture (as of now; unfortunately, not from the time period covered).

Alex Howard – @digiphile

Alfred Hermida – @Hermida

Andrew Keen – @ajkeen

Cody Brown – @CodyBrown

Dan Gillmor – @DanGillmor

Dave Winer – @DaveWiner

David Cohn – @Digidave

David Eaves – @daeaves

Dr. Mark Drapeau – @cheeky_geeky

Howard Weaver – @howardweaver

Jay Rosen – @jayrosen_nyu

JD Lasica – @jdlasica

Jeff Jarvis – @jeffjarvis

Jennifer Preston – @NYU_JenPreston

Kirk LaPointe – @kirklapointe

Mark Glaser – @mediatwit

Mathew Ingram – @mathewi

Steve Buttry – @stevebuttry

Steve Outing – @steveouting

Steve Yelvington – @yelvington

September 8, 2010

Part 4: Who’s Talking About The Future Of Newspapers?
In which we answer the question – what are they saying?

I’ve split the tweets into three groups: @ replies, tweets that are not @ replies, and all tweets. I’ve created wordles of each one for each of the 20 people we were following.
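A minimal sketch of that split, assuming a tweet counts as an @ reply when it starts with “@” (the post doesn’t state the exact rule, so treat it as an assumption). The tweets and the `split_tweets` name are made up for illustration.

```python
def split_tweets(tweets):
    """Return @ replies, non-directed tweets, and all tweets.
    Assumption: a tweet is an @ reply if it starts with '@'."""
    def is_at_reply(tweet):
        return tweet.lstrip().startswith("@")

    return {
        "at_replies": [t for t in tweets if is_at_reply(t)],
        "not_directed": [t for t in tweets if not is_at_reply(t)],
        "all": list(tweets),
    }


# Made-up tweets; each group's text would then be pasted into wordle.
groups = split_tweets([
    "@someone thanks for the RT",
    "New post on paywalls",
    "RT @someone: worth a read",
])
for name, group in groups.items():
    print(name, len(group))
```

Note that a retweet beginning with “RT @” lands in the non-directed group under this rule.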

If you haven’t – check out wordle.net. It’s awesome.

There’s debate as to whether wordles are a good way to analyze text, and there are definitely better ways (possibly to be explored in a future post). However, I think they’re cool, and here they have some utility. Note, though, that word sizes are relative to the number of words in each individual’s data set, and those data sets vary in size (see Part 1, Part 2, Part 3).

I don’t want to tread on Caitlin’s analysis (I’m just the data junkie), but some things you can see, aside from topics of discussion:
- People who make a point of thanking others (most likely for retweets or similar)
- People who retweet things that others have said about them
- Where RT is conspicuous by its absence
- Specific websites that get tweeted a lot
My personal favorite is Dave Winer’s all tweets! Let me know what you think.

Alex Howard all Tweets

Alex Howard at Replies

Alex Howard not Directed

Alfred Hermida all Tweets

Alfred Hermida at Replies

Alfred Hermida not Directed

Andrew Keen all Tweets

Andrew Keen at Replies

Andrew Keen not Directed

Cody Brown all Tweets

Cody Brown at Replies

Cody Brown not Directed

Dan Gillmor all Tweets

Dan Gillmor at Replies

Dan Gillmor not Directed

Dave Winer all Tweets

Dave Winer at Replies

Dave Winer not Directed

David Cohn all Tweets

David Cohn at Replies

David Cohn not Directed

David Eaves all Tweets

David Eaves at Replies

David Eaves not Directed

Dr. Mark Drapeau all Tweets

Dr. Mark Drapeau at Replies

Dr. Mark Drapeau not Directed

Howard Weaver all Tweets

Howard Weaver at Replies

Howard Weaver not Directed

Jay Rosen all Tweets

Jay Rosen at Replies

Jay Rosen not Directed

JD Lasica all Tweets

JD Lasica at Replies

JD Lasica not Directed

Jeff Jarvis all Tweets

Jeff Jarvis at Replies

Jeff Jarvis not Directed

Jennifer Preston all Tweets

Jennifer Preston at Replies

Jennifer Preston not Directed

Kirk LaPointe all Tweets

Kirk LaPointe at Replies

Kirk LaPointe not Directed

Mark Glaser all Tweets

Mark Glaser at Replies

Mark Glaser not Directed

Mathew Ingram all Tweets

Mathew Ingram at Replies

Mathew Ingram not Directed

Steve Buttry all Tweets

Steve Buttry at Replies

Steve Buttry not Directed

Steve Outing all Tweets

Steve Outing at Replies

Steve Outing not Directed

Steve Yelvington all Tweets

Steve Yelvington at Replies

Steve Yelvington not Directed

Programming-wise, the code is trivial because wordle accepts free text. But, before I realized that the guy who wrote wordle was much smarter than me, I tried to be clever and optimize it by using a LinkedHashSet. I chose this data structure because I wanted O(1) lookup (the hash), since I would find the same words repeated; only one instance of each word (the set); and quick, ordered iteration (the linked), so I could output a key, value table at the end. And then I discovered that there was no get() or elementAt() method, and stopped trying to be a smart-alec!
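In Java, the structure that matches that wish list is a LinkedHashMap&lt;String, Integer&gt; rather than a LinkedHashSet: a map gives hash lookup and insertion-ordered iteration, and its values hold the counts that a set has nowhere to store. A sketch of the same idea in Python, where a plain dict already preserves insertion order (Python 3.7+); the function name is illustrative.

```python
def word_counts(text: str) -> dict[str, int]:
    """Count words; dict lookups are O(1) on average and iteration
    follows first-seen order, like Java's LinkedHashMap."""
    counts: dict[str, int] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = word_counts("the future of the news is the web")
for word, n in counts.items():
    print(f"{word}\t{n}")  # key, value table in first-seen order
```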
August 18, 2010

Alex Howard	Alfred Hermida
Andrew Keen	Cody Brown
Dan Gillmor	Dave Winer
David Cohn	David Eaves
Dr. Mark Drapeau	Howard Weaver
Jay Rosen	JD Lasica
Jeff Jarvis	Jennifer Preston
Kirk LaPointe	Mark Glaser
Mathew Ingram	Steve Buttry
Steve Outing	Steve Yelvington
