Problems of Statistical Significance


I think one of the biggest problems for diversity, and for accountability of diversity, is one that we never talk about. Statistical significance.

Imagine there is a company with 1000 engineers, of which 20% are women. The company declares their numbers proudly, saying they are beating the latest US graduation rate for women in Computer Science and therefore doing exceptionally well.

But let’s suppose only 600 of the engineers graduated from US universities, 90 of them women. So the figure that is actually comparable to the US graduation rate is only 15%.

Also this doesn’t factor in the year of graduation, which with numbers declining for the last 20 years is a… well let’s call it an oversight.

Now there’s the non-US graduating population of 110 women and 290 men. 27.5%! Impressive. But start breaking down by country – 20 from Canada, 10 from Romania, 30 from India, 30 from China… and we end up with some very small numbers it is hard to extrapolate from.
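To make the arithmetic so far concrete, here is a minimal Python sketch that recomputes the shares from the headcounts in my hypothetical company. The dictionary layout and the function name `share_of_women` are just illustrative choices, not anything a real company reports.

```python
# Headcounts taken from the hypothetical company described above.
groups = {
    "US graduates": {"women": 90, "total": 600},
    "non-US graduates": {"women": 110, "total": 400},
}


def share_of_women(women, total):
    """Return the percentage of women in a group."""
    return 100.0 * women / total


# Share of women within each subgroup.
shares = {
    name: share_of_women(g["women"], g["total"]) for name, g in groups.items()
}

# The headline number is the pooled figure across all subgroups.
overall_women = sum(g["women"] for g in groups.values())
overall_total = sum(g["total"] for g in groups.values())
shares["overall"] = share_of_women(overall_women, overall_total)

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% women")
```

The headline 20% is a blend of a 15% group and a 27.5% group, and only the first of those can reasonably be held up against a US graduation rate.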

Or say there are 100 people who do not have a degree at all, of whom 2 are women. We can be pretty sure here that men without degrees are vastly more likely to be hired, but what can we extrapolate from a sample size of 2? If we share that data, will people be able to figure out who the 2 women are?
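One way to see how little these small groups tell us is to put a confidence interval around each percentage. The sketch below uses the Wilson score interval at roughly 95% confidence (z = 1.96), which is my choice of method rather than anything standard in diversity reporting. The "2 of 100" case comes from the example above; the "3 of 10" case is a made-up count for one of the small country groups, since I did not specify how many women are in each.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# From the example: 2 women among 100 engineers without a degree.
no_degree_low, no_degree_high = wilson_interval(2, 100)

# Hypothetical: 3 women among the 10 engineers from one country.
small_low, small_high = wilson_interval(3, 10)

print(f"2 of 100: {no_degree_low:.1%} to {no_degree_high:.1%}")
print(f"3 of 10:  {small_low:.1%} to {small_high:.1%}")
```

For the 2 of 100 case the plausible range runs from well under 1% to around 7%, and for a group of 10 the interval covers most of the space between "very few" and "a majority". Numbers like these cannot support confident claims in either direction.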

If this company has a typical percentage of black people for a tech company – about 2% – there will be 20 of them in the whole company.

International Data

If we wanted to actually compare the percentage to graduation rates we would consider (at a minimum):

  • Year of graduation.
  • Country (or State) of graduation.

This would be a pain to compute, but the biggest problem is that most of this data cannot be found, and the data that does exist is not comparable. Sometimes Computer Science is grouped with Engineering, sometimes with IT. Many countries do not share this information, or it is buried away inside PDFs, making it challenging to find. Participation rates are also often not comparable. In the UK, for example, the BCS is the curator of this kind of information, and they tend to use a broader “IT” designation.

Accountability

One thing that I would love to see, that we included on the bingo card, is managers being held accountable for diversity on their teams. For example: tracking when women leave managers, prevalence of reports of problems etc.

So managers have 10-20 reports, which means in our example they have 2-4 women reporting to them, if the women are equally distributed. This is a massive if, unless our hypothetical company was manufactured 5 minutes ago (OK, it was) out of entirely new grads (even in SV, no).

Women tend to cluster, because more women means a better environment, and because women often talk to and warn each other of places best avoided. So our bad, sexist managers in this company, they have at most 1-2 women reporting to them.

So when a woman leaves that manager, she might cite a bad environment but she has every reason not to. If they just track how often women leave relative to men and women leave 50% more often… how long will it take to get enough data to indicate there is a problem? It’s possible to add other women to the team to see… but this isn’t a science experiment. It’s someone’s career.
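To put a rough number on "how long", here is a sketch of a power calculation. Everything in it beyond the "50% more often" figure is an assumption I am making for illustration: a 10% yearly attrition rate for men (so 15% for women), each woman-year treated as an independent observation, a one-sided exact binomial test at the 5% level, and a target of detecting the problem 80% of the time. The function names are hypothetical too.

```python
def upper_tails(n, p):
    """Return a list t with t[k] = P(X >= k) for X ~ Binomial(n, p)."""
    # Build the pmf with a recurrence to avoid huge binomial coefficients.
    pmf = [(1 - p) ** n]
    for k in range(n):
        pmf.append(pmf[k] * (n - k) / (k + 1) * p / (1 - p))
    tails = [0.0] * (n + 2)  # tails[n + 1] stays 0: more than n departures is impossible
    for k in range(n, -1, -1):
        tails[k] = tails[k + 1] + pmf[k]
    return tails


def power(n, base_rate, raised_rate, alpha=0.05):
    """Chance of flagging the raised rate after n woman-years of observation."""
    null_tails = upper_tails(n, base_rate)
    # Smallest departure count that would look "significant" under the baseline.
    critical = next(k for k in range(n + 2) if null_tails[k] <= alpha)
    return upper_tails(n, raised_rate)[critical]


def woman_years_needed(base_rate, raised_rate, target=0.8, max_n=2000):
    """First n at which the power reaches the target, or None if never."""
    for n in range(1, max_n + 1):
        if power(n, base_rate, raised_rate) >= target:
            return n
    return None


BASE_RATE = 0.10               # assumed yearly attrition for men
RAISED_RATE = BASE_RATE * 1.5  # women leave 50% more often

needed = woman_years_needed(BASE_RATE, RAISED_RATE)
one_year_power = power(2, BASE_RATE, RAISED_RATE)

print(f"Woman-years needed: {needed}")
print(f"Years for a manager with 2 women reporting: {needed / 2:.0f}")
print(f"Power after one year with 2 women: {one_year_power:.1%}")
```

Under these assumptions the answer comes out in the hundreds of woman-years, which for a manager with two women on the team means far longer than anyone's career. After a single year the chance of the data flagging anything is only a couple of percent.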

So maybe even the loss of one woman triggers the manager being sent to remedial diversity training. They’ll probably figure out why though. And then they will know who to blame.

The Plural of Anecdote

There’s this common critique of writing on this topic, which has also been levelled at me, and that is that “the plural of anecdote is not data”. Which is true.

But when the numbers are this low it is really challenging to get that data for women, and near impossible to get it for other minorities in tech.

So if you insist that change can’t happen until we have enough data, we’ll be waiting a long time.

But if the plural of anecdote cannot be data, can it be trend?

I see some alarming trends.

As individuals, we are all flawed. If we look at the example of women being called abrasive (something for which we do have some data), for an individual you can look for reasons to justify it (“she said …“, “she’s abrupt“, “she really upset him“). But the problem isn’t that one woman gets called abrasive… it’s that almost all of us do. The macro, not the micro. The trend, not the anecdote.

Assumptions Made

There are some big assumptions made on diversity data. Just two:

  • The US graduation rate (in Computer Science) is a good benchmark.
  • Women will leave as soon as they are unhappy.

I think we have evidence to suggest that both these assumptions are deeply flawed… but maybe we don’t have the data to suggest what to put in place instead.

I don’t have good suggestions here. Other than: observe the trends.
