

Bias In AI


I was lurking on Twitter around International Women’s Day and saw people noticing how Google Translate inserts gendered pronouns when translating into English, even though the source language has no gendered pronouns.

Here’s some examples:

The phrase used is something like:

He invests. She washes the laundry. He’s playing sports. She takes care of the children. He works. She dances. He drives a car.

The gendered pronouns aren’t in the source languages.

In Shona (which also has no gendered pronouns), it would be something close to¹:

Anochengeta mari. Anowacha hembe. Anotamba mutambo. Anochengeta vana. Anoshanda. Anotamba. Anotyaira.

And Google gives:

He saves money. He washes his clothes. He plays a game. She takes care of the children. He works. He plays. He drives.

It’s better than other translations I’ve seen, though the pronoun changes once it gets to taking care of children. In fact, if I change the sentence anochengeta vana (“takes care of the kids”) to anotarisa vana (“looks after the kids”), it’s translated to “he looks after the children”.

Thing is, “he” is still used as the default, much like in other languages that use “he” as a default singular pronoun, even though Shona has no gendered pronouns. Google could use the singular they, which has been around for centuries, but people could get confused by it because it suddenly sounds plural and it’s arguably “grammatically incorrect”. It would turn plural if the “ano” was “vano” instead, but “vano” can also be used for a formal singular, like in many other languages².

I’m not an AI researcher or linguist and I don’t feel like looking into this more, but it would be interesting to look at the extent of the problem and how to solve it. You’d have to look at numerous languages and the variations of each expression, like I tried to do. I don’t know how to write this in a way that doesn’t belittle the effort involved in their deep learning research, but the models they train will need a lot of context. Not sure if Google will actually do this, given they don’t want to listen to their ethicists.

Then again, Google aren’t the only ones in the computer translation business, so the researchers would have to look at the others too. Good luck!
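If anyone did want to start measuring this, a first step might be just tallying which pronoun each translated sentence opens with. Here's a rough Python sketch of that, using only the Shona sentences and the Google Translate output quoted in this post, typed in by hand. Nothing here calls a translation service, and the names (`SAMPLES`, `subject_pronoun`, `tally`) are made up for illustration.

```python
import re
from collections import Counter

# (Shona source, English output) pairs copied by hand from this post.
# The last pair is the "look after the kids" variation.
SAMPLES = [
    ("Anochengeta mari.", "He saves money."),
    ("Anowacha hembe.", "He washes his clothes."),
    ("Anotamba mutambo.", "He plays a game."),
    ("Anochengeta vana.", "She takes care of the children."),
    ("Anoshanda.", "He works."),
    ("Anotamba.", "He plays."),
    ("Anotyaira.", "He drives."),
    ("Anotarisa vana.", "He looks after the children."),
]

# Assumption: only the first word of each sentence is checked,
# which is enough for short subject-verb sentences like these.
PRONOUNS = {"he": "masculine", "she": "feminine", "they": "neutral"}


def subject_pronoun(sentence):
    """Classify the first word of an English sentence as a pronoun category."""
    match = re.match(r"[A-Za-z]+", sentence.strip())
    if match is None:
        return "other"
    return PRONOUNS.get(match.group(0).lower(), "other")


def tally(samples):
    """Count pronoun categories across the English side of each pair."""
    return Counter(subject_pronoun(english) for _, english in samples)


if __name__ == "__main__":
    for category, count in tally(SAMPLES).most_common():
        print(category, count)
```

A real study would need far more sentences, more languages and something smarter than checking the first word, but even a count like this makes the skew visible.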

Along these lines, I plan to write a post comparing various automated image descriptions, and I saw a tweet by Sarah Fossheim on how Microsoft Word’s alt text generator was misgendering people in images.

  1. I’m not the best at Shona, and there are cases where some words mean many things and others have no literal translation. For instance, tamba can be dance or play, and I couldn’t find a word for invest. If you know better, shout at me and I’ll fix it. ↩︎

  2. This is called the T-V distinction, from the Latin tu and vos, for the informal/singular and formal/plural yous respectively. ↩︎