The Evolution of Duolingo

Posts: 32 | Likes: 40 | Joined: 9/5/2022 | Location: Berlin / DE
Learning Chinese - Mandarin, Dutch, Croatian, Italian, Russian, Spanish

Luis von Ahn is one of the smartest people on the planet. He has a knack for spotting two tough problems and solving both by combining them. The most famous example of this is ReCAPTCHA, which combines

problem 1: websites have trouble with too many bots accessing them and

problem 2: when digitizing old books, computers often don't manage to read parts where the print is unclear.

Luis saw that problem 2 is a good filter to distinguish computers (bots) from humans, which made it the perfect solution for problem 1. He invented ReCAPTCHA and sold it to Google in 2009. For several years after that, whenever you wanted to register for a forum or other website, you had to help computers read old books by solving a ReCAPTCHA challenge. Eventually computers got too good and humans got too naughty (there was a concerted effort to convince ReCAPTCHA that all missing words in old books should read "penis"), so nowadays ReCAPTCHA has us identify ships and traffic lights and whatnot within images, contributing to another part of machine learning.

Duolingo in the 2010s

When inventing Duolingo, the two problems that Luis was trying to solve by combining them were:

problem 1: too much web content is monolingual while readers are increasingly diverse and speak many languages

problem 2: learning a foreign language is often unaffordable to those who could benefit from it the most, and also takes too much time.

Luis' solution was to create a free online language course that would be profitable because students would, as part of their studies, translate ephemeral web content for major companies. He even got CNN and Buzzfeed on board with this and soon millions of students were using Duolingo. In the end, this model had to be given up for reasons similar to those that ended the original ReCAPTCHA: computers got too good at machine translation (at least better than beginning language students) and humans got too naughty (inserting vulgar words into texts they were supposed to translate and mass upvoting particularly "creative" translations). Also, there were legal issues, since in the EU it is illegal to sell work (translations) that was created by unpaid workers.

Bereft of this method of monetisation, Duolingo had to find ways to earn money from those studying languages on it, hence the various for-pay items and subscriptions that you see now, and their IPO.

Educational model

There was another thought in inventing Duolingo though: the idea that language education at US high schools is inefficient and usually unsuccessful. This is something few polyglots doubt. However, Duolingo's model in the early 2010s was a classic example of tech solutionism: rejecting the insights of centuries of didactics, deliberately not hiring professionals to create courses, and instead believing that Artificial Intelligence would find a more efficient way of teaching languages.

For this, they needed a lot of sentences (so the system could figure out which ones were easiest and would make a good learning path) and a lot of learners. This is why early Duolingo courses were famous for teaching a lot of words for animals, a lot of words for fruit, and then having tons of sentences of the type "The <animal> eats the <fruit>" and "The <animals> eat <fruits>". Once you suspend disbelief, it is possible to generate a huge number of sentences like this, just swapping out the name of the animal and the name of the fruit.
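
To make the combinatorics concrete, here is a minimal Python sketch of this kind of template filling. It is only an illustration of the idea, not Duolingo's actual system; the word lists and the `generate_sentences` helper are made up for this example.

```python
from itertools import product

# Hypothetical word lists; the real course vocabulary is not given in this post.
ANIMALS = ["duck", "bear", "horse"]
FRUITS = ["strawberry", "apple", "lemon"]


def generate_sentences(template, **slots):
    """Fill every combination of slot values into the template."""
    names = list(slots)
    return [
        template.format(**dict(zip(names, combo)))
        for combo in product(*(slots[name] for name in names))
    ]


sentences = generate_sentences(
    "The {animal} eats the {fruit}.", animal=ANIMALS, fruit=FRUITS
)
print(len(sentences))  # 3 animals x 3 fruits = 9
print(sentences[0])    # The duck eats the strawberry.
```

With 50 animals and 50 fruits the same template already yields 2,500 sentences, which shows how quickly the count grows and how little variety it adds.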

An experienced educator might have told them that it's also possible to generate a huge number of sentences of the type "<name> is (not) (in/from) <city>", which is more useful for conversations AND easier (in most languages this would not involve noun gender, conjugation or cases), and could be followed up by "<name> is a / works as a <profession>" in order to gradually introduce noun gender and other grammar... but when you try to re-invent a field from first principles, you re-invent it from first principles. The real problem (well, one of them) is that Duolingo's AI was always at least an order of magnitude short of the number of sentences that it would have needed to train itself to show these easier, more useful sentences first.

Another feature of Duolingo's approach to language teaching has been the focus on inductive learning, i.e. expecting learners to infer grammatical rules from the examples they see rather than from explicit instruction. (Luis von Ahn explained to me that their experiments showed that learners were more likely to exit the site when shown grammar explanations.) There is nothing wrong with inductive learning - there is a scientific basis for it - but it does not combine very well with an approach that shows only one sentence at a time. Most learners will not remember previous sentences well enough to compare them and infer a rule. Doubly so if the previous sentence is randomly chosen by the AI rather than specifically designed to enable a comparison and make it easy to infer what is going on.

The focus on teaching languages sentence-by-sentence also means that (in my experience) Duolingo students perform worse in conversations. Even if they were lucky enough to study a course that does put an emphasis on useful phrases like "I am from <place>" rather than "The duck eats the strawberry", the problem is that they did not see this phrase in the context of the question "Where are you from?", so they lack the reflex, which other courses hone, of answering "Where are you from?" with "I am from <place>", and many other conversational reflexes.

Recent changes

Bit by bit, Duolingo is addressing all of these issues. Their Spanish course - always the avant garde - now has some explicit grammar instruction, many useful phrases for conversation, and even dialog-based exercises where students learn to answer common questions. I am hoping that these features will be rolled out to more and more languages. Many of the less popular languages still have units that teach several dozen adjectives, or several dozen verbs, or all prepositions... these are didactic nonsense and should be replaced as soon as possible.

Given that, according to Duolingo's own claim, more people study languages on Duolingo than in the American high school system, it makes sense that regulators should push for a minimum of quality. For European languages, quality is usually measured through the Common European Framework of Reference for Languages (CEFR), which describes what students should be able to do at each level, so that courses, grammar books, easy readers and so on can all target particular levels. Standardized language tests aligned to the CEFR levels are accepted throughout Europe.

When I was feeling particularly altruistic - and particularly annoyed by "20 of your favourite prepositions" type lessons - I decided to join Duolingo's volunteer team in order to make the Greek course CEFR-compliant. In fact, almost all Duolingo courses were created and maintained by volunteers. This explains the great difference in quality within and between courses, e.g. for the Esperanto course the volunteers included some of the best-known teachers of the language, but - based on my interaction with fellow volunteers - it seems that most of them were students and homemakers rather than trained teachers. In 2021, as part of the preparation for Duolingo's IPO, all the volunteers were told they could no longer contribute to the courses unless Duolingo hired them. (There was also an award of some monetary value to reward their existing work.) The number of hired contributors is only a fraction of the original volunteer workforce, meaning that it takes longer to have mistakes fixed, and efforts like mine to make the Greek course compatible with the CEFR had to be abandoned. It's ironic that Duolingo now claims its courses are compatible with the CEFR. The Spanish course may be, and perhaps other major courses like French or German (I haven't been able to check), but I know for sure that the Greek course isn't yet compatible and I have doubts about most smaller EU languages.

The most recent change, which has upset a lot of users, is the introduction of a Path: learners can no longer choose which lesson to study next, or when to upgrade a lesson from level 1 to level 2 and beyond; this is now determined for them. Despite all the complaints, that is one change I don't reject. It's an open secret that some learners would study the entire tree on level 1 only and that this would result in them forgetting most vocabulary before it could be solidified. (To pass level 1 or even 2, you only have to recognise a word, not actively know it. Active knowledge mainly comes in later levels.) The Duolingo-recommended way of studying their tree was the "wave" model, i.e. completing all lessons in the newest section on level 1, then going back to the previous section and completing level 2, going back to the section even before that and completing level 3, and so on. I suspect that most Duolingo users did not hear about this or did not implement it, so it's good that the system now enforces it via the Path, and also integrates regular practice sessions.
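
For anyone who finds the "wave" model easier to follow as a procedure, here is a rough Python sketch of the study order it implies. This is my own reading of the description above, not Duolingo's code; the function name `wave_schedule` and the default of five levels per section are assumptions made for illustration.

```python
def wave_schedule(num_sections, max_level=5):
    """Return study steps as (section, level) pairs in 'wave' order.

    max_level is an assumed cap on levels per section (hypothetical default).
    """
    steps = []
    for newest in range(1, num_sections + 1):
        # Newest section at level 1, the one before it at level 2, and so on.
        for back in range(max_level):
            section = newest - back
            if section < 1:
                break  # ran out of earlier sections to revisit
            steps.append((section, back + 1))
    return steps


for section, level in wave_schedule(3, max_level=3):
    print(f"section {section}: complete level {level}")
```

Note that this sketch stops once the last section has been opened, so the final sections are still at low levels at the end; a full plan would keep revisiting them until every section reaches the top level.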

The Path also abolishes the ability to choose between 2-3 different topics that could be studied next. Having that choice was nice - several times I only continued because I could switch from a boring topic to a more interesting one for a bit - but I know that for teachers it's easier to create self-reinforcing lessons if they know which topics were mastered before. Of course the lessons would have to be re-written in order to take advantage of this now; otherwise they should continue to let people choose.

Duolingo today

Duolingo has come a long way from the original idea. A lot of the original mistakes and misconceptions have been fixed. However, I would still only recommend it as one tool among many - especially if your goal is speaking ability, you really need to supplement Duolingo with something else. Quality-wise, most Teach Yourself courses beat most Duolingo courses. The only case in which I heartily recommend Duolingo is if you are able to self-study grammar and your biggest issue is motivation/consistency - the amount of gamification in Duolingo makes it very easy to keep coming back to it. If you study inefficiently but you study every day, your results will be better than if you study efficiently once a month.

Good luck with your studies!

I offer personal language coaching if you like.

Posts: 1663 | Likes: 1103 | Joined: 18/3/2018 | Location: Bellingham / US
Learning Italian
Other Chinese - Mandarin, French, German, Japanese, Korean, Portuguese, Russian, Spanish, Swahili, Tagalog, Thai

GermanPolyglot wrote:
To pass level 1 or even 2, you only have to recognize a word, not actively know it.
This is the type of thing that makes Duo a poor choice for most learners imo, so it's good to hear they are addressing it. 

Learning Italian every day!


Not my photo or my league, but an example of how the gamification of Duolingo's leagues can be demotivating. If part of the draw for learners is the leagues and other gamification, then how are some learners getting several thousand XP when others only manage a couple of hundred? I'm on and off all day as a carer and do more than average, but I'm in the couple of hundred range. Maybe there's a discrepancy in how XP is "won" in some languages vs others. And some courses don't even have tips or stories in the app, although they are there on the web, so the functionality exists. I think they need to be consistent across all courses and look at why their gamification is becoming demoralising for some learners.
