What makes up the 10% of orders it can't handle?
The 10% that stumps Archy is made up of orders with stacked customizations, strong accents, background noise, or mid-order corrections. In these situations spoken language gets messy enough that the AI's confidence drops and it hands the order off to a crew member.
Why it matters: That 10% is also where customers get most frustrated, so how smoothly the system escalates matters as much as the 90% it handles on its own.
- Complex modifications ('no tomato, extra cheese, light ice, half-decaf') degrade accuracy quickly; stacked modifiers are one of the most common failure points for deployed AI ordering systems.
- Accents and dialects are a persistent weak point: systems trained on narrow audio sets misfire on regional speech at a rate that only shows up at scale, not in lab tests.
- Road noise, music, and multiple voices in the car corrupt the audio signal; AI trained on clean studio audio struggles in real drive-thru lanes.
- Mid-order changes ('actually, make that a large') are often ignored or added as a second item, so a human has to step in and reconcile the order.
- McDonald's frames the 10% escalation rate as a success benchmark. Critics point out that for a chain handling tens of millions of drive-thru orders a week, even a small failure rate means millions of frustrating interactions, and that the viral failures from McDonald's earlier IBM pilot happened under a similar claimed success rate.
