ELLA

A home robot that helped children practice language in everyday family life.

ELLA is an embodied AI learning companion for children at home. Parents choose target vocabulary and themes, and the robot turns that input into personalized storytelling sessions with recall prompts, comprehension checks, and target-word practice. The central question was simple: could a robot support real language learning outside the lab?

ACM IDC 2026 · Home Deployment · Early Learning · Generative AI

Victor Nikhil Antony*, Shiye Cao*, Shuning Wang, Chien-Ming Huang

*equal contribution · ACM IDC 2026

Overview of the ELLA project from interviews and workshops through the eight-day home deployment.

From interviews and workshops to an eight-day deployment in the home, ELLA was shaped around real routines rather than lab-only interaction.

10: Children deployed with ELLA at home for eight days
29/40: Target words actively used by children during sessions
Days 5–8: Average words per turn rose above days 1–4
12: In-home workshops used to refine stories, questions, and onboarding

The strongest signal was not just enjoyment. It was language growth.

ELLA did more than keep children engaged. The deployment showed significant vocabulary gains, active use of target words during sessions, and longer verbal responses over time. That makes the project compelling as a learning system, not only as an expressive robot.

Vocabulary gains · Measured

A pre/post assessment modeled on the PPVT (Peabody Picture Vocabulary Test) showed significant improvement in children’s recognition of the target vocabulary by the end of deployment.

Target-word production · Observed

Children spoke 29 of the 40 assigned target words (four per child) aloud during interactions, showing that vocabulary moved from hearing to use.

Longer responses over time · Days 5–8

Average words per turn increased significantly in the second half of the study, suggesting deeper participation as the routine settled in.
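The words-per-turn measure described here can be sketched in a few lines. This is a minimal illustration only: the whitespace tokenization, the function names, and the example utterances are assumptions made for the sketch, not the study's data or its analysis code, and the significance test itself is not reproduced.

```python
from statistics import mean


def words_per_turn(turns):
    """Mean number of whitespace-separated words across a list of child turns."""
    return mean(len(turn.split()) for turn in turns)


def split_by_half(turns_by_day, last_early_day=4):
    """Split turns into days 1-4 and days 5-8 (the split point is an assumption)."""
    early = [t for day, ts in turns_by_day.items() if day <= last_early_day for t in ts]
    late = [t for day, ts in turns_by_day.items() if day > last_early_day for t in ts]
    return early, late


# Made-up child turns keyed by deployment day. NOT study data.
turns_by_day = {
    1: ["a dog", "yes"],
    6: ["the dog was very clumsy", "he fell down the stairs"],
}

early, late = split_by_half(turns_by_day)
print("days 1-4:", words_per_turn(early))
print("days 5-8:", words_per_turn(late))
```

Comparing the two means per child, rather than pooling all turns, would keep talkative children from dominating the result; which of the two the study used is not stated here.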

Child Language During Deployment

Learned target words are highlighted to show where home interaction translated into measurable uptake.

Pseudonym | Age | Stories | Target words used | Target word usage during sessions (count) | Total target-word uses | Avg words/turn
Grace | 4 | 18 | 3/4 | Massive (14), Ordinary (16), Clumsy (13), Imitate (0) | 43 | 3.78
Andrew | 4 | 13 | 3/4 | Permission (3), Self-Control (2), Imagine (1), Confident (0) | 6 | 3.48
Sarah | 4 | 18 | 4/4 | Compassion (2), Awestruck (2), Perseverance (0), Gumption (0) | 4 | 3.48
George | 4 | 4 | 1/4 | Chirp (2), Permission (0), Consequences (0), Orbit (0) | 2 | 2.46
Jason | 5 | 11 | 2/4 | Clumsy (8), Imitate (1), Somersault (0), Frisky (0) | 9 | 4.91
Natalie | 5 | 12 | 2/4 | Advocate (2), Bait (1), Justice (1), Apartment (0) | 4 | 3.02
James | 6 | 10 | 3/4 | Frisky (4), Wonder (3), Permission (1), Sympathy (0) | 8 | 2.48
Susan | 6 | 30 | 4/4 | Achieve (23), Attempt (17), Persistent (16), Considerate (10) | 66 | 6.30
Helen | 6 | 20 | 4/4 | Advocate (9), Sympathy (7), Legal (5), Empathy (5) | 26 | 4.98
Isabella | 6 | 11 | 3/4 | Usual (3), Sheriff (1), Adventure (1), Orbit (0) | 5 | 4.60
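The headline 29/40 figure follows from the "Target words used" column of the table above. A small sketch of that tally, with the values copied from the table, is below. Note that some words (for example, Permission) were assigned to more than one child, so the 40 counts child-word pairs rather than unique words; the variable names are illustrative.

```python
# (pseudonym, target words used, target words assigned), copied from the table above.
rows = [
    ("Grace", 3, 4),
    ("Andrew", 3, 4),
    ("Sarah", 4, 4),
    ("George", 1, 4),
    ("Jason", 2, 4),
    ("Natalie", 2, 4),
    ("James", 3, 4),
    ("Susan", 4, 4),
    ("Helen", 4, 4),
    ("Isabella", 3, 4),
]

used = sum(u for _, u, _ in rows)
assigned = sum(a for _, _, a in rows)
print(f"{used}/{assigned} target words used")  # prints "29/40 target words used"
```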

Usage stayed steady enough to support learning, not just novelty.

The outcomes matter more because they happened in a repeated home routine rather than a single session. ELLA was used across the eight-day deployment with broadly positive sentiment around talking to the robot and listening to its stories.

ELLA usage across the eight-day deployment, broken down by child and weekday versus weekend.

Repeated use across the week suggests the robot fit into family routines instead of fading after the first encounter.

Children did not just listen to stories. They folded ELLA into home life.

The deployment became most interesting when children treated ELLA as part of their social environment: bringing in objects, linking stories to personal memories, asking for more, and involving siblings or parents.

Examples of sharing interests, sharing personal experience, and sibling dynamics during interactions with ELLA.

Children tied the stories back to their own lives, interests, and family context instead of treating ELLA as a fixed tutor.

Examples of parent prompting, playful rambling, and direct story requests during interaction with ELLA.

Playful rambling, story requests, and parent prompting show how the interaction became socially shared rather than tightly scripted.

Object-based engagement

Children sometimes left to fetch toys or other objects connected to the story, pulling the surrounding room into the interaction.

Personal storytelling

Target words often became entry points for children to talk about their own travel, play, memories, and experiences.

Family participation

Parents and siblings sometimes stepped in as collaborators, shaping the sessions into shared routines rather than one-on-one tutoring.

The interaction worked because it was carefully scaffolded.

ELLA is not freeform chat. The session format deliberately moves through storytelling, perception, recall, and target-word practice so the child hears, understands, and produces language within one repeated loop.

A full ELLA story session showing greeting, storytelling, story perception, story recall, target word practice, another story, and farewell.

A single session moves from greeting to narration to recall and vocabulary practice, making the interaction pedagogically structured rather than casually conversational.
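The session structure described above can be sketched as a fixed sequence of phases. The phase names come from the session figure; the `Phase` enum, the `session_plan` function, and the assumption that "another story" repeats the full story loop are illustrative choices, not the project's actual implementation.

```python
from enum import Enum


class Phase(Enum):
    GREETING = "greeting"
    STORYTELLING = "storytelling"
    STORY_PERCEPTION = "story perception"
    STORY_RECALL = "story recall"
    TARGET_WORD_PRACTICE = "target word practice"
    FAREWELL = "farewell"


# One pass through a story: hear it, react to it, recall it, practice the word.
STORY_LOOP = [
    Phase.STORYTELLING,
    Phase.STORY_PERCEPTION,
    Phase.STORY_RECALL,
    Phase.TARGET_WORD_PRACTICE,
]


def session_plan(num_stories: int) -> list[Phase]:
    """Return the ordered phases for a session with `num_stories` stories.

    Assumption: each additional story repeats the whole story loop.
    """
    if num_stories < 1:
        raise ValueError("a session needs at least one story")
    return [Phase.GREETING] + STORY_LOOP * num_stories + [Phase.FAREWELL]


for phase in session_plan(2):
    print(phase.value)
```

Keeping the order in data rather than in branching logic makes the scaffold easy to inspect and to vary, for example when testing whether recall should come before or after target-word practice.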

Why the structure matters · Learning

Repeating target words in narrative context and then asking recall questions gives children multiple ways to encounter and produce language.

Why embodiment matters · Presence

The robot gives the interaction timing, turn-taking, and ritual presence that a passive app or video would struggle to create.

Caregiver input is transformed into a full embodied learning session.

The technical system matters because it lets a small amount of parent input become personalized stories, scaffolded prompts, moderated dialogue, and autonomous robot behavior for use in the home.

ELLA generation pipeline from child name, target word, and theme through story and interaction generation to robot behavior.

A small amount of caregiver input becomes a complete story-and-practice session tuned to the child and target vocabulary.

ELLA system diagram with speech-to-text, content moderation, story generation, and robot control modules.

Story generation, speech handling, moderation, and robot control are tied together for autonomous use in the home.
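To make the data flow concrete, here is a minimal sketch of how the modules named above could be wired together. Everything in it is hypothetical: the `CaregiverInput` fields mirror the three inputs in the pipeline figure, but the prompt wording, the function names, the fallback reply, and the choice to moderate the child's transcribed speech before responding are assumptions. The speech-to-text, moderation, dialogue, and robot-control components are passed in as plain callables and stubbed out, so no real service or library API is implied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CaregiverInput:
    child_name: str
    target_word: str
    theme: str


def build_story_prompt(inp: CaregiverInput) -> str:
    """Hypothetical prompt template; the real prompt is not given in the source."""
    return (
        f"Write a short story for {inp.child_name} about {inp.theme} "
        f"that uses the word '{inp.target_word}' several times."
    )


def handle_child_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],  # speech-to-text module
    is_safe: Callable[[str], bool],      # content moderation module
    respond: Callable[[str], str],       # dialogue / story generation module
    speak: Callable[[str], None],        # robot control module
) -> str:
    """Run one child turn through the modules and return the robot's reply."""
    text = transcribe(audio)
    if is_safe(text):
        reply = respond(text)
    else:
        # Assumed fallback: steer back to the story instead of engaging.
        reply = "Let's get back to our story."
    speak(reply)
    return reply


# Stubbed usage with made-up values (not study data).
spoken = []
prompt = build_story_prompt(CaregiverInput("Sam", "orbit", "space"))
reply = handle_child_turn(
    b"",
    transcribe=lambda audio: "the moon goes around",
    is_safe=lambda text: True,
    respond=lambda text: "Yes! " + text,
    speak=spoken.append,
)
print(prompt)
print(reply)
```

Passing the modules in as callables keeps each stage swappable and testable on its own, which matters for a system meant to run unattended in the home.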

Why it matters

ELLA suggests a strong direction for embodied AI: systems that do not just generate content, but turn that content into a recurring social learning ritual. The project is compelling because it combines measurable gains, rich home interaction, and a form factor that can live inside family routines rather than outside them.