Xpress

The expression engine for social AI.

Most social robots still rely on a tiny library of canned faces. Xpress replaces static expression sets with a language-model-driven system that reads interaction context and generates expressive robot behavior in real time across storytelling and conversation.

HRI 2025 · Language Models · Conversational Robots · Open Source

Xpress in motion: expressive face behavior rendered as a synchronized trajectory rather than a fixed expression library.

HRI 2025 · Victor Nikhil Antony, Chien-Ming Huang

Dynamic expressive behavior generation for social robots, validated across long-form storytelling and real-time conversation.

Context-aware expressions make the face feel alive instead of pre-scripted.

Existing systems usually depend on hand-tuned expression libraries or large training corpora that do not transfer cleanly across robots and interaction settings. Xpress takes a lighter path: it uses language models to segment dialogue, interpret socio-emotional context, and generate executable face behavior on demand without retraining.

"It didn't feel like a robot was telling me a story. It felt closer to a human being."

Adult participant — storytelling study

Problem · Static faces

Most social robots still rely on a handful of expressions that repeat regardless of timing, dialogue, or emotional nuance.

Approach · LM pipeline

Xpress segments interaction flow, conditions on context, and synthesizes executable expression trajectories instead of selecting canned poses.

Why it scales · No retraining

The system works as a generative layer that can be pointed at new interaction content without building a new expression dataset for each deployment.

The face is generated as a timed expressive trajectory, not a sequence of disconnected emotions.

The strongest static view of the system is the generated face palette itself: a broad range of visual states that shift with narrative and conversational context instead of cycling through the same fixed handful of expressions.


Generated face palettes across interaction moments, showing how Xpress moves beyond fixed expression sets into richer contextual variation.


Conversation example: expressions selected in real time as the interaction unfolds.


Expression timeline: changes in palette and form stay aligned to topic, timing, and emotional flow.

The same core system supports both pre-generated storytelling and real-time conversation.

Xpress was validated in two contrasting settings: long-form story narration, where the full expression trajectory can be prepared in advance, and live well-being conversation, where the system must react on the fly.

Pre-generated

Storytelling

Full expression trajectories are generated for each story, synchronized to narrative arcs, and rendered across multiple genres.

  • 245 unique expressions across 12 stories
  • 100% generated code executed without errors
  • Context Match: 71.67%
  • Validated with adults, children, and parents

Real-time

Conversation

An expression bank plus live LM selection allows the face to respond as the user speaks and while the robot replies.

  • 3 expression banks × 24 faces each
  • Context Match: 75.0%
  • No perceptible lag in study deployment
  • Validated in well-being check-ins with adults
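The real-time path described above can be sketched as a bank lookup driven by a live affect classification. This is a minimal illustration, not the paper's implementation: the bank contents, labels, and the keyword-based `classify_affect` stub (standing in for a language-model call) are all assumptions.

```python
# Hypothetical expression bank: affect label -> precompiled face animation id.
# A real deployment would hold 24 faces per bank, as in the study.
EXPRESSION_BANK = {
    "joy": "face_joy_01",
    "concern": "face_concern_01",
    "calm": "face_calm_01",
    "surprise": "face_surprise_01",
}

def classify_affect(utterance: str) -> str:
    """Stand-in for the live LM call that reads the user's utterance.

    A real system would prompt a language model here; this stub keys
    off a few words purely for illustration.
    """
    if any(w in utterance.lower() for w in ("worried", "stressed", "anxious")):
        return "concern"
    return "calm"

def select_expression(utterance: str) -> str:
    """Map the user's utterance to an executable face from the bank."""
    label = classify_affect(utterance)
    # Fall back to a neutral face if the label is outside the bank.
    return EXPRESSION_BANK.get(label, EXPRESSION_BANK["calm"])

face_id = select_expression("I've been really stressed this week")
```

Because selection is a lookup into pre-rendered faces, the only latency on the critical path is the classification call, which is what keeps the face responsive while the user is still speaking.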

Time, context, then code: the pipeline separates sequencing from interpretation from execution.

The system architecture stays legible even as the output becomes expressive. One stage captures temporal structure, another interprets socio-emotional context, and the final stage synthesizes executable face trajectories.

1. Encode temporal flow (z_T)

Segment interaction content into timestamped chunks that preserve emotional shifts and narrative pacing.

2. Condition on context (z_C)

Interpret each segment against its socio-emotional framing to form a higher-level expressive description.

3. Synthesize trajectory (τ_face)

Generate executable face code that can be played back in sync with speech or selected live during interaction.
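The three stages above can be sketched as a small pipeline. This is a structural sketch only: the function names and data shapes are assumptions, and the two LM stages are replaced by placeholder stubs.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds into the interaction
    end: float
    text: str

@dataclass
class FaceKeyframe:
    t: float      # playback time in seconds
    params: dict  # face parameters, e.g. color or eye openness

def encode_temporal_flow(transcript: list[tuple[float, float, str]]) -> list[Segment]:
    """Stage 1 (z_T): split content into timestamped chunks."""
    return [Segment(s, e, t) for s, e, t in transcript]

def condition_on_context(seg: Segment) -> str:
    """Stage 2 (z_C): an LM would interpret the segment's socio-emotional
    framing here; this stub returns a placeholder description."""
    return f"expressive description for: {seg.text!r}"

def synthesize_trajectory(desc: str, seg: Segment) -> list[FaceKeyframe]:
    """Stage 3 (τ_face): an LM would emit executable face code here;
    this stub emits one keyframe per segment boundary."""
    return [FaceKeyframe(seg.start, {"desc": desc}),
            FaceKeyframe(seg.end, {"desc": desc})]

def xpress_pipeline(transcript: list[tuple[float, float, str]]) -> list[FaceKeyframe]:
    trajectory = []
    for seg in encode_temporal_flow(transcript):
        trajectory.extend(synthesize_trajectory(condition_on_context(seg), seg))
    return trajectory

traj = xpress_pipeline([(0.0, 4.2, "Once upon a time..."),
                        (4.2, 9.0, "a storm rolled in.")])
```

The key design point the sketch preserves is the separation of concerns: timing is fixed before interpretation, and interpretation is fixed before any code is generated, so each stage can be swapped or re-prompted independently.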


Storytelling pipeline: from story text to a synchronized sequence of generated facial behavior.


Conversational system: Reactor and Responder LMs coordinate live expression selection and response generation.
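One way to read the Reactor/Responder split is as two concurrent LM roles: a fast Reactor that picks a face while the user's turn is being processed, and a slower Responder that produces the spoken reply. A minimal asyncio sketch under that assumption (the role names come from the figure; the timings, stubs, and return values are illustrative):

```python
import asyncio

async def reactor(utterance: str) -> str:
    """Fast path: choose an expression label immediately (LM stub)."""
    await asyncio.sleep(0.01)  # simulated low-latency LM call
    return "concern" if "stressed" in utterance.lower() else "calm"

async def responder(utterance: str) -> str:
    """Slower path: generate the robot's spoken reply (LM stub)."""
    await asyncio.sleep(0.05)  # simulated slower LM call
    return "Tell me more about that."

async def handle_turn(utterance: str) -> tuple[str, str]:
    # Running both roles concurrently lets the face update before
    # the spoken response is ready, hiding the Responder's latency.
    face, reply = await asyncio.gather(reactor(utterance),
                                       responder(utterance))
    return face, reply

face, reply = asyncio.run(handle_turn("I've been stressed lately"))
```

Decoupling the two roles is what makes "no perceptible lag" plausible: the expressive channel never waits on full response generation.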

Children and adults both read the face as part of the interaction, not as decoration.

The strongest evidence from the studies is that people connected the generated expressions directly to what was happening in the story or conversation. The face was legible enough to shape how the robot was understood.

Children engaging with the Xpress storytelling robot.

Children linked changes in color and shape directly to story events and emotional tone without being prompted to do so.

"Like when it was black, it was like, because it said it was in a very dark place."

Child participant — linking color to story context

"It almost felt like talking to a therapist or a friend. It's actually trying to understand what I'm saying."

Adult participant — conversational study

Xpress suggests a path away from canned expression libraries and toward expressive behavior that is generated with the interaction itself.

The immediate value is more believable storytelling and conversation. The larger implication is that expression can become a generative systems layer for social robots, expanding from face behavior toward richer multimodal embodiment over time.