Alchemist

Tell the robot the outcome. Let the system build the program.

Alchemist is a conversational programming interface for robotics. Instead of asking end users to think in code, it lets them describe what they want in plain language, then generates, verifies, previews, and executes robot programs in one integrated workspace.

HRI 2024 · End-user programming · Robotics + LLMs · Open Source
Alchemist interface showing 3D robot visualization, chat, terminal, and code editor.

One interface for chatting with the robot, inspecting generated code, previewing execution, and running the task.

Ulas Berk Karli, Juo-Tung Chen, Victor Nikhil Antony, Chien-Ming Huang

HRI 2024 · 24.3% acceptance

10 · Users in the evaluation, spanning novice scientists and robotics experts
~1 hr · Task completion time for novices and experts alike
3 · Robot platforms supported: UR5, Franka Panda, and TIAGo
5 · Design objectives shaping the system architecture

The bottleneck is not the robot. It is the programming layer.

In labs, factories, and care settings, the people who know what the robot should do are rarely the people who know how to program it. Alchemist closes that gap by shifting the user’s job from specifying logic to specifying outcomes.

Outcome-first interaction

Users describe tasks in plain language instead of translating domain knowledge into robot code.

Built for domain experts

Scientists and operators can configure robot behavior without prior robotics experience.

One programming surface

Conversation, code generation, verification, and execution live inside the same tool instead of being spread across separate apps.

Two ways of working with robots, one interface.

Alchemist supports both full workflow automation and live collaborative assistance. That matters because real deployments rarely fit a single mode of use.

Automation · End-to-end

Users define an entire process in natural language and Alchemist turns it into a reusable program.

  • Describe multi-step workflows in plain language
  • Preview and verify before execution
  • Save and rerun procedures across sessions

Collaboration · Real-time

Users can also instruct the robot live, letting it respond to in-the-moment needs as a teammate rather than a fixed script.

  • Voice input via speech-to-text
  • Ad-hoc assistance during ongoing work
  • Combines live requests with structured robot capabilities

The system does not just generate code. It constrains and checks it.

Alchemist becomes usable for non-programmers because it narrows what the LLM can do, injects domain-specific rules, and verifies the generated program before anything runs on the robot.

Alchemist backend pipeline showing function library, grounded prompts, GPT-4, and code verification.

The backend pipeline turns user intent into executable robot code through constrained generation and verification.

Function library · Constrained actions

The LLM can only call capabilities that exist in the robot’s function library, preventing hallucinated APIs.
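One way to picture a constrained function library is a registry that both documents the allowed actions for the model and rejects anything outside it. The sketch below is illustrative only: the action names `pick`, `pour`, and `place` echo the tasks described on this page, but the decorator, the registry, and every signature are assumptions, not Alchemist's actual API.

```python
import inspect
from typing import Callable, Dict

# Hypothetical registry: action name -> callable that implements it.
FUNCTION_LIBRARY: Dict[str, Callable] = {}


def robot_action(func: Callable) -> Callable:
    """Register func as a capability the LLM is allowed to call."""
    FUNCTION_LIBRARY[func.__name__] = func
    return func


# The three actions below are placeholders that return strings
# instead of moving a robot.
@robot_action
def pick(object_name: str) -> str:
    """Grasp the named object."""
    return f"picked {object_name}"


@robot_action
def pour(source: str, target: str) -> str:
    """Pour the contents of source into target."""
    return f"poured {source} into {target}"


@robot_action
def place(object_name: str, location: str) -> str:
    """Set the held object down at location."""
    return f"placed {object_name} at {location}"


def describe_library() -> str:
    """Render signatures and docstrings so they can be included in a prompt."""
    lines = []
    for name, func in FUNCTION_LIBRARY.items():
        lines.append(f"{name}{inspect.signature(func)}: {inspect.getdoc(func)}")
    return "\n".join(lines)


def call_action(name: str, *args, **kwargs):
    """Dispatch by name; anything outside the library is rejected."""
    if name not in FUNCTION_LIBRARY:
        raise KeyError(f"'{name}' is not in the function library")
    return FUNCTION_LIBRARY[name](*args, **kwargs)
```

In this sketch, `call_action("pour", "cylinder", "beaker")` succeeds, while `call_action("teleport")` raises `KeyError` because no such capability was registered.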

Grounded prompting · Task-aware rules

Every prompt carries contextual constraints so the model generates code that respects task structure and safety expectations.
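As a rough illustration of what "every prompt carries contextual constraints" could mean in practice, the sketch below assembles a prompt that always includes the available functions and a set of task rules alongside the user's request. The rule texts, section labels, and the `build_grounded_prompt` helper are hypothetical; this page does not publish the actual prompts.

```python
from typing import List

# Hypothetical task rules; the real system's rules are not given on this page.
TASK_RULES: List[str] = [
    "Only call functions listed under AVAILABLE FUNCTIONS.",
    "Pick up a container before pouring from it.",
    "Place the held object down before picking up another.",
]


def build_grounded_prompt(
    user_request: str, function_docs: List[str], rules: List[str]
) -> str:
    """Assemble one prompt that carries the function list and rules with the request."""
    sections = [
        "AVAILABLE FUNCTIONS:",
        *[f"- {doc}" for doc in function_docs],
        "",
        "RULES:",
        *[f"{i}. {rule}" for i, rule in enumerate(rules, start=1)],
        "",
        "USER REQUEST:",
        user_request.strip(),
    ]
    return "\n".join(sections)


prompt = build_grounded_prompt(
    "Pour both graduated cylinders into the beaker.",
    ["pick(object_name)", "pour(source, target)", "place(object_name, location)"],
    TASK_RULES,
)
```

Because the rules travel with every request, the user never has to restate them; only the final section changes from one turn to the next.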

Code verification · Before execution

Generated programs are parsed and corrected for common failures before users ever ask the robot to run them.
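A pre-execution check of this kind might be sketched with Python's standard `ast` module: parse the generated program, then flag syntax errors, imports, and calls to anything outside an allow-list. This is a minimal sketch under stated assumptions. The allowed names are hypothetical, and the sketch only detects problems; it does not attempt the automatic correction described above.

```python
import ast
from typing import List, Set

# Hypothetical allow-list standing in for the robot's function library.
ALLOWED_FUNCTIONS: Set[str] = {"pick", "pour", "place"}


def verify_program(source: str, allowed: Set[str] = ALLOWED_FUNCTIONS) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error on line {err.lineno}: {err.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append(f"line {node.lineno}: imports are not allowed")
        elif isinstance(node, ast.Call):
            # Only plain-name calls such as pick(...) are accepted in this sketch;
            # attribute calls like os.system(...) are rejected outright.
            if not isinstance(node.func, ast.Name):
                problems.append(
                    f"line {node.lineno}: only direct function calls are allowed"
                )
            elif node.func.id not in allowed:
                problems.append(
                    f"line {node.lineno}: unknown function '{node.func.id}'"
                )
    return problems
```

The check is deliberately strict: even a builtin such as `range` would be reported as unknown unless it is added to the allow-list. A real verifier would likely need a more nuanced policy.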

A novice researcher can program a biochemistry task through dialogue.

The evaluation centered on LB media preparation, a familiar but repetitive lab procedure. Participants described the task in natural language and used Alchemist to have the robot pick, pour, and place equipment autonomously.

Alchemist demonstration showing a user instructing the robot to pour graduated cylinders into a beaker.

A novice life sciences researcher instructs the robot through natural language instead of directly programming motion logic.

Concrete domain fit · Biochemistry

The task is not hypothetical; it reflects repetitive lab work that domain experts already want to automate.

Programming without code · Core shift

Users stay inside a conversational loop, making the interface feel like task specification rather than software development.

Novices and experts reached the same outcome in about the same time.

The strongest result is not that experts were fast. It is that novices could complete the programming task on roughly the same timeline, suggesting the real barrier is the interface, not the user’s intelligence or domain knowledge.

Novices ≈ experts on time · Main result

Novice mean task completion time was 1:03:02 (h:mm:ss); expert mean was 1:01:59. Different workflows, same endpoint.
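To put the two means side by side, the short calculation below converts them to seconds and takes the difference. It assumes the times are in h:mm:ss format, which is consistent with the roughly one-hour completion time reported above. It compares the two reported means only and makes no claim about statistical significance.

```python
def to_seconds(clock: str) -> int:
    """Convert an h:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


novice_mean = to_seconds("1:03:02")  # 3782 seconds
expert_mean = to_seconds("1:01:59")  # 3719 seconds

gap_seconds = novice_mean - expert_mean        # 63 seconds
gap_percent = 100 * gap_seconds / expert_mean  # about 1.7 percent
```

The novice mean is 63 seconds longer than the expert mean, a gap of under 2% on a task that takes about an hour.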

Novices stayed in language · Interaction pattern

Novice users debugged through prompting instead of dropping into code, showing that the conversational loop did real work.

Domain pull was immediate · Adoption signal

Participants spontaneously described workflows from their own labs that they would want a system like this to automate.

"I think the idea behind this project is really novel. It can be used extensively, especially in biochemistry labs, where tasks like these are a frequent occurrence."

Novice participant N5

"I would love to have a liquid handling system in our lab where I could simply press a button and say 'go' without worrying about it failing."

Novice participant N4

Why it matters

Alchemist points toward a more useful robotics future: one where domain experts can configure robot behavior directly, without waiting on specialist programmers to translate their intent. The near-term path is to extend the same interaction model across new domains and robots, while continuing to reduce the remaining reliability burden on the user.