Alchemist
Alchemist is a conversational programming interface for robotics. Instead of asking end-users to think in code, it lets them describe what they want in plain language, then generates, verifies, previews, and executes robot programs inside one integrated workspace.
In labs, factories, and care settings, the people who know what the robot should do are rarely the people who know how to program it. Alchemist closes that gap by shifting the user’s job from specifying logic to specifying outcomes.
Users describe tasks in plain language instead of translating domain knowledge into robot code.
Scientists and operators can configure robot behavior without prior robotics experience.
Conversation, code generation, verification, and execution live inside the same tool rather than being spread across separate apps.
Alchemist supports both full workflow automation and live collaborative assistance. That matters because real deployments rarely fit a single mode of use.
Users define an entire process in natural language and Alchemist turns it into a reusable program.
Users can also instruct the robot live, letting it respond to in-the-moment needs as a teammate rather than a fixed script.
Alchemist is usable by non-programmers because it narrows what the LLM can do, injects domain-specific rules into every prompt, and verifies the generated program before anything runs on the robot.
The backend pipeline turns user intent into executable robot code through constrained generation and verification.
The LLM can only call capabilities that exist in the robot’s function library, so it cannot invoke hallucinated APIs.
Every prompt carries contextual constraints so the model generates code that respects task structure and safety expectations.
Generated programs are parsed and corrected for common failures before users ever ask the robot to run them.
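A minimal sketch of this constrained-generation loop, assuming a hypothetical three-function library (pick, pour, place) borrowed from the LB media task below. The LLM call itself is omitted. build_prompt shows how constraints could travel with every request, repair_program stands in for automatic correction (here it only fixes near-miss function names, using Python's difflib), and verify_program rejects any call outside the library. None of these names are Alchemist's actual API.

```python
import ast
import difflib

# Hypothetical function library for an LB-media-style task. These names are
# illustrative stand-ins, not Alchemist's actual robot API.
FUNCTION_LIBRARY = {"pick", "pour", "place"}

# Hypothetical domain rules injected into every prompt.
DOMAIN_RULES = [
    "Only call functions from the provided library.",
    "Pick up a container before pouring from it or placing it.",
]


def build_prompt(task: str) -> str:
    """Wrap the user's request with the allowed functions and domain rules."""
    lines = ["Allowed functions: " + ", ".join(sorted(FUNCTION_LIBRARY))]
    lines += ["Rule: " + rule for rule in DOMAIN_RULES]
    lines.append("Task: " + task)
    return "\n".join(lines)


class _NameRepairer(ast.NodeTransformer):
    """Rewrite near-miss function names (e.g. 'Pour') to the closest library name."""

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id not in FUNCTION_LIBRARY:
            matches = difflib.get_close_matches(
                node.func.id.lower(), sorted(FUNCTION_LIBRARY), n=1, cutoff=0.75
            )
            if matches:
                node.func.id = matches[0]
        return node


def repair_program(source: str) -> str:
    """Parse generated code, fix near-miss names, and return the rewritten source."""
    tree = _NameRepairer().visit(ast.parse(source))
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


def verify_program(source: str) -> list[str]:
    """Return a list of problems; an empty list means every call is in the library."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error on line {err.lineno}: {err.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if not isinstance(node.func, ast.Name):
                problems.append(f"line {node.lineno}: unsupported call form")
            elif node.func.id not in FUNCTION_LIBRARY:
                problems.append(f"line {node.lineno}: unknown function '{node.func.id}'")
    return problems


# Pretend this came back from the LLM: one casing slip, one invented function.
generated = 'pick("flask")\nPour("flask", "bottle")\nshake("bottle")'
repaired = repair_program(generated)
print(verify_program(repaired))  # ["line 3: unknown function 'shake'"]
```

In this sketch the capitalized Pour is corrected silently, while shake has no close match in the library and is reported back to the user instead of reaching the robot. A fuller verifier would presumably also check arguments and enforce the domain rules themselves, not just function names.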
The evaluation centered on LB media preparation, a familiar but repetitive lab procedure. Participants described the task in natural language and used Alchemist to have the robot autonomously pick up, pour from, and place lab equipment.
A novice life sciences researcher instructs the robot through natural language instead of directly programming motion logic.
The task is not hypothetical; it reflects repetitive lab work that domain experts already want to automate.
Users stay inside a conversational loop, making the interface feel like task specification rather than software development.
The strongest result is not that experts were fast. It is that novices could complete the programming task on roughly the same timeline, suggesting the real barrier is the interface, not the user’s intelligence or domain knowledge.
Novice mean task completion was 1:03:02; expert mean was 1:01:59. Different workflows, same endpoint.
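To make "roughly the same timeline" concrete, the two means can be converted to seconds and compared. This sketch assumes the times are in h:mm:ss format, which the page does not state explicitly; to_seconds is a hypothetical helper written for this check.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string (assumed format) to total seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


novice = to_seconds("1:03:02")  # 3782 s
expert = to_seconds("1:01:59")  # 3719 s
gap = novice - expert           # 63 s
relative_gap = gap / expert     # fraction of the expert mean

print(f"Novices took {gap} s longer ({relative_gap:.1%} of the expert mean)")
# prints: Novices took 63 s longer (1.7% of the expert mean)
```

Under that reading, the novice mean exceeds the expert mean by about a minute, under 2% of the total time.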
Novice users debugged through prompting instead of dropping into code, showing that the conversational loop did real work.
Participants spontaneously described workflows from their own labs that they would want a system like this to automate.
"I think the idea behind this project is really novel. It can be used extensively, especially in biochemistry labs, where tasks like these are a frequent occurrence."
Novice participant N5
"I would love to have a liquid handling system in our lab where I could simply press a button and say 'go' without worrying about it failing."
Novice participant N4
Alchemist points toward a more useful robotics future: one where domain experts can configure robot behavior directly, without waiting on specialist programmers to translate their intent. The near-term path is to extend the same interaction model across new domains and robots, while continuing to reduce the remaining reliability burden on the user.