In 2018, Amazon brought me in as the lead UX Sound Designer for Astro, its first consumer home robot, and that’s when the debate over robot character design really kicked off. The team split between treating the device as a mobile Alexa speaker or giving it its own personality, and I quickly learned that you can’t hide a moving thing behind a utility label.
Historical Context
Before Astro, most consumer‑focused robots were essentially smart speakers with wheels. They relied on a single voice interface and offered little visual feedback. In parallel, Amazon had been perfecting warehouse automation for years, where machines moved autonomously but never needed to appear friendly. The jump from logistics‑only bots to a home‑friendly companion forced a rethink of how hardware, software, and sound could work together. That shift set the stage for the questions that would dominate Astro’s design: Should the robot be treated like any other appliance, or does its mobility demand a distinct persona?
Key Takeaways
- Designing a robot’s character starts with defining its emotional range before any hardware is built.
- Users will assign personality to any mobile device, intentional or not.
- Sound, motion, and facial cues must be synchronized to avoid disjointed experiences.
- Cheap workarounds—like using Alexa as the sole voice—can feel creepy.
- Character stitching, the transition between expressive moments, is as critical as the moments themselves.
Robot Character Design Lessons from Amazon Astro
We didn’t set out to make Astro a talking Alexa on wheels; we set out to make it a robot that people could trust in their living rooms. The majority of the UX team, including me, argued that a thing that turns toward you with intent could never be just an appliance. That’s why we asked ourselves whether we should shape the character or let it emerge by accident. The answer turned out to be: shape it, and do it early.
Why Alexa Alone Wasn’t Enough
Alexa on the device felt “somewhat strange and creepy,” according to the design lead. Building a unique voice for Astro was too slow and expensive in 2018, so we settled on a hybrid model: Alexa handled the spoken dialogue, while Astro communicated through sound, motion, and facial expressions. The result was a robot that could “communicate as much as it could without words,” and that distinction mattered to users.
When we ran user testing, participants didn’t identify the robot as Alexa. The finding was blunt: “People didn’t see the robot as Alexa. They saw it as its own character, and that’s what they wanted it to be.” That sums up the core insight: people assign agency the moment a device moves and looks at them.
Building a Sound‑First Personality
My job started as “defining the robot’s sound design language and voice,” but there was no one to flesh out the actual character. Every decision—how Astro moved, how long it paused, what tone it used—became a character choice. The animators programmed motion and facial expressions, but the emotional arc they animated came from the sound work first. In practice, that meant we wrote a story for each interaction before we drew a line on the robot’s chassis.
Take Astro’s wake‑up sequence. It wasn’t a simple boot screen; it was a performance. The robot oriented itself quietly, stretched its screen, checked its wheels, then lifted its telescoping mast with a small dance of joy. Sound, motion, and eyes hit every beat together in full choreography. That moment illustrated how a robot’s personality can be expressed without a single word.
Defining an Emotional Range That Works
We kept Astro’s emotional range deliberately small. The design brief said we never wanted Astro to get too sad or too angry. It could play a sad tone, but it would snap out of it quickly and end on a high note to keep the overall experience positive.
One of the hardest questions was, “How does this robot communicate uncertainty without eroding trust?” The answer was to embed subtle cues—like a brief hesitation in its movement or a soft, wavering tone—that hinted at doubt but were quickly resolved. That approach let users feel the robot was thoughtful, not broken.
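The narrow emotional range described in this section can be pictured as a bounded value that drifts back toward a positive resting state. The sketch below is illustrative only: the `BoundedMood` class, its numeric bounds, and the recovery rate are assumptions made for the example, not details of how Astro was built.

```python
from dataclasses import dataclass


@dataclass
class BoundedMood:
    """Hypothetical mood model: valence is clamped and recovers toward a positive baseline."""
    floor: float = -0.3     # assumed lower bound, so the robot never gets "too sad"
    ceiling: float = 1.0    # assumed upper bound
    baseline: float = 0.4   # assumed resting state, mildly positive
    recovery: float = 0.5   # fraction of the gap to baseline closed on each tick
    valence: float = 0.4

    def nudge(self, delta: float) -> float:
        """Apply an emotional event, clamped to the allowed range."""
        self.valence = max(self.floor, min(self.ceiling, self.valence + delta))
        return self.valence

    def tick(self) -> float:
        """Move part of the way back to baseline so sad moments resolve quickly."""
        self.valence += (self.baseline - self.valence) * self.recovery
        return self.valence


mood = BoundedMood()
mood.nudge(-2.0)      # a strongly negative event is clamped at the floor
for _ in range(4):
    mood.tick()       # a few ticks later the mood is close to baseline again
```

A model like this makes the brief testable: any event, however negative, stays inside the range, and the robot always ends near its positive baseline.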
Character Leaks Through Every Seam
Even a tiny mismatch can make the experience feel disjointed. For example, in the “Sing” sequence, Astro would go from nothing into an emotional moment and then back to nothing with no buildup or cooldown. The lack of stitching made the clip feel like a video playing on a robot rather than an expression coming from within. I pushed hard for better transitions, but they never got implemented, leaving a noticeable gap.
- Without proper stitching, users notice inconsistencies even if they can’t name them.
- Animation timing that’s slightly off can break immersion.
- Contextually tone‑deaf responses erode trust faster than outright errors.
Practical Takeaways for Embodied AI Builders
If you’re building any embodied AI—whether it’s a home robot, a delivery drone, or a factory assistant—think of character as a design system. Start with core questions: What’s the baseline emotional state? How will the device signal uncertainty? Where’s the line between expressive and annoying? Answering those early saves you from retrofitting personality later.
Don’t assume users will ignore a robot’s personality just because you give it a functional voice. The Astro experience showed that a supporting voice (Alexa) can handle dialogue, but the robot’s own cues must carry the bulk of its character. If you rely solely on speech, you risk the same “strange and creepy” feeling we got from Alexa alone.
Design Systems Over Ad‑Hoc Solutions
We treated Astro’s character as a set of reusable building blocks—a sound vocabulary, motion motifs, and facial expression templates. That system let us iterate quickly across different scenarios without reinventing the wheel each time. For developers, that means you can create a library of expressive sounds and motions that map to specific states, and then reuse them across products.
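A library of reusable building blocks could look something like the following sketch. The state names, asset names, and the `express` helper are all hypothetical placeholders chosen for illustration; they are not Astro's actual vocabulary.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Expression:
    """One reusable bundle: a sound, a motion motif, and a facial template."""
    sound: str
    motion: str
    face: str


# Hypothetical library; every name here is a placeholder.
LIBRARY: Dict[str, Expression] = {
    "greeting": Expression("rising_chime", "turn_toward_user", "eyes_wide"),
    "uncertain": Expression("wavering_hum", "brief_hesitation", "eyes_narrow"),
    "neutral": Expression("silence", "hold_still", "eyes_relaxed"),
}


def express(state: str) -> Expression:
    """Look up the bundle for a state, falling back to neutral for unknown states."""
    return LIBRARY.get(state, LIBRARY["neutral"])


print(express("uncertain"))
```

Keeping the three modalities in one record means a state can never be given a sound without also being given a matching motion and face.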
Remember that every seam matters. The transition from an expressive moment back to a neutral state should be as smooth as the moment itself. If you skip that, users will sense the robot is “put together” rather than “alive.”
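As a rough sketch of what stitching means in practice, the function below wraps an expressive clip, represented as a list of intensity values, with a linear build-up and cooldown. The `stitch` name, the frame counts, and the use of a single intensity number are simplifying assumptions; a real robot would blend sound, motion, and face together.

```python
from typing import List


def stitch(clip: List[float], ramp_frames: int = 3, rest: float = 0.0) -> List[float]:
    """Return intensity frames with a linear build-up before and a cooldown after the clip."""
    if not clip:
        return []
    steps = ramp_frames + 1
    # Ramp from the resting level up to the clip's first frame.
    build_up = [rest + (clip[0] - rest) * i / steps for i in range(1, steps)]
    # Ramp from the clip's last frame back down to the resting level.
    cooldown = [clip[-1] + (rest - clip[-1]) * i / steps for i in range(1, steps)]
    return [rest] + build_up + clip + cooldown + [rest]


frames = stitch([1.0, 1.0])
print(frames)
```

Without the ramps, the sequence would jump straight from rest to full intensity and back, which is the "video playing on a robot" effect.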
Concrete Scenarios for Developers
Scenario 1: A delivery drone that drops packages at a doorstep. The drone can’t speak much, but a soft whirring tone paired with a gentle tilt of its rotors can signal confidence when it arrives, and a brief pause can hint at uncertainty if the landing spot isn’t clear. Those cues keep the user comfortable without a full‑blown voice.
Scenario 2: A factory assistant that moves tools between stations. A low‑frequency chime when it picks up a component, followed by a quick, confident glide, tells workers the robot is ready. If a sensor conflict occurs, a faint, wavering hum paired with a slight hesitation in motion communicates the problem without alarming the crew.
Scenario 3: A pet‑like companion robot for children. Bright, percussive sounds combined with head tilts convey excitement when playtime starts. When the robot needs to recharge, a slower, softer tone and a gradual lowering of its posture make the transition feel natural, rather than an abrupt shutdown.
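All three scenarios share one pattern: a confidence level picks the nonverbal cue. Here is a minimal sketch of that mapping, assuming the robot exposes a confidence score between 0 and 1. The threshold, the cue names, and the pause formula are invented for illustration.

```python
from typing import NamedTuple


class Cue(NamedTuple):
    sound: str
    motion: str
    pause_s: float  # hesitation before acting, in seconds


def cue_for_confidence(confidence: float, threshold: float = 0.7) -> Cue:
    """Pick a nonverbal cue from a 0..1 confidence score (the threshold is an assumed value)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= threshold:
        return Cue("steady_tone", "smooth_glide", 0.0)
    # Lower confidence lengthens the hesitation, capped at one second.
    pause = min(1.0, (threshold - confidence) / threshold)
    return Cue("wavering_hum", "slight_hesitation", pause)


print(cue_for_confidence(0.9))
print(cue_for_confidence(0.35))
```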
Technical Architecture of Character Sync
Astro’s character pipeline began with a sound storyboard. Designers drafted the emotional beats for each interaction, then passed those stories to the audio team. The audio team produced layered sound assets—ambient hums, gesture clicks, and melodic motifs—that could be mixed in real time. Motion engineers received the same storyboard and built motion primitives that matched the timing of each sound layer.
Facial expression controllers received cues from both audio and motion tracks. When a sound asset indicated a “joy” cue, the eyes opened wider and the screen displayed a subtle glow. When an “uncertainty” cue played, the eyes narrowed and the mast tilted. The three tracks (audio, motion, visual) were orchestrated by a central state machine that ensured they entered and exited together. This architecture kept the robot’s personality consistent across all modalities.
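To make the idea of a central state machine concrete, here is a minimal sketch in which one orchestrator starts and stops a cue on every track at once. The `Track` and `CharacterStateMachine` classes and the phase names are assumptions for illustration. The tracks only record what they were asked to do, where real ones would drive speakers, motors, and a display.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class Phase(Enum):
    IDLE = auto()
    ENTERING = auto()
    HOLDING = auto()
    EXITING = auto()


@dataclass
class Track:
    """One modality (audio, motion, or visual); this stub only logs calls."""
    name: str
    log: List[str] = field(default_factory=list)

    def enter(self, cue: str) -> None:
        self.log.append(f"enter:{cue}")

    def exit(self, cue: str) -> None:
        self.log.append(f"exit:{cue}")


class CharacterStateMachine:
    """Hypothetical orchestrator: every track enters and exits a cue together."""

    def __init__(self, tracks: List[Track]) -> None:
        self.tracks = tracks
        self.phase = Phase.IDLE
        self.cue: Optional[str] = None

    def start(self, cue: str) -> None:
        if self.phase is not Phase.IDLE:
            raise RuntimeError(f"cannot start {cue!r} while {self.cue!r} is active")
        self.phase, self.cue = Phase.ENTERING, cue
        for track in self.tracks:
            track.enter(cue)
        self.phase = Phase.HOLDING

    def stop(self) -> None:
        if self.phase is not Phase.HOLDING or self.cue is None:
            raise RuntimeError("no active cue to stop")
        self.phase = Phase.EXITING
        for track in self.tracks:
            track.exit(self.cue)
        self.phase, self.cue = Phase.IDLE, None


tracks = [Track("audio"), Track("motion"), Track("visual")]
machine = CharacterStateMachine(tracks)
machine.start("joy")
machine.stop()
```

Because only the orchestrator calls `enter` and `exit`, no single track can begin or end a cue on its own. The sketch also refuses to start a second cue while one is active.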
Key Questions Remaining
- How will future hardware platforms support richer sound vocabularies without inflating cost?
- What metrics can reliably capture user trust when a robot displays subtle uncertainty cues?
- Can a standardized character design system be shared across different manufacturers, or will each brand develop its own personality?
- At what point does expressive behavior become intrusive, and how should developers set that boundary?
What This Means For You
As a developer, you should start your next robot project by drafting a character brief before you write any code. Define the emotional palette, decide how uncertainty will be signaled, and pick a sound motif that can stand in for speech. That brief becomes a contract across engineering, design, and sound teams, ensuring everyone’s decisions reinforce the same personality.
When you prototype, test with real users early. We learned on Astro that users instantly labeled the robot as a character, regardless of our intentions. If you leave that character to chance, you’ll end up retrofitting personality later; if you own it from day one, you’ll avoid the “creepy” Alexa‑on‑wheels trap and deliver a more trusted experience.
Looking ahead, the big question is whether future embodied AI will treat character as an afterthought or as a core component. Astro proved that even a modest emotional range can make a robot feel like a companion rather than a gadget. As more homes welcome autonomous devices, the stakes for getting character right will only grow.
For a deeper dive into the design process, check the original report on IEEE Spectrum.
Sources: IEEE Spectrum, Wired


