0→1 design of a real-time voice assistant to enable productivity on the go
June 2025 - September 2025
In July 2025, I led an initiative at Glean to explore how voice and audio could expand the value of our workplace intelligence platform. The goal was to design a real-time voice assistant for Glean’s mobile app that reduces friction, improves accessibility, and unlocks new productivity moments for knowledge workers.
By using spoken dialogue systems and large language models (LLMs), the assistant enables users to search, brainstorm, and act on information naturally—whether they’re commuting, multitasking, or simply looking for a faster, more intuitive way to engage with enterprise knowledge.
I led this project independently as part of my summer internship assignment.
I was responsible for the end-to-end design, from defining user flows and system logic for multimodal interactions (voice + visual) to wireframing and high-fidelity prototyping. Throughout the process, I collaborated with engineers to scope system design and ensure feasibility, using the prototype to drive alignment and spark momentum for implementation.

The Starting Point
The modern workday is fluid. People are working on the way to meetings, while commuting, or even while getting ready in the morning. In these moments, they’re still thinking, planning, and making decisions — but our tools don’t always meet them there.
This gap revealed a big opportunity for Glean: to extend its value beyond the desk and unlock more moments throughout the day where users can work however and whenever they want. By looking at how people naturally ideate and consume information, I found that voice could play a critical role in enabling people to do work in hands-busy or on-the-go contexts.
Many users and Glean employees had already expressed interest in a real-time voice assistant, but it had never been prioritized against other product initiatives. For my summer internship, I was tasked with researching its potential, identifying where it could have the most impact, and designing a high-fidelity prototype to show how voice could fit seamlessly into Glean’s product. The goal was to create a proof-of-concept that demonstrated how a voice assistant could unlock new productivity moments and move Glean closer to its vision of being a true work companion.
I first set out to understand how our users move through their workdays. I interviewed product marketing managers, sales executives, account executives, engineers, and designers to capture a diverse range of workflows. I also drew from user interviews and feedback within Glean to map patterns in how people actually work and where current tools fall short.
One core theme I identified was that people are constantly transitioning between contexts: commuting, walking to meetings, multitasking at home, or switching between apps. In these moments, Glean’s text-heavy interfaces fall short: typing is slow, disruptive to exploratory thinking, and often impractical.
I mapped out specific use cases across roles such as sales reps, executives, and product managers, who often need to get work done on the go. These scenarios showed that even though people are thinking and making decisions throughout the day, our current tools don’t support them in executing those tasks outside of a desk environment.
Synthesizing the insights from my research phase led me to define a central question that would guide my explorations and establish a focus for the rest of my work:
I chose to ground this concept in a mobile-first experience because mobile devices are always with us, from the moment we wake up to the end of the day. They support lightweight, hands-free interactions that fit into in-between moments like commuting, cooking, or walking, making mobile the most natural surface to iterate on first.
With the problem space defined, the next step was to explore how we might solve it. To kick this off, I facilitated a design jam with members of my team: a focused session to brainstorm openly and sketch out early concepts. The goal wasn’t to land on final solutions, but to surface common themes and directions that could guide the next stage of ideation. This exercise helped align the team, gave me a clearer sense of where to start, and set the foundation for the concepts I would take forward into prototypes.
Identifying a Direction
From the design jam, a few central ideas emerged:
Live brainstorming and chatting — using voice to think out loud and refine ideas in real time.
Listening to information in a podcast form — transforming documents and updates into an audio-first format.
Role-based coaching — interacting with the assistant as if it were a sales coach, product manager, or other persona.
Given the scope and time constraints of the project, I decided to focus on the first concept, live brainstorming and chatting, as it felt both high-impact and directly addressed user needs around fast, on-the-go ideation and exploratory thinking. The next step was mapping out user journeys and system architectures to define how a knowledge worker might interact with the assistant: where they enter voice mode, how the conversation flows, and what outputs (like summaries or action items) they receive at the end.
Early Explorations
With the direction defined, I began sketching different approaches for how a real-time voice assistant could live within Glean’s mobile app. I considered multiple aspects of the experience:
Entry points — where and how users would activate voice mode, whether through a dedicated button, gesture, or contextual prompt.
Transitions — how users would move seamlessly between voice mode and the existing assistant view, without breaking flow.
Interface layout — how the voice interface could mirror elements of Glean’s current mobile design while adapting to conversational interactions.
I also explored home page layouts in the mobile app, ranging from minimalist to more information-dense designs. In these variations, the home page not only served as a central entry point into the voice experience but also acted as a user’s first stop of the day.
Designing the Flow
The next challenge was figuring out how the conversation itself should work. I created conversation flows to answer key questions: How should a conversation start? What should the screen show while the assistant is listening or responding? How much text should appear, and how dense should it be, before the experience feels overwhelming?
I explored variations in the level of visual artifacts shown on screen, from lightweight indicators for quick back-and-forth chats to denser transcript-style views for longer exchanges. Balancing these elements was important: too little context left users unsure of what was happening, while too much text made the interaction feel cluttered. These explorations helped establish a flow that was clear, usable, and aligned with Glean’s role as a knowledge partner.
I explored three different visual approaches for how a real-time voice assistant could live inside Glean’s mobile app:
Voice as a separate mode with a splash screen — users enter a distinct voice experience before transitioning into a chat-style UI.
Voice as a separate mode with text toggle — users speak to the assistant but can choose to toggle on Glean’s text responses through a 'CC' button, with large, easy-to-read captions.
Voice as a layer on top of text — voice interactions appear directly over the existing chat UI, blending both modalities together.
I shared these options with my design team during critique to see which direction resonated most. Feedback leaned toward the second option, which emphasized a focused avatar as the central point of interaction, paired with the ability to switch into a captions view with large text when needed. This approach offered the right balance of simplicity for voice-first use and flexibility for hands-free contexts.
Avatar Design
After aligning on option 2, I shifted focus to refining the avatar—the key visual element that would anchor the voice experience. Having a visual avatar for a voice assistant helps ground the interaction, making it easier for users to understand its state, follow the flow of conversation, and feel more connected to the experience.
The spectrum below shows a range of avatar styles for voice assistants. For Glean, a direction closest to “fluid abstract” felt most appropriate: it maintains professionalism while avoiding both the stiffness of pure abstraction and the informality of cartoon characters. Since Glean is built for work-related interactions rather than casual conversation, the avatar should be simple and non-distracting, yet expressive enough to be engaging to interact with.
At the same time, I wanted to explore how the avatar could feel more uniquely Glean. I reached out to our Director of Creative Design and collaborated with his team to see how we might bring the “Glean Glimmer” and sense of magic from our branding into the assistant. Together, we sketched several explorations that infused subtle brand cues into the avatar while keeping it simple and professional. While the final direction is still being refined, these early explorations gave us promising paths to lean into and helped frame the avatar as not just functional, but a distinct extension of the Glean brand.
System Design
Beyond the UI, I also needed to consider the underlying system design that would make the experience possible. Defining how the feature should function at a technical level helped ground the interaction model and ensure the design was feasible. The first concept I sought to understand was a chained architecture: the end-to-end flow of input and output processing.
At a high level, it processes audio step-by-step: first, the user’s speech is converted to text by a speech-to-text model. Then, the text is passed to an intelligent agent — a large language model (LLM) — which generates a response in text form. Finally, that response is converted back into speech using a text-to-speech model, producing the audio reply you hear.
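To make that flow concrete, here is a minimal sketch of the chained pipeline in Python, using OpenAI’s hosted speech and chat endpoints as stand-ins. The model choices and the voice_turn helper are illustrative assumptions; in Glean’s case, the middle step would route through the company’s own assistant and retrieval stack rather than a bare chat call.

```python
# Minimal sketch of the chained pipeline: speech-to-text -> LLM -> text-to-speech.
# Model names are illustrative stand-ins, not a production configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def voice_turn(audio_path: str, system_prompt: str) -> bytes:
    # 1. Speech-to-text: transcribe the user's spoken query.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2. LLM: generate a reply, guided by the prompt described below.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": transcript.text},
        ],
    )
    reply = completion.choices[0].message.content

    # 3. Text-to-speech: synthesize the audio reply the user hears.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
    return speech.read()
```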
One critical element here is the prompt design: every voice input is paired with a specific prompt. This prompt acts like a guide for the system, telling the AI how to interpret the user’s intent and generate a meaningful answer that fits the tone and purpose of the conversation.
To prototype and refine the conversational experience, I experimented with OpenAI’s real-time voice platform. This allowed me to test how different system prompts shaped the assistant’s responses in terms of tone, quality, and length. Beyond prompts, I also explored controls such as customizing the voice style, adjusting speaking speed, and enabling automatic turn detection to make the interaction feel smoother and more natural.
The image on the right shows the prompt I created to guide the assistant’s tone and flow of conversation. A few key points are to keep responses concise (no more than two sentences), acknowledge the user’s query with a brief phrase like “Got it!”, and ask clarifying questions when the query is vague.
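As a rough sketch of how such an experiment could be wired up, assuming OpenAI’s Realtime API over WebSocket: a single session.update event carries the system instructions alongside the voice and turn-detection settings. The instructions below paraphrase the rules above rather than reproduce the actual prompt, and every parameter value is an assumption.

```python
# Hypothetical session setup for a Realtime API prototype. The instructions
# paraphrase the prompt rules described above; they are not the verbatim prompt.
import asyncio
import json
import os

import websockets  # pip install websockets

INSTRUCTIONS = """You are Glean's voice assistant for on-the-go brainstorming.
- Keep responses concise: no more than two sentences.
- Acknowledge each query with a brief phrase like "Got it!" before answering.
- If a query is vague, ask a clarifying question instead of guessing."""


async def configure_session() -> None:
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # `additional_headers` is the websockets>=14 name; older versions use `extra_headers`.
    async with websockets.connect(url, additional_headers=headers) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "instructions": INSTRUCTIONS,
                "voice": "alloy",            # customizable voice style
                "turn_detection": {          # automatic turn detection
                    "type": "server_vad",
                    "silence_duration_ms": 500,
                },
            },
        }))
        # ...stream microphone audio up and play response audio deltas back here...


asyncio.run(configure_session())
```

Speaking speed was also adjustable in my experiments; I have left it out of the sketch since the exact session field for it varies by API version.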
I demoed a live conversation at my intern presentation, walking through a classic brainstorming session in which I interrupted the assistant whenever I wanted, asked follow-up questions, and dove deeper into topics. Once people could feel how the interaction worked, the value of the design clicked, and there was tangible excitement about its potential.
Interaction States
With the system architecture defined, the next step was to translate those mechanics into the user experience. To do this, I modeled the core interaction states—connecting, idle, listening, processing, and responding—so users could clearly see what the system was doing at each stage. With the help of Midjourney, an image and video generation platform, I created short videos to visualize each state in action. While these cues provide transparency for users, they also support backend design by clarifying the functions and transitions behind each state.
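As a sketch of how these states might be modeled on the client, assuming a simple event-driven state machine: the state names come straight from the list above, while the events and transition table are my own illustration, not Glean’s implementation.

```python
# Illustrative client-side model of the five interaction states. The events
# and transitions are a sketch, not Glean's actual implementation.
from enum import Enum, auto


class VoiceState(Enum):
    CONNECTING = auto()  # establishing the realtime session
    IDLE = auto()        # connected, waiting for the user to speak
    LISTENING = auto()   # capturing the user's speech
    PROCESSING = auto()  # transcribing and generating a response
    RESPONDING = auto()  # playing back the assistant's audio reply


# (current state, event) -> next state
TRANSITIONS = {
    (VoiceState.CONNECTING, "session_ready"): VoiceState.IDLE,
    (VoiceState.IDLE, "speech_started"): VoiceState.LISTENING,
    (VoiceState.LISTENING, "speech_ended"): VoiceState.PROCESSING,
    (VoiceState.PROCESSING, "response_started"): VoiceState.RESPONDING,
    (VoiceState.RESPONDING, "playback_done"): VoiceState.IDLE,
    # Barge-in: the user can interrupt the assistant mid-response.
    (VoiceState.RESPONDING, "speech_started"): VoiceState.LISTENING,
}


def next_state(state: VoiceState, event: str) -> VoiceState:
    return TRANSITIONS.get((state, event), state)  # ignore unmapped events
```

Modeling barge-in as an explicit transition is what allows the interruption behavior from the live demo: speaking over the assistant simply drops it back into listening.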
The Core Experience
With the interaction states defined, I moved on to building the core prototype that demonstrated the real-time back-and-forth of the voice assistant. This prototype focused on the foundational experience: a simple conversation loop where the user speaks, the system listens and processes, and then responds. Below is a demo of a conversation from my final presentation, walking the audience through how the assistant could feel natural, responsive, and engaging, and showcasing how the states come together in practice.
With the basics in place, I wanted to explore a larger question: How can we make this a uniquely Glean experience? What I had explored so far resembled other voice assistants on the market, but Glean has a unique advantage: access to a company’s enterprise knowledge, a rich knowledge graph, and deep integrations with tools like Jira and Slack.
This opens an opportunity to differentiate Glean by building on these strengths and creating a voice assistant that doesn’t just answer general queries, but understands your projects, your workflows, and the tasks you need to get done. In doing so, Glean can move closer to its vision of being a true work assistant, one that feels indispensable in the flow of everyday work.
By building on this foundation and leveraging features already in development, I designed assistive layers that expand the core experience and define a unique value proposition for Glean's voice feature. Together, they position the assistant as more than a generic voice interaction: an experience grounded in Glean’s enterprise knowledge graph and integrations, deeply embedded in work in a way competitors can’t easily match.
Smart Suggestions
This layer surfaces context-aware prompts with relevant questions and topics based on a user’s role, projects, and schedule. It helps users get started quickly, discover what voice mode can do, and access commands tailored to their current work—leveraging Glean’s deep knowledge of workflows.
Summarize the Conversation
This layer detects when a conversation is wrapping up and prompts the user to create an end product—such as a summary of key points or a to-do list. It helps users stay organized by turning discussions into clear, actionable next steps.
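One lightweight way the wrap-up detection could work, sketched under the assumption of an LLM classification call per turn: ask the model whether the exchange is winding down, and if so, turn the transcript into the end product. The prompts and model here are illustrative, not the shipped design.

```python
# Sketch of the "summarize the conversation" layer: detect a wrap-up and
# turn the running transcript into key points and to-dos. Prompts are illustrative.
from openai import OpenAI

client = OpenAI()


def detect_wrap_up(transcript: str) -> bool:
    # Cheap classification call: is the conversation winding down?
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer only 'yes' or 'no': is this conversation wrapping up?"},
            {"role": "user", "content": transcript},
        ],
    )
    return result.choices[0].message.content.strip().lower().startswith("yes")


def summarize(transcript: str) -> str:
    # Produce the end product offered to the user: key points plus a to-do list.
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Summarize the key points and list concrete to-dos."},
            {"role": "user", "content": transcript},
        ],
    )
    return result.choices[0].message.content
```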
Agentic Looping
Agentic looping, a feature planned for Glean’s assistant, offers an opportunity to bring advanced reasoning into the voice experience. It operates in two modes—default and advanced—giving users the option to loop for more in-depth reasoning when handling complex queries. This flexibility makes conversations feel more fluid while providing transparency into how the system thinks.
The Bigger Picture
The value of this experience is that it positions Glean's assistant as a trusted collaborator—one that understands your working style, anticipates your needs, and is deeply connected to your projects and tools. With a personalized voice and visual experience, Glean can move beyond a search tool to becoming a true work companion—helping people stay productive and engaged whenever and wherever they need it.