Application Control via LLM Conversation: Fusing the UX/UI Boundary
🗣️ Application Control via LLM Conversation
Welcome to the recap of my July 2024 presentation at the Vegas Tech Alley AI Meetup. This talk explores a different paradigm for application design: making the LLM conversation the primary method of control and navigation, effectively fusing the boundaries between the user interface (UI) and the AI experience.
Video: Tech Alley Vegas | Las Vegas AI Meetup - July 2024
The Application as a Graph
To enable an AI to control an application, we must first allow the AI to understand the application's structure.
- Nodes and Edges: An application can be modeled as a graph, where nodes represent views (like a dashboard, story editor, or customer list) and edges represent navigation paths.
- User State: The AI needs to maintain a session and user profile detailing the user's current location (view) and their level of awareness (how familiar they are with the app). This determines the AI's conversational style (e.g., holding their hand vs. simply checking off tasks).
The ultimate goal is for the user to navigate and perform tasks through conversation, reducing the need for them to manually click through traditional menus and features they may not even know exist (like banking features such as Zelle).
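The graph-plus-session model above can be sketched in a few lines. This is a minimal, hypothetical illustration; the class and field names are my own, not from the talk's codebase.

```python
# Illustrative sketch: an application as a navigation graph (views as
# nodes, navigation paths as edges) plus per-user session state.
from dataclasses import dataclass, field

@dataclass
class AppGraph:
    # edges maps a view name to the set of views reachable from it
    edges: dict[str, set[str]] = field(default_factory=dict)

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.setdefault(src, set()).add(dst)
        self.edges.setdefault(dst, set())

    def can_navigate(self, src: str, dst: str) -> bool:
        return dst in self.edges.get(src, set())

@dataclass
class UserSession:
    current_view: str
    awareness: str = "novice"  # drives conversational style: hand-holding vs. terse

graph = AppGraph()
graph.add_edge("dashboard", "story_editor")
graph.add_edge("dashboard", "customer_list")

session = UserSession(current_view="dashboard")
print(graph.can_navigate(session.current_view, "story_editor"))  # True
```

With this structure in hand, the AI can answer "where can I go from here?" and tailor its tone to the user's familiarity with the app.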
🛠️ Tool Definition: The Key to Action
The critical piece of the architecture is the Tooling System. These are discrete functions the AI can call to perform actions or gather real-time data from the application or the outside world.
Tools vs. RAG vs. Fine-Tuning
We provide data to the LLM in three distinct ways, with Tools being the most dynamic:
| Method | Purpose | Mechanism |
|---|---|---|
| RAG (Retrieval-Augmented Generation) | Providing Knowledge (Documents, History) | Cosine similarity search against a vector database (e.g., Elasticsearch) to find relevant text chunks. |
| Fine-Tuning | Providing Consistency (Model Behavior) | Training takes weeks; changes cannot be applied instantly. |
| Tools | Providing Action (Real-time Functions) | The user asks for a task (e.g., "What's the weather?"). The LLM responds with a JSON/YAML Tool Command instead of text, which the server executes. |
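The server-side half of the Tools row can be sketched as a small dispatch step: the LLM has replied with a structured tool command rather than prose, and the server looks up and runs the named function. The tool name and payload shape here are assumptions for the example, not a real API.

```python
# Hypothetical dispatch: parse the LLM's structured tool command (JSON
# here) and execute the matching server-side function.
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real weather API call

TOOLS = {"get_weather": get_weather}

def dispatch(llm_output: str) -> str:
    command = json.loads(llm_output)   # e.g. {"tool": "...", "args": {...}}
    fn = TOOLS[command["tool"]]
    return fn(**command["args"])

print(dispatch('{"tool": "get_weather", "args": {"city": "Las Vegas"}}'))
```

The key design point is that the LLM never executes anything itself; it only emits a declarative command, and the server stays in control of what actually runs.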
Tool Scoping and Control
- Tool Command Output: When the LLM decides a tool is needed, it responds with a structured output (YAML/JSON) that specifies the function name and the parameters (e.g., `open web tab` needs a `url`).
- Scope: Tools are scoped to the current page/view. A Global tool (like `open web tab`) is always available; a Story Writer tool is only available when the user is in the story editing view.
- Security: Tool visibility is tied to the user's security privilege; the AI won't offer tasks a user isn't permitted to perform.
- User Language: We make the tool definitions editable (using YAML) so that companies can modify the English description to match their specific business language and terminology, avoiding generic AI responses.
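The scope and privilege rules above amount to a simple filter over the tool definitions. This sketch represents the definitions as Python dicts mirroring the editable YAML described in the talk; the field names (`scope`, `min_privilege`) are illustrative assumptions.

```python
# Hedged sketch: only tools matching the current view (or global scope)
# and the user's privilege level are exposed to the LLM.
TOOL_DEFS = [
    {"name": "open_web_tab", "scope": "global", "min_privilege": 0,
     "description": "Open a URL in a new browser tab."},
    {"name": "save_story", "scope": "story_editor", "min_privilege": 1,
     "description": "Save the current story draft."},
    {"name": "delete_customer", "scope": "customer_list", "min_privilege": 3,
     "description": "Remove a customer record."},
]

def visible_tools(view: str, privilege: int) -> list[str]:
    return [t["name"] for t in TOOL_DEFS
            if t["scope"] in ("global", view) and privilege >= t["min_privilege"]]

print(visible_tools("story_editor", privilege=1))  # global tool + story tool
```

Because the filtered list is what gets injected into the LLM's context, a user in the story editor never even sees `delete_customer` as an option, which is both a security boundary and a prompt-size optimization.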
🌐 Controlling External Systems
This architecture is not limited to controlling the host application. It can control external web pages and even legacy systems.
- Web Automation: The AI can control a separate web browser (via a Chrome plugin) to perform actions like navigating to YouTube Music, opening a Wikipedia page, or eventually, paying a utility bill. The AI reads the page, figures out the next logical step (e.g., "I need to log in first"), and executes it like a human.
- Legacy Systems: This same conversational layer can be placed on top of ancient applications, like a 3270 terminal/Mainframe app. As long as the server can emulate the input/output of the legacy system, the AI can control it without changing a single line of code in the original application. This offers a powerful path to modernizing government or large enterprise systems.
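The legacy-system idea can be sketched as wrapping the terminal session behind the same tool interface, so the AI drives it through send/read calls without touching the legacy code. `FakeTerminal` below is a speculative stand-in for a real 3270 emulator library; all names are invented for illustration.

```python
# Speculative sketch: a tool that emulates what a human operator would
# type into a legacy terminal, exposed to the AI like any other tool.
class FakeTerminal:
    """Toy stand-in for a real 3270 terminal emulator session."""
    def __init__(self):
        self.screen = "LOGIN:"
        self.keystrokes = []

    def send(self, keys: str) -> None:
        self.keystrokes.append(keys)
        # toy screen transition: a valid login reaches the main menu
        self.screen = "MAIN MENU" if keys.startswith("user1") else "LOGIN:"

    def read(self) -> str:
        return self.screen

def legacy_login(term, username: str, password: str) -> str:
    """Tool the AI can call: log in if the login screen is showing."""
    if "LOGIN" in term.read():
        term.send(f"{username} {password}\n")
    return term.read()

term = FakeTerminal()
print(legacy_login(term, "user1", "secret"))  # MAIN MENU
```

The original application is untouched; only the emulation layer and the tool wrapper are new, which is what makes this attractive for systems nobody dares to modify.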
🧪 Consistency vs. Creativity
A major concern in using LLMs for application control is consistency, especially for actions that require precision (like bank transfers).
- LLM Variability: The nature of LLMs means the output is intentionally variable. In creative tasks (like story writing), this is a benefit.
- Control Mechanisms: For high-stakes tasks, consistency is achieved by limiting the tools exposed to the AI and by controlling the temperature (creativity setting) of the model.
- The Seed: Newer LLM APIs (like GPT-4's preview) allow control over the seed, a random number that guides generation, which offers a path to greater consistency for critical actions.
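The consistency controls above can be sketched as follows, assuming an OpenAI-style chat API where `temperature` and a (beta) `seed` parameter are accepted. Only the request parameters are built here; the model name and message are examples, and whether a given provider honors `seed` should be checked against its docs.

```python
# Sketch: dial variability down for high-stakes actions, leave it up
# for creative ones. temperature=0 minimizes sampling randomness; a
# fixed seed makes repeated runs more reproducible on APIs that support it.
def build_request(messages: list, tools: list, high_stakes: bool) -> dict:
    params = {"model": "gpt-4-1106-preview", "messages": messages, "tools": tools}
    if high_stakes:
        params["temperature"] = 0    # suppress creative variation
        params["seed"] = 42          # pin the sampling seed for repeatability
    else:
        params["temperature"] = 0.9  # allow variety for creative tasks
    return params

req = build_request([{"role": "user", "content": "Transfer $50 to savings"}],
                    tools=[], high_stakes=True)
print(req["temperature"], req["seed"])  # 0 42
```

In practice, the high-stakes path would also restrict `tools` to the minimal set needed for the task, per the scoping rules above.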
This conversational, action-based UX paradigm is a major step toward a truly responsive, context-aware interface that acts as an assistant rather than just a navigation tool.