Application Control via LLM Conversation: Fusing the UX/UI Boundary
🗣️ Application Control via LLM Conversation
Welcome to the recap of my July 2024 presentation at the Vegas Tech Alley AI Meetup. This talk explores a different paradigm for application design: making the LLM conversation the primary method of control and navigation, effectively fusing the boundaries between the user interface (UI) and the AI experience.
Video: Tech Alley Vegas | Las Vegas AI Meetup - July 2024
The Application as a Graph
To enable an AI to control an application, we must first allow the AI to understand the application's structure.
- Nodes and Edges: An application can be modeled as a graph, where nodes represent views (like a dashboard, story editor, or customer list) and edges represent navigation paths.
- User State: The AI needs to maintain a session and user profile detailing the user's current location (view) and their level of awareness (how familiar they are with the app). This determines the AI's conversational style (e.g., holding their hand vs. simply checking off tasks).
The ultimate goal is for the user to navigate and perform tasks through conversation, reducing the need for them to manually click through traditional menus and features they may not even know exist (like banking features such as Zelle).
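The graph-plus-session model above can be sketched in a few lines. This is a minimal, hypothetical illustration; the class and field names are my own, not from the talk's codebase.

```python
# Illustrative sketch: an application as a navigation graph (views as
# nodes, navigation paths as edges) plus per-user session state.
from dataclasses import dataclass, field

@dataclass
class AppGraph:
    # edges maps a view name to the set of views reachable from it
    edges: dict[str, set[str]] = field(default_factory=dict)

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.setdefault(src, set()).add(dst)
        self.edges.setdefault(dst, set())

    def can_navigate(self, src: str, dst: str) -> bool:
        return dst in self.edges.get(src, set())

@dataclass
class UserSession:
    current_view: str
    awareness: str = "novice"  # drives conversational style: hand-holding vs. terse

graph = AppGraph()
graph.add_edge("dashboard", "story_editor")
graph.add_edge("dashboard", "customer_list")

session = UserSession(current_view="dashboard")
print(graph.can_navigate(session.current_view, "story_editor"))  # True
```

With this structure in hand, the AI can answer "where can I go from here?" and tailor its tone to the user's familiarity with the app.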
🛠️ Tool Definition: The Key to Action
The critical piece of the architecture is the Tooling System. These are discrete functions the AI can call to perform actions or gather real-time data from the application or the outside world.
Tools vs. RAG vs. Fine-Tuning
We provide data to the LLM in three distinct ways, with Tools being the most dynamic:
| Method | Purpose | Mechanism |
|---|---|---|
| RAG (Retrieval-Augmented Generation) | Providing Knowledge (Documents, History) | Cosine similarity search against a vector database (e.g., Elasticsearch) to find relevant text chunks. |
| Fine-Tuning | Providing Consistency (Model Behavior) | Training takes weeks; changes cannot be applied instantly. |
| Tools | Providing Action (Real-time Functions) | The user asks for a task (e.g., "What's the weather?"). The LLM responds with a JSON/YAML Tool Command instead of text, which the server executes. |
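The server-side half of the Tools row can be sketched as a small dispatch step: the LLM has replied with a structured tool command rather than prose, and the server looks up and runs the named function. The tool name and payload shape here are assumptions for the example, not a real API.

```python
# Hypothetical dispatch: parse the LLM's structured tool command (JSON
# here) and execute the matching server-side function.
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real weather API call

TOOLS = {"get_weather": get_weather}

def dispatch(llm_output: str) -> str:
    command = json.loads(llm_output)   # e.g. {"tool": "...", "args": {...}}
    fn = TOOLS[command["tool"]]
    return fn(**command["args"])

print(dispatch('{"tool": "get_weather", "args": {"city": "Las Vegas"}}'))
```

The key design point is that the LLM never executes anything itself; it only emits a declarative command, and the server stays in control of what actually runs.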
Tool Scoping and Control
- Tool Command Output: When the LLM decides a tool is needed, it responds with a structured output (YAML/JSON) that specifies the function name and the parameters (e.g., `open web tab` needs a `url`).
- Scope: Tools are scoped to the current page/view. A Global tool (like `open web tab`) is always available; a Story Writer tool is only available when the user is in the story editing view.
- Security: Tool visibility is tied to the user's security privilege; the AI won't offer tasks a user isn't permitted to perform.
- User Language: We make the tool definitions editable (using YAML) so that companies can modify the English description to match their specific business language and terminology, avoiding generic AI responses.
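The scope and privilege rules above amount to a simple filter over the tool definitions. This sketch represents the definitions as Python dicts mirroring the editable YAML described in the talk; the field names (`scope`, `min_privilege`) are illustrative assumptions.

```python
# Hedged sketch: only tools matching the current view (or global scope)
# and the user's privilege level are exposed to the LLM.
TOOL_DEFS = [
    {"name": "open_web_tab", "scope": "global", "min_privilege": 0,
     "description": "Open a URL in a new browser tab."},
    {"name": "save_story", "scope": "story_editor", "min_privilege": 1,
     "description": "Save the current story draft."},
    {"name": "delete_customer", "scope": "customer_list", "min_privilege": 3,
     "description": "Remove a customer record."},
]

def visible_tools(view: str, privilege: int) -> list[str]:
    return [t["name"] for t in TOOL_DEFS
            if t["scope"] in ("global", view) and privilege >= t["min_privilege"]]

print(visible_tools("story_editor", privilege=1))  # global tool + story tool
```

Because the filtered list is what gets injected into the LLM's context, a user in the story editor never even sees `delete_customer` as an option, which is both a security boundary and a prompt-size optimization.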
🌐 Controlling External Systems
This architecture is not limited to controlling the host application. It can control external web pages and even legacy systems.
- Web Automation: The AI can control a separate web browser (via a Chrome plugin) to perform actions like navigating to YouTube Music, opening a Wikipedia page, or eventually, paying a utility bill. The AI reads the page, figures out the next logical step (e.g., "I need to log in first"), and executes it like a human.
- Legacy Systems: This same conversational layer can be placed on top of ancient applications, like a 3270 terminal/Mainframe app. As long as the server can emulate the input/output of the legacy system, the AI can control it without changing a single line of code in the original application. This offers a powerful path to modernizing government or large enterprise systems.
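The legacy-system idea can be sketched as wrapping the terminal session behind the same tool interface, so the AI drives it through send/read calls without touching the legacy code. `FakeTerminal` below is a speculative stand-in for a real 3270 emulator library; all names are invented for illustration.

```python
# Speculative sketch: a tool that emulates what a human operator would
# type into a legacy terminal, exposed to the AI like any other tool.
class FakeTerminal:
    """Toy stand-in for a real 3270 terminal emulator session."""
    def __init__(self):
        self.screen = "LOGIN:"
        self.keystrokes = []

    def send(self, keys: str) -> None:
        self.keystrokes.append(keys)
        # toy screen transition: a valid login reaches the main menu
        self.screen = "MAIN MENU" if keys.startswith("user1") else "LOGIN:"

    def read(self) -> str:
        return self.screen

def legacy_login(term, username: str, password: str) -> str:
    """Tool the AI can call: log in if the login screen is showing."""
    if "LOGIN" in term.read():
        term.send(f"{username} {password}\n")
    return term.read()

term = FakeTerminal()
print(legacy_login(term, "user1", "secret"))  # MAIN MENU
```

The original application is untouched; only the emulation layer and the tool wrapper are new, which is what makes this attractive for systems nobody dares to modify.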
🧪 Consistency vs. Creativity
A major concern in using LLMs for application control is consistency, especially for actions that require precision (like bank transfers).
- LLM Variability: The nature of LLMs means the output is intentionally variable. In creative tasks (like story writing), this is a benefit.
- Control Mechanisms: For high-stakes tasks, consistency is achieved by limiting the tools exposed to the AI and by controlling the temperature (creativity setting) of the model.
- The Seed: Newer LLM APIs (like GPT-4's preview) allow control over the seed, a random number that guides generation, which offers a path to greater consistency for critical actions.
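The consistency controls above can be sketched as follows, assuming an OpenAI-style chat API where `temperature` and a (beta) `seed` parameter are accepted. Only the request parameters are built here; the model name and message are examples, and whether a given provider honors `seed` should be checked against its docs.

```python
# Sketch: dial variability down for high-stakes actions, leave it up
# for creative ones. temperature=0 minimizes sampling randomness; a
# fixed seed makes repeated runs more reproducible on APIs that support it.
def build_request(messages: list, tools: list, high_stakes: bool) -> dict:
    params = {"model": "gpt-4-1106-preview", "messages": messages, "tools": tools}
    if high_stakes:
        params["temperature"] = 0    # suppress creative variation
        params["seed"] = 42          # pin the sampling seed for repeatability
    else:
        params["temperature"] = 0.9  # allow variety for creative tasks
    return params

req = build_request([{"role": "user", "content": "Transfer $50 to savings"}],
                    tools=[], high_stakes=True)
print(req["temperature"], req["seed"])  # 0 42
```

In practice, the high-stakes path would also restrict `tools` to the minimal set needed for the task, per the scoping rules above.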
This conversational, action-based UX paradigm is a major step toward a truly responsive, context-aware interface that acts as an assistant rather than just a navigation tool.