Channel: Vercel
Category: Conferences

Ambient agents read live event streams and act continuously, so scaling them calls for stream-processing techniques. This talk explores seven practical levers to manage always-on token use, latency, and risk, without sacrificing effectiveness. Get a demo today: https://vercel.com/contact/sales/demo
[Music] I'm Fred Patton, developer advocate at Auth0, and today I'm going to talk about ambient agents. Anyone familiar with them? Cool. So we're going to get into that, and it's going to be mostly about viability: how we can manage this new class of AI agents, and less about particular implementations.
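As a concrete preview of what "managing" these agents can mean in practice, here is a minimal, hypothetical sketch of the most basic control: a hard per-agent token budget. All names and numbers here (`TokenBudget`, the limits) are illustrative, not from the talk.

```python
# Hypothetical per-agent token budget: stop calling the model once
# the agent has burned through its allowance. Names are illustrative.
class TokenBudget:
    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def allow(self, estimated: int) -> bool:
        # Cheap pre-check before paying for an LLM call.
        return self.used + estimated <= self.max_tokens

    def record(self, actual: int) -> None:
        self.used += actual


budget = TokenBudget(max_tokens=1_000)
if budget.allow(estimated=400):
    # response = llm(prompt)  # the real (paid) call would go here
    budget.record(actual=380)

print(budget.allow(estimated=700))  # → False: a second large call no longer fits
```

The point of the sketch is only that the gate runs before the paid call, so an always-on agent fails closed when its allowance runs out.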
So, as we all know, if you're talking to a large language model or a small language model, you're consuming tokens, and you're paying for those tokens; you're burning cash and you're accumulating latency. And because agents can improvise, you also have a lot of risk besides running out of money. So we're going to look at some ways to control and manage this. With the copilots that we know and love, we ask them to do something and wait, and right away we're getting some response and they've done it, and that's cool. Basically, they help us with discrete tasks and get things done. Ambient agents are a bit different. They're proactive. They're basically always running. If we think about ages ago
, without even counting in AI years, tons of companies went event-driven, so everyone's got their event bus, their Kafka, and all these things happening. And the deal, as Harrison Chase of LangChain framed it when he introduced ambient agents a few years ago (and they've been getting more and more popular), is that you can connect these agents to your event stream. And that sounds like a great idea, and that sounds like a very bad idea, right? If we were worried about token spend before, what happens if we have an AI agent responding to every event? That would be a very bad idea. So we're going to dive into that. We can look at it as two pillars of scale. We've got AI ops and LLM ops, where we can provision, get really tight inference chains, and move the data along. And so we can efficiently burn tons of cash, and we can efficiently
cause epic mistakes. So we've got that one covered. What we also want to be able to do is scale well, right? We want to make sure that our objectives for the project, for the spend, and for the governance are handled. So this is about really scaling well and controlling our investment in these agents, setting ourselves up for success. I'm going to talk about seven different techniques for using ambient agents in a more viable way, so that terrible things don't happen. A lot of these techniques actually come from stream processing, because if we are connecting to an event stream, it's also an event-processing problem. It's kind of like when we had these full web architecture stacks and it was actually distributed computing: if you don't look at it that way, your site's going to
have some "fun" outcomes. So here's a flow. It varies a lot, and I wanted to keep it simple, but as you're progressing you're always looping back. This is basically how we experiment; we'll see that even once we get ambient agents running in production, we're still going to want to keep experimenting, always keeping up discovery, dealing with concept drift and those different things. So
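To make the "don't respond to every event" point concrete, here is one sketch of the kind of stream-processing control being described: a cheap, deterministic pre-filter in front of the agent, so most events never cost a token. The event kinds and function names are made up for illustration, not taken from the talk.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    kind: str
    payload: dict = field(default_factory=dict)

# Only these event kinds are allowed to wake the agent; everything
# else is dropped before any (paid) LLM call happens.
RELEVANT_KINDS = {"payment_failed", "security_alert"}

def should_invoke_agent(event: Event) -> bool:
    # Deterministic gate: zero tokens spent, microseconds of latency.
    return event.kind in RELEVANT_KINDS

stream = [
    Event("heartbeat"),
    Event("payment_failed", {"order_id": 42}),
    Event("page_view"),
    Event("security_alert", {"severity": "high"}),
]
triggered = [e for e in stream if should_invoke_agent(e)]
print(len(triggered))  # → 2: only 2 of 4 events reach the agent
```

The design choice is the classic one from event processing: push the cheapest possible predicate as far upstream as you can, and only let the expensive consumer (here, the agent) see what survives.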
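Another stream-processing lever in the same spirit (again a sketch under assumed names, not the talk's own code) is windowing: batch events into fixed time windows so the agent is invoked once per window instead of once per event.

```python
from collections import defaultdict

def tumbling_windows(timestamped_events, window_seconds=60):
    """Group (timestamp, event) pairs into fixed-size time windows."""
    windows = defaultdict(list)
    for ts, event in timestamped_events:
        # Integer division assigns each event to exactly one window.
        windows[ts // window_seconds].append(event)
    return dict(windows)

stream = [(5, "login"), (30, "login"), (65, "error"), (70, "error"), (130, "login")]
batches = tumbling_windows(stream, window_seconds=60)
# Three windows -> three agent invocations instead of five.
print(len(batches))  # → 3
```

A tumbling window like this trades a bounded amount of latency (up to one window) for a hard cap on invocation rate, which is often exactly the always-on trade-off these techniques are about.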