Claude Code Can Now Control Your Browser (Thanks to Vercel) | Vercel Insights Hub


Transcript


This is agent-browser, an open-source headless browser CLI built in a weekend by a single Vercel employee. It lets your agent do anything in the browser, from dragging and dropping to uploading an image and even toggling offline mode. But why would anyone use this over something like Browser Use, which has way more features? And is Vercel getting into the agent browser space? Hit subscribe and let's get into it.

2026 is the year of AI agents writing, reviewing, and testing all of your code. No more tab completions. In fact, developers are even moving away from IDEs entirely in favor of doing everything in the terminal, since all we're really doing now is reviewing code. To support this shift, agents need to actually interact with and test the code they've written, because the last thing you want to do as a developer is open the browser and test, one by one, every feature an army of agents has written. That's just tedious.

This is where Vercel's new agent-browser comes in handy. It was written by Chris Tate in both Rust and TypeScript (I'll explain why later), and it makes it easy for an agent to drive the browser through CLI commands. These include accessibility snapshots, which provide an accessibility tree with references to the elements on a page, and reference-based actions, which take those references from the tree and apply the relevant actions to them. If you don't want to use references, there are also semantic locators, which let you find an element by its ARIA role, its text content, its label, and much more. In fact, let's go through a quick demo of how it works.
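To make the snapshot-and-reference idea more concrete, here is a minimal Python sketch. It is not agent-browser's implementation, and its text format is only loosely modeled on the description above; the `Node` class, the `e1`, `e2`, ... ref naming, and the `snapshot` and `find` helpers are all hypothetical. The sketch flattens an accessibility tree into indented text where every node carries a ref, keeps a ref-to-node map so a later action can target an element by ref alone, and shows a semantic-locator-style lookup by role and accessible name.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical accessibility-tree node: an ARIA role plus an accessible name."""
    role: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[Node, int]]:
    """Depth-first, pre-order traversal yielding (node, depth) pairs."""
    yield node, depth
    for child in node.children:
        yield from walk(child, depth + 1)


def snapshot(root: Node) -> Tuple[str, Dict[str, Node]]:
    """Render the tree as indented text and tag each node with a ref (e1, e2, ...).

    The returned dict maps each ref back to its node, which is what lets a
    later action target "e5" without searching the page again.
    """
    lines: List[str] = []
    refs: Dict[str, Node] = {}
    for index, (node, depth) in enumerate(walk(root), start=1):
        ref = f"e{index}"
        refs[ref] = node
        label = f' "{node.name}"' if node.name else ""
        lines.append(f"{'  ' * depth}- {node.role}{label} [ref={ref}]")
    return "\n".join(lines), refs


def find(root: Node, role: Optional[str] = None, name: Optional[str] = None) -> Optional[Node]:
    """Semantic-locator-style lookup: first node matching the given role and/or name."""
    for node, _ in walk(root):
        if (role is None or node.role == role) and (name is None or node.name == name):
            return node
    return None


if __name__ == "__main__":
    page = Node("document", children=[
        Node("heading", "Login"),
        Node("textbox", "Email"),
        Node("textbox", "Password"),
        Node("button", "Log in"),
    ])
    text, refs = snapshot(page)
    print(text)
    # The ref from the snapshot and the semantic locator resolve to the same element.
    assert refs["e5"] is find(page, role="button", name="Log in")
```

Running it prints a five-line tree ending in `  - button "Log in" [ref=e5]`. The design point is that the snapshot hands the agent short, stable handles, so a follow-up action can say "click e5" instead of reasoning about CSS selectors, while the locator path covers the cases where no snapshot has been taken.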

Now, here is a little login page with an email and a password field. It's built with shadcn, React, and Vite, not because of Vercel or anything; it just happened to be built that way. There's one problem with this page: right now I'm blinding my users because it's in light mode. So I want a dark mode, which I've already asked the agent to add, but as you can see, it hasn't done it correctly. I mean, okay, this text changes, but nothing else does. So let's get the agent to fix this using agent-browser.

Right now I'm using OpenCode with the GLM 4.7 model, but of course agent-browser can work with any agent and any model. I've told it that dark mode is broken and that it should test it with agent-browser on the specific port. The important part is the instruction to run agent-browser --help to see the available commands, because there are no slash commands and no skills; I've just installed agent-browser globally with npm. I'm going to hit enter, and it checks the available commands, then uses agent-browser's snapshot functionality to create a snapshot of the page, which shows the document, headings, paragraphs, text, and even images. It then clicks on the relevant element and takes a screenshot to see if dark mode is working. And this is the screenshot, if you're curious. From here, it fixes the issue before taking another screenshot of the fixed dark mode. It's finally finished the task, which we can test by clicking up here, and we have a page with perfect dark mode.

Let's try another test. Actually, while this was running, I had another agent in the background fix another issue. You may have noticed that if I press the login button, it takes me straight here without any validation, which of course isn't good. So I asked it to fix the validation issue in this project, and it did something really interesting. It first checks the available commands in agent-browser, and then, if we scroll down, it fixes the issue and even writes a bash script, over here, to test that it works. The script echoes the first test, enters an empty input, clicks the login button, and then expects these errors.
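The agent's bash script isn't shown in full, but the shape of that first test (submit the form empty, then check for error messages) can be sketched in Python. This is a hedged illustration, not the agent's script: it assumes `agent-browser` is on your PATH with the page already open, and it uses only the `eval` subcommand mentioned in the transcript, passing the JavaScript as a single argument. The button selector, the helper names, and the expected error strings are hypothetical placeholders for the demo app. The runner is injectable so the pass/fail logic can be checked without a browser.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes CLI arguments and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def agent_browser(args: Sequence[str]) -> str:
    """Run `agent-browser <args>` and return stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(
        ["agent-browser", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical JavaScript for the demo login form; adjust the selector to your markup.
CLICK_LOGIN = "document.querySelector('button[type=submit]').click()"
READ_PAGE_TEXT = "document.body.innerText"


def missing_errors_on_empty_submit(
    expected: Sequence[str], run: Runner = agent_browser
) -> List[str]:
    """Click login with the form left empty; return any expected messages that did not appear."""
    run(["eval", CLICK_LOGIN])
    page_text = run(["eval", READ_PAGE_TEXT])
    return [message for message in expected if message not in page_text]
```

Calling `missing_errors_on_empty_submit(["Email is required", "Password is required"])` against a live page returns an empty list when every (hypothetical) message is rendered. Only `agent_browser` touches the real CLI, so swapping in a fake runner makes the checking logic testable on its own.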

It's made a few tests here, but it's made an even better bash test further down, over here, which uses agent-browser eval to run some JavaScript code. So now, if I press the login button, we get some validation. This looks like an email, but it's actually a placeholder. If I give it a made-up email over here and hit login, it says "Enter a valid email," which I can do like this. Then I can enter a password before it takes me to this dashboard. So, basically, agents have addressed two issues with this app and tested them themselves to validate that the fixes work using
