Fluid Compute: Vercel’s Next Step in the Evolution of Serverless?

February 13, 2025
32:59
5,885 views
7 likes

Channel: This Dot Media

Interviewed Person: Mariano Cocirio

Description

In this episode of the Modern Web Podcast, hosts Rob Ocel and Danny Thompson sit down with Mariano Cocirio, Staff Product Manager at Vercel, to discuss Fluid Compute, a new cloud computing model that blends the best of serverless scalability with traditional server efficiency. They explore the challenges of AI workloads in serverless environments, the high costs of idle time, and how Fluid Compute optimizes execution to reduce costs while maintaining performance. Mariano explains how this approach allows instances to handle multiple requests efficiently while still scaling to zero when not in use. The conversation also covers what developers need to consider when adopting this model, the impact on application architecture, and how to track efficiency gains using Vercel’s observability tools. Is Fluid Compute the next step in the evolution of serverless? Is it redefining cloud infrastructure altogether?

Keypoints
- Fluid Compute merges the best of servers and serverless – It combines the scalability of serverless with the efficiency and reusability of traditional servers, allowing instances to handle multiple requests while still scaling down to zero.
- AI workloads struggle with traditional serverless models – Serverless is optimized for quick, stateless functions, but AI models often require long processing times, leading to high costs for idle time. Fluid Compute solves this by dynamically managing resources.
- No major changes required for developers – Fluid Compute works like a standard Node or Python server, meaning developers don’t need to change their code significantly. The only consideration is handling shared global state, similar to a traditional server environment.
- Significant cost savings and efficiency improvements – Vercel’s observability tools show real-time reductions in compute costs, with some early adopters seeing up to 85% savings simply by enabling Fluid Compute.

Chapters
0:00 – Introduction and Guest Welcome
1:08 – What is Fluid Compute? Overview and Key Features
2:08 – Why Serverless Compute Struggles with AI Workloads
4:00 – Fluid Compute: Combining Scalability and Efficiency
6:04 – Cost Savings and Real-world Impact of Fluid Compute
8:12 – Developer Experience and Implementation Considerations
10:26 – Managing Global State and Concurrency in Fluid Compute
13:09 – Observability Tools for Performance and Cost Monitoring
20:01 – Long-running Instances and Post-operation Execution
24:02 – Evolution of Compute Models: From Servers to Fluid Compute
29:08 – The Future of Fluid Compute and Web Development
30:15 – How to Enable Fluid Compute on Vercel
32:04 – Closing Remarks and Guest Social Media Info

Follow Mariano Cocirio on Social Media:
Twitter: https://x.com/mcocirio
LinkedIn: https://www.linkedin.com/in/mcocirio/

Sponsored by This Dot: thisdot.co
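The shared-global-state consideration mentioned in the keypoints can be sketched in plain Node. This is an illustrative example (the names and setup are hypothetical, not a Vercel API): when one instance serves overlapping requests, module-level variables are shared between them, just as on a traditional server.

```javascript
// Illustrative sketch: module-level state on an instance that serves
// multiple concurrent requests (names are hypothetical, not a Vercel API).

let totalServed = 0;    // shared counter: fine, it is meant to be instance-wide
let currentUser = null; // anti-pattern: one global slot, clobbered by overlapping requests

async function handler(requestId, user) {
  totalServed += 1;
  currentUser = user;      // unsafe under concurrency: last writer wins
  const localUser = user;  // safe: per-request data stays in local scope
  // Simulated I/O wait, during which another request may run.
  await new Promise((resolve) => setTimeout(resolve, 10));
  return { requestId, fromGlobal: currentUser, fromLocal: localUser };
}
```

Running two handlers concurrently shows the hazard: both responses report the last writer's user in `fromGlobal`, while `fromLocal` stays correct for each request.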

Transcript


We like the well-known good parts of serverless. We are going broke with this, because our requests are taking longer and longer and longer, because the models are reasoning more and more and more. Think about some startups who are building, I don't know, AI to generate music, to generate videos. It takes a long time, we are talking about minutes of waiting, and if you are not able to reutilize that, then you are a bit doomed.

Hello everybody, and welcome to the Modern Web Podcast. I'm your host Rob Ocel, the VP of Innovation at This Dot Labs, joined today by my wonderful co-host Danny Thompson. Danny is the Director of Technology at This Dot Labs. Danny, how are you doing? Hey hey, I'm great. I'm excited to be here, and to be honest, really excited about today's topic. That's right, we have a really exciting topic, because we are sitting down and talking with Mariano Cocirio, who is the Staff Product Manager at Vercel. Mariano, how are you doing? I'm doing fine, thank you.

Yeah, and today we're going to be talking about Fluid Compute, and I guess the first question is: what is that? So do you want to introduce us to what you guys just announced and just released? Yes, Fluid Compute is a new cloud computing model. It's built on serverless, and it combines event-driven execution and intelligent resource management. It's basically: what would you like from servers, what would you like from serverless, put them together, and you get Fluid. So as of the time of recording this, this just came out, like this is big news.

The announcements were fantastic, and the blog article about it, I've definitely been reading up on it. One point that seems to be echoed is that serverless for AI is kind of broken, and this is basically a solution, in some way, shape, or form, to handle that. And so, I guess, if we kind of go into this: what are the main challenges with traditional serverless computing? Why did we see this as being the solution for it, especially with regards to AI? And what

even was the thought process behind wanting to solve this, and why is this the solution we want to solve it with? Yeah, I think you said it right: serverless compute for AI is broken, and we need to accept that fact. Why is it? Because we have been optimizing serverless to handle easy requests, like requests that are going to be fast: asking something to a database, doing something, and returning it. Meanwhile, with AI, we sometimes want to spend a little more time thinking, reasoning, agents talking to

each other. And what's the problem with serverless? You are paying for that idle time while you are not using the CPU. You are going to have instances that are going to be just waiting for a response from, I don't know, OpenAI, Claude, whatever API you're using, or your own models hosted by your backend, and that time, you're paying for it, and you don't want to be doing that. Meanwhile, when we

were in the past on servers, what you would do is: you would have your server, you would hit it with requests, and it would start queuing them and executing them, like one request executes, and if I have idle time, then another request executes right after that one, etc. So during the idle time of request one, you can execute request two. What Fluid is bringing you is being able to have

one single instance handling multiple requests, but at the same time those instances can scale down to zero and can scale up to infinity, like they do in the serverless model. So you get the infinite scaling from serverless, you get the scale down to zero from serverless, but you also get the reusability of the instances and the efficiency of servers. It's the best of both worlds.
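The overlap Mariano describes, where request two executes during request one's idle time, can be sketched with ordinary Node async code. This is a minimal illustration under assumed names (`callModel` and the 100ms delay are stand-ins for a slow AI upstream, not a real API): because the waits are pure I/O idle time, two requests on one instance finish in roughly the time of one.

```javascript
// Minimal sketch of overlapping idle time on a single instance.
// callModel simulates a slow upstream (e.g. an AI model API): ~100ms
// of pure waiting with no CPU work. Names here are illustrative.
function callModel(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function handleRequest(id) {
  await callModel(100); // the instance is idle here; another request can run
  return `request ${id} done`;
}

async function main() {
  const start = Date.now();
  // Both requests share the instance: total wall time is ~100ms, not
  // ~200ms, because each request's idle wait covers the other's.
  const results = await Promise.all([handleRequest(1), handleRequest(2)]);
  console.log(results);
  console.log(`elapsed: ${Date.now() - start}ms`);
}

main();
```

Under a strictly one-request-per-instance model, the same two requests would occupy two billed instances, each idle for its full 100ms wait; this is the idle-time cost the episode says Fluid Compute reclaims.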


Video Details
Duration: 32:59
Published: February 13, 2025
Channel: This Dot Media
Language: English
Views: 5,885
Likes: 7
