Video ID: gwCQF--cARA
YouTube URL: https://www.youtube.com/watch?v=gwCQF--cARA
Added At: 13-06-25 21:16:10
Processed: No
Sentiment: Error
Categories: Ai, Technology
Tags: agent, picks, brain, cheaper, want
Transcript
If you've ever wondered which AI model to use for your agents, and you're tired of wasting credits or overpaying for basic tasks, then this video is for you, because today I'm going to show you a system where the AI agent picks its brain dynamically based on the task. This not only saves money, it boosts performance, and we also get full visibility into which model it chooses for each input, along with the output. That way, all we have to do is come back, update the prompt, and keep optimizing the workflow over time. As you can see, we're talking to this agent in Slack. So I'm going to say, "Hey, tell me a joke." You can see my failed attempts over there. It gets the message, picks a model, and then answers us in Slack as well as logging the output. We just got the response: why don't scientists trust atoms? Because they make up everything. And if I go to our model log, we can see the input, the output, and the model that was chosen, which in this case was Google's Gemini 2.0 Flash. It chose Flash because this was a simple input with a very simple output, so it picked a free model and we're not wasting credits for no reason. All right, let's try something else. I'm going to ask it to create a calendar event at 1 p.m. today for lunch. Once the workflow fires off, it chooses the model, sends that over to the dynamic agent to create the calendar event, logs the output, and then sends us a message in Slack. There we go: "I just created the calendar event for lunch at 1 p.m. today. If you need anything else, just let me know." If we click into the calendar real quick, there is our lunch event at 1. And if we go to our log, we can see that this time it used OpenAI's GPT-4.1 Mini.
All right, we'll just do one more and then we'll break it down. I'm going to ask it to do some research on AI voice agents and create a blog post. Here we go. It chose a model, it's going to hit Tavily to do some web research, create us a blog post, log the output, and send it to us in Slack. So I'll check in when that's done. All right, it just finished up, and as you can see, it called the Tavily tool four times, so it did some in-depth research. It logged the output, and we just got our blog back in Slack. Wow, it is pretty thorough. It talks about AI voice agents, the rise of voice agents, key trends like emotionally intelligent interactions, advanced NLP, real-time multilingual support, all that kind of stuff. That's the whole blog, and it ends with a conclusion. If you're wondering what model it used for this task, let's go look at our log. We can see it ended up using Claude 3.7 Sonnet. And like I said, it knew it had to do research, so it hit the Tavily tool four different times: first it searched for AI voice agent trends, then for case studies, then for growth statistics, and then for ethical considerations. So it made us a pretty holistic blog. Anyways, now that you've seen a quick demo of how this works, let's break down how I set it up. First things first, we're talking to it in Slack and getting a response back in Slack. As you can see, if I scroll up here, I had a few fails at the beginning when I was setting up this trigger. So if you're trying to get it set up in Slack, it can be a little frustrating, but I have a video right up here where I walk through exactly how to do that. Anyways, the key here is that we're using OpenRouter as the chat model. If you've never used OpenRouter, it's a chat model provider you can connect to that lets you route to any model you want.
As you can see, there are 300-plus models you can access through OpenRouter. So the idea here is that the first agent, which uses a free model like Gemini 2.0 Flash, chooses which model to use based on the input. Then whatever this agent chooses, we use dynamically down here for the second agent to actually call its tools or produce some sort of output for us. Just so you can see what that looks like: if I come in here, you can see we're using a variable. But if I got rid of that and changed this to fixed, you'd see all of these models within our OpenRouter dynamic brain to choose from. Instead of hard-coding one of those models, we're pulling the output from the model selector agent right into here, and that's the model it uses to process the next steps. Cool. So let's first take a look at the model selector. What happens in here is we're feeding in the actual text that we sent over in Slack, so that's pretty simple: we're just sending over the message. Then in the system message, this is where we configure the different models the AI agent has access to. I said, "You're an agent responsible for selecting the most suitable large language model to handle a given user request. Choose only one model from the list below based strictly on each model's strengths." So we told it to analyze the request and return only the name of the model. Down here, under available models and strengths, we gave it four models and defined what each one is good at. You could give it more than four if you wanted to, but for the sake of the demo I only gave it four. Then we said, return only one of the following strings. And as you can see in this example, it returned Anthropic Claude 3.7 Sonnet.
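Outside of n8n, the selector's system message described above can be sketched in a few lines of Python. The model IDs and strength descriptions below are illustrative assumptions in OpenRouter's naming style, not the exact ones from the workflow:

```python
# Hypothetical model list: OpenRouter-style IDs mapped to a short
# description of each model's strengths (illustrative values).
AVAILABLE_MODELS = {
    "google/gemini-2.0-flash-exp:free": "simple chat, quick answers, free",
    "openai/gpt-4.1-mini": "tool use, calendar and email tasks, cheap",
    "anthropic/claude-3.7-sonnet": "long-form writing, research synthesis",
    "openai/o1": "multi-step reasoning, riddles, hard logic",
}

def build_selector_prompt(models: dict) -> str:
    """Build the system message for the model-selector agent."""
    lines = [
        "You are an agent responsible for selecting the most suitable",
        "large language model to handle a given user request.",
        "Choose only ONE model from the list below based strictly on",
        "each model's strengths. Return only the model name string.",
        "",
        "Available models and strengths:",
    ]
    for name, strengths in models.items():
        lines.append(f"- {name}: {strengths}")
    return "\n".join(lines)
```

Because the selector must return exactly one of these strings, adding or removing a model is just an edit to this dictionary.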
One quick thing to note here: when you use Gemini 2.0 Flash, for some reason it likes to output a new line after these strings. So all I had to do later is clean up that new line, and I'll show you exactly what I mean by that. Now we have the output of our model, and we move on to the actual Smarty Pants agent. In this one, we're giving it the same user message as the previous agent: we just go to our Slack trigger and drag in the text from Slack. What I wanted to show you is that the only thing I put in the system message was the current date and time. I didn't tell it anything about using Tavily for web search, and I didn't tell it how to use its calendar tools. This just shows that it's choosing a model intelligent enough to understand the tools it has and how to use them. And then of course the actual dynamic brain part. We looked at this a little bit, but basically all I did is pull in the output of the previous agent, the model selector. Like I said, we had to trim up the end, because if you just drag this in and OpenRouter tries to reference a model that has a newline character after it, it will fail and say that model isn't available. So I trimmed up the end, and that's why. And you can see in my OpenRouter account, if I go to my activity, which models we've used and how much they've cost. Now, Gemini 2.0 Flash is a free model, but if we use it through OpenRouter, they take a small cut, so it's not exactly free, but it's really, really cheap. The idea here is that Claude 3.7 Sonnet is more expensive and we don't need it all the time, but if we want our agent to have the capability of using Claude at some point, we would otherwise just have to plug in Claude.
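In plain code, the two-stage routing built in n8n above looks roughly like the sketch below, against OpenRouter's OpenAI-compatible chat completions endpoint. The selector model ID and prompts are assumptions, and note the `.strip()` that removes the trailing newline Gemini tends to emit:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_payload(model: str, system: str, user: str) -> dict:
    """Assemble an OpenAI-style chat completions request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

def chat(model: str, system: str, user: str, api_key: str) -> str:
    """Call OpenRouter and return the assistant's reply text."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_payload(model, system, user)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

def answer(user_message: str, selector_prompt: str, api_key: str):
    # Stage 1: a free/cheap model picks which brain to use.
    raw = chat("google/gemini-2.0-flash-exp:free",
               selector_prompt, user_message, api_key)
    # Gemini often appends a trailing newline; without .strip(),
    # OpenRouter rejects the model ID as unavailable.
    chosen = raw.strip()
    # Stage 2: the chosen model actually handles the request.
    reply = chat(chosen, "You are a helpful assistant.",
                 user_message, api_key)
    return chosen, reply
```

The n8n version does the same thing with an expression that references the selector node's output and trims it before handing it to the OpenRouter chat model node.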
But now, if you use this method, when you want to talk to the agent about general things, look something up on your calendar, or send an email, you don't have to use Claude and waste those credits. You can use a free model like 2.0 Flash, or a very capable cheap model like GPT-4.1 Mini. That's not to say 2.0 Flash isn't powerful; it's just a more lightweight model, and it's very cheap. Anyways, that's another cool thing about OpenRouter, and it's why I've gotten in the habit of using it: we can see the tokens, the cost, and the breakdown across the different models we've used. From there, we're feeding the output into a Google Sheet template. By the way, you can download this workflow, as well as the other ones down here that we'll look at in a sec, for free by joining my free Skool community. All you have to do is go to YouTube resources or search for the title of this video, and when you click on the post associated with this video, you'll have the JSON for the final workflow to download, and you'll also find this Google Sheet template somewhere in that post, so you can copy it over and plug everything into your environment. Anyways, we're just logging the output, of course. We send over a timestamp, so whatever time this actually runs gets recorded; the input, which is the Slack message that triggered the workflow; the output, which I bring in from the Smarty Pants agent right here; and the model, which is the output from the model selector agent. Then all that's left is to send the response back to the human in Slack, where we connect to that same channel and just send the output from the agent.
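The log step is just those four columns. As a minimal sketch (the column names follow the sheet described above; the function name is mine):

```python
from datetime import datetime, timezone

def build_log_row(slack_input: str, agent_output: str, model: str) -> dict:
    """One row for the model log: when it ran, what came in,
    what went out, and which model the selector chose."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input": slack_input,
        "output": agent_output,
        "model": model,
    }
```

Logging the chosen model alongside input and output is what makes the system tunable: if a cheap model keeps getting picked for tasks it handles badly, you can see it in the sheet and adjust the selector prompt.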
So hopefully this opens your eyes to how you can set up a system where your main agent dynamically picks a brain to optimize cost and performance. And in a space like AI, where new models are coming out all the time, it's important to be able to test different ones on their outputs and compare them. So here are two quick tools. The first is Vellum, which is an LLM leaderboard. You can look at reasoning, math, coding, and tool use, and you can compare models right here by selecting them and looking at their differences. Down here is a model comparison with different statistics: context window, cost, and speed. It's a good website, but keep in mind it may not always be completely up to date. Right here it was updated on April 17th, and today is the 30th, so it doesn't have the 4.1 models. Another one you can look at is LM Arena; I'll leave the link for this one in the description too. You can compare models by chatting with them side by side or head to head, people give ratings, and then you can look at the leaderboard for an overview, or for text, or vision, or whatever it is. Just another good tool for comparing models. Anyways, we'll do one more quick example before we go on to the one down below, because we haven't used the reasoning model yet, and those are obviously more expensive. So I'm asking it a riddle: you have three boxes. One has only apples, one has only oranges, and one has a mix of both. They're all incorrectly labeled, and you can pick one fruit from one box without looking. How can you label all the boxes correctly? Let's see what it does. Hopefully, it's using the reasoning model.
Okay, so it responded: a succinct way to see it is to pick one piece of fruit from the box labeled "apples and oranges." Since that label is wrong, the box must actually contain only apples or only oranges. Whatever fruit you draw tells you which single-fruit box that really is. Once you know which box is purely apples or purely oranges, you can use the fact that all labels are incorrect to deduce the proper labels for the remaining two boxes. Obviously, I had ChatGPT give me that riddle, and that's basically the answer it gave back. So real quick, let's go into our log and see which model it used: OpenAI's o1 reasoning model. And of course, we can verify that by looking right here, and we can see it is OpenAI o1. One thing I wanted to throw out there real quick is that OpenRouter does have an auto option. You can see right here, openrouter/auto, but it's not going to give you as much control over which models it can choose from, and it may not be as cost-efficient as being able to define: here are the four models you have, and here's when to use each one. Just to show you what that would do: if I say "Hey," it picks a model based on the input, and here you can see it used GPT-4o Mini. And if I send in that same riddle from earlier, remember our selector chose the reasoning model, but now it's probably not going to choose one. Anyways, it looks like it got the riddle right, and we can see the model it chose here was just GPT-4o. So I guess the argument is, yes, this is cheaper than using o1. If you just want to test out your workflows using the auto function, go for it. But if you want more control over which models to use and when, and higher-quality outputs in certain scenarios, then you probably want to take the more custom route.
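For comparison, the auto option mentioned above is just a different model string in the request: you send `openrouter/auto` and OpenRouter's own router picks the underlying model. A minimal sketch of that request body (the function name is mine):

```python
def auto_payload(user_message: str) -> dict:
    # "openrouter/auto" delegates model choice to OpenRouter's router,
    # trading the fine-grained control of a custom selector agent
    # for simplicity.
    return {
        "model": "openrouter/auto",
        "messages": [{"role": "user", "content": user_message}],
    }
```

This is the trade-off the video describes: auto is zero-setup, while the custom selector lets you pin exactly which four models are eligible and when each should be used.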
Anyways, just thought I'd drop that in there, but let's get back to the video. All right, now that you've seen how this agent can choose between those four models, let's look at a different type of example. Down here we have a RAG agent, and this is a really good use case in my mind, because sometimes you're going to be chatting with a knowledge base and it could be a really simple query, like "can you remind me what our shipping policy is?" But if you wanted a comparison or a deep lookup across the knowledge base, you'd probably want a more intelligent model. So we're doing a very similar thing here: this agent chooses the model using a free model, and then it feeds that selection into the dynamic brain for the RAG agent to do its lookup. And down here I put a very simple flow for downloading a file into Supabase, just so you can test out this Supabase RAG agent up here. But let's chat with this thing real quick. Okay, so here's my policy and FAQ document, and here's my Supabase table, where I have these four vectors in the documents table. So we're going to query this agent for things that are in that policy and FAQ document, and we'll see which model it uses based on how complex the query is. If I fire off "what is our shipping policy," we'll see the model selector choose a model and send it over, and now the agent is querying Supabase. It responds with: here's TechHaven's shipping policy, orders are processed within one to two days, standard shipping takes three to seven business days, and so on. If we compare that with the actual documentation, you can see that is exactly what it should have responded with. You'll also notice that in this example we're not logging the outputs, just because I wanted to show a simpler setup.
But we can see the model it chose right here was GPT-4.1 Mini. And if we look inside this agent, you can see we only gave it two options, GPT-4.1 Mini and Anthropic Claude 3.5 Sonnet, because I wanted to show a simple example; you could of course add more models if you'd like. Just to show this working dynamically, I'm going to ask: what's the difference between our privacy policy and our payment policy, and what happens if someone wants to cancel their order or return an item? Hopefully it chooses the Claude model, because this is a bit more complex. It just searched the vector database; we'll see if it has to go back again or if it's writing an answer. Looks like it's writing an answer right now, and we'll see if it's accurate. So, privacy versus payment: privacy focuses on data protection, payment covers accepted payment methods. What happens if someone wants to cancel the order? Orders can be cancelled within 12 hours, and we have a refund policy as well. If we go in here, we can validate that all this information is in the document: this is how you cancel, and this is how you refund. Oh yeah, right here: visit our returns and refunds page. And what it says, here is our return and refund policy, matches exactly what it says down here. Okay, so those are the two flows I wanted to share with you today. I just hope this opens your eyes to the fact that you can have models be dynamic based on the input, which in the long run will save you a lot of tokens across your chat models. If you're serious about building AI agents with something like n8n, then definitely check out my paid community. The link for that is down in the description.
We've got a great community of members who are learning n8n, building with it every day, and sharing insights. We've got a classroom section with deep-dive topics like vector databases, APIs and HTTP requests, a course on building agents, and two new courses launching soon. And we do five live calls per week to make sure you're meeting people in the community and never getting stuck. We've got guest speakers, Q&As, coffee chats, and tech support sessions, so I'd love to see you in those calls. But that's going to do it for this video. If you appreciated it or learned something new, please give it a like; it definitely helps me out a ton. And as always, I appreciate you making it to the end of the video. See you on the next one.