Video ID: gwCQF--cARA
YouTube URL: https://www.youtube.com/watch?v=gwCQF--cARA
Added At: 13-06-25 21:16:10
Processed: No
Sentiment: Error
Categories: Ai, Technology
Tags: agent, picks, brain, cheaper, want
Transcript
If you've ever wondered which AI model to use for your agents, and you're tired of wasting credits or overpaying for basic tasks, then this video is for you, because today I'm going to show you a system where the AI agent picks its brain dynamically based on the task. This not only saves money, it boosts performance, and we also get full visibility into which model it chooses for each input, along with the output. That way, all we have to do is come back, update the prompt, and keep optimizing the workflow over time. As you can see, we're talking to this agent in Slack. So I'm going to say, "Hey, tell me a joke." You can see my failed attempts over there. It gets the message, picks a model, and then answers us in Slack as well as logging the output. We just got the response: why don't scientists trust atoms? Because they make up everything. And if I go to our model log, we can see the input, the output, and the model that was chosen, which in this case was Google's Gemini 2.0 Flash. It chose Flash because this was a simple input with a very simple output, so it picked a free model and we're not wasting credits for no reason. All right, let's try something else. I'm going to ask it to create a calendar event at 1 p.m. today for lunch. Once the workflow fires off, it chooses the model, sends that over to the dynamic agent to create the calendar event, logs the output, and then sends us a message in Slack. There we go: "I just created the calendar event for lunch at 1 p.m. today. If you need anything else, just let me know." If we click into the calendar real quick, there is our lunch event at 1. And if we go to our log, we can see that this time it used OpenAI's GPT-4.1 Mini.
All right, we'll just do one more and then we'll break it down. I'm going to ask it to do some research on AI voice agents and create a blog post. Here we go. It chose a model, it's going to hit Tavily to do some web research, create us a blog post, log the output, and send it to us in Slack. So I'll check in when that's done. All right, it just finished up, and as you can see, it called the Tavily tool four times, so it did some in-depth research. It logged the output, and we just got our blog back in Slack. Wow, it is pretty thorough. It talks about AI voice agents, the rise of voice agents, key trends like emotionally intelligent interactions, advanced NLP, real-time multilingual support, all that kind of stuff. That's the whole blog, and it ends with a conclusion. If you're wondering what model it used for this task, let's go look at our log. We can see it ended up using Claude 3.7 Sonnet. And like I said, it knew it had to do research, so it hit the Tavily tool four different times: first it searched for AI voice agent trends, then for case studies, then for growth statistics, and then for ethical considerations. So it made us a pretty holistic blog. Anyways, now that you've seen a quick demo of how this works, let's break down how I set it up. First things first, we're talking to it in Slack and getting a response back in Slack. As you can see, if I scroll up here, I had a few fails at the beginning when I was setting up this trigger. So if you're trying to get it set up in Slack, it can be a little frustrating, but I have a video right up here where I walk through exactly how to do that. Anyways, the key here is that we're using OpenRouter as the chat model. If you've never used OpenRouter, it's a chat model provider you can connect to that lets you route to any model you want.
As you can see, there are 300-plus models you can access through OpenRouter. So the idea here is that the first agent, which uses a free model like Gemini 2.0 Flash, chooses which model to use based on the input. Then whatever this agent chooses, we use dynamically down here for the second agent to actually call its tools or produce some sort of output for us. Just so you can see what that looks like: if I come in here, you can see we're using a variable. But if I got rid of that and changed this to fixed, you'd see all of these models within our OpenRouter dynamic brain to choose from. Instead of hard-coding one of those models, we're pulling the output from the model selector agent right into here, and that's the model it uses to process the next steps. Cool. So let's first take a look at the model selector. What happens in here is we're feeding in the actual text that we sent over in Slack, so that's pretty simple: we're just sending over the message. Then in the system message, this is where we configure the different models the AI agent has access to. I said, "You're an agent responsible for selecting the most suitable large language model to handle a given user request. Choose only one model from the list below based strictly on each model's strengths." So we told it to analyze the request and return only the name of the model. Down here, under available models and strengths, we gave it four models and defined what each one is good at. You could give it more than four if you wanted to, but for the sake of the demo I only gave it four. Then we said, return only one of the following strings. And as you can see in this example, it returned Anthropic Claude 3.7 Sonnet.
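Outside of n8n, the selector's system message described above can be sketched in a few lines of Python. The model IDs and strength descriptions below are illustrative assumptions in OpenRouter's naming style, not the exact ones from the workflow:

```python
# Hypothetical model list: OpenRouter-style IDs mapped to a short
# description of each model's strengths (illustrative values).
AVAILABLE_MODELS = {
    "google/gemini-2.0-flash-exp:free": "simple chat, quick answers, free",
    "openai/gpt-4.1-mini": "tool use, calendar and email tasks, cheap",
    "anthropic/claude-3.7-sonnet": "long-form writing, research synthesis",
    "openai/o1": "multi-step reasoning, riddles, hard logic",
}

def build_selector_prompt(models: dict) -> str:
    """Build the system message for the model-selector agent."""
    lines = [
        "You are an agent responsible for selecting the most suitable",
        "large language model to handle a given user request.",
        "Choose only ONE model from the list below based strictly on",
        "each model's strengths. Return only the model name string.",
        "",
        "Available models and strengths:",
    ]
    for name, strengths in models.items():
        lines.append(f"- {name}: {strengths}")
    return "\n".join(lines)
```

Because the selector must return exactly one of these strings, adding or removing a model is just an edit to this dictionary.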
One quick thing to note here: when you use Gemini 2.0 Flash, for some reason it likes to output a new line after these strings. So all I had to do later is clean up that new line, and I'll show you exactly what I mean by that. Now we have the output of our model, and we move on to the actual Smarty Pants agent. In this one, we're giving it the same user message as the previous agent: we just go to our Slack trigger and drag in the text from Slack. What I wanted to show you is that the only thing I put in the system message was the current date and time. I didn't tell it anything about using Tavily for web search, and I didn't tell it how to use its calendar tools. This just shows that it's choosing a model intelligent enough to understand the tools it has and how to use them. And then of course the actual dynamic brain part. We looked at this a little bit, but basically all I did is pull in the output of the previous agent, the model selector. Like I said, we had to trim up the end, because if you just drag this in and OpenRouter tries to reference a model that has a newline character after it, it will fail and say that model isn't available. So I trimmed up the end, and that's why. And you can see in my OpenRouter account, if I go to my activity, which models we've used and how much they've cost. Now, Gemini 2.0 Flash is a free model, but if we use it through OpenRouter, they take a small cut, so it's not exactly free, but it's really, really cheap. The idea here is that Claude 3.7 Sonnet is more expensive and we don't need it all the time, but if we want our agent to have the capability of using Claude at some point, we would otherwise just have to plug in Claude.
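In plain code, the two-stage routing built in n8n above looks roughly like the sketch below, against OpenRouter's OpenAI-compatible chat completions endpoint. The selector model ID and prompts are assumptions, and note the `.strip()` that removes the trailing newline Gemini tends to emit:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_payload(model: str, system: str, user: str) -> dict:
    """Assemble an OpenAI-style chat completions request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

def chat(model: str, system: str, user: str, api_key: str) -> str:
    """Call OpenRouter and return the assistant's reply text."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_payload(model, system, user)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

def answer(user_message: str, selector_prompt: str, api_key: str):
    # Stage 1: a free/cheap model picks which brain to use.
    raw = chat("google/gemini-2.0-flash-exp:free",
               selector_prompt, user_message, api_key)
    # Gemini often appends a trailing newline; without .strip(),
    # OpenRouter rejects the model ID as unavailable.
    chosen = raw.strip()
    # Stage 2: the chosen model actually handles the request.
    reply = chat(chosen, "You are a helpful assistant.",
                 user_message, api_key)
    return chosen, reply
```

The n8n version does the same thing with an expression that references the selector node's output and trims it before handing it to the OpenRouter chat model node.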
But now, if you use this method, when you want to talk to the agent about general things, look something up on your calendar, or send an email, you don't have to use Claude and waste those credits. You can use a free model like 2.0 Flash, or a very capable cheap model like GPT-4.1 Mini. That's not to say 2.0 Flash isn't powerful; it's just a more lightweight model, and it's very cheap. Anyways, that's another cool thing about OpenRouter, and it's why I've gotten in the habit of using it: we can see the tokens, the cost, and the breakdown across the different models we've used. From there, we're feeding the output into a Google Sheet template. By the way, you can download this workflow, as well as the other ones down here that we'll look at in a sec, for free by joining my free Skool community. All you have to do is go to YouTube resources or search for the title of this video, and when you click on the post associated with this video, you'll have the JSON for the final workflow to download, and you'll also find this Google Sheet template somewhere in that post, so you can copy it over and plug everything into your environment. Anyways, we're just logging the output, of course. We send over a timestamp, so whatever time this actually runs gets recorded; the input, which is the Slack message that triggered the workflow; the output, which I bring in from the Smarty Pants agent right here; and the model, which is the output from the model selector agent. Then all that's left is to send the response back to the human in Slack, where we connect to that same channel and just send the output from the agent.
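The log step is just those four columns. As a minimal sketch (the column names follow the sheet described above; the function name is mine):

```python
from datetime import datetime, timezone

def build_log_row(slack_input: str, agent_output: str, model: str) -> dict:
    """One row for the model log: when it ran, what came in,
    what went out, and which model the selector chose."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input": slack_input,
        "output": agent_output,
        "model": model,
    }
```

Logging the chosen model alongside input and output is what makes the system tunable: if a cheap model keeps getting picked for tasks it handles badly, you can see it in the sheet and adjust the selector prompt.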
So hopefully this opens your eyes to how you can set up a system where your main agent dynamically picks a brain to optimize cost and performance. And in a space like AI, where new models are coming out all the time, it's important to be able to test different ones on their outputs and compare them. So here are two quick tools. The first is Vellum, which is an LLM leaderboard. You can look at reasoning, math, coding, and tool use, and you can compare models right here by selecting them and looking at their differences. Down here is a model comparison with different statistics: context window, cost, and speed. It's a good website, but keep in mind it may not always be completely up to date. Right here it was updated on April 17th, and today is the 30th, so it doesn't have the 4.1 models. Another one you can look at is LM Arena; I'll leave the link for this one in the description too. You can compare models by chatting with them side by side or head to head, people give ratings, and then you can look at the leaderboard for an overview, or for text, or vision, or whatever it is. Just another good tool for comparing models. Anyways, we'll do one more quick example before we go on to the one down below, because we haven't used the reasoning model yet, and those are obviously more expensive. So I'm asking it a riddle: you have three boxes. One has only apples, one has only oranges, and one has a mix of both. They're all incorrectly labeled, and you can pick one fruit from one box without looking. How can you label all the boxes correctly? Let's see what it does. Hopefully, it's using the reasoning model.
Okay, so it responded: a succinct way to see it is to pick one piece of fruit from the box labeled "apples and oranges." Since that label is wrong, the box must actually contain only apples or only oranges. Whatever fruit you draw tells you which single-fruit box that really is. Once you know which box is purely apples or purely oranges, you can use the fact that all labels are incorrect to deduce the proper labels for the remaining two boxes. Obviously, I had ChatGPT give me that riddle, and that's basically the answer it gave back. So real quick, let's go into our log and see which model it used: OpenAI's o1 reasoning model. And of course, we can verify that by looking right here, and we can see it is OpenAI o1. One thing I wanted to throw out there real quick is that OpenRouter does have an auto option. You can see right here, openrouter/auto, but it's not going to give you as much control over which models it can choose from, and it may not be as cost-efficient as being able to define: here are the four models you have, and here's when to use each one. Just to show you what that would do: if I say "Hey," it picks a model based on the input, and here you can see it used GPT-4o Mini. And if I send in that same riddle from earlier, remember our selector chose the reasoning model, but now it's probably not going to choose one. Anyways, it looks like it got the riddle right, and we can see the model it chose here was just GPT-4o. So I guess the argument is, yes, this is cheaper than using o1. If you just want to test out your workflows using the auto function, go for it. But if you want more control over which models to use and when, and higher-quality outputs in certain scenarios, then you probably want to take the more custom route.
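For comparison, the auto option mentioned above is just a different model string in the request: you send `openrouter/auto` and OpenRouter's own router picks the underlying model. A minimal sketch of that request body (the function name is mine):

```python
def auto_payload(user_message: str) -> dict:
    # "openrouter/auto" delegates model choice to OpenRouter's router,
    # trading the fine-grained control of a custom selector agent
    # for simplicity.
    return {
        "model": "openrouter/auto",
        "messages": [{"role": "user", "content": user_message}],
    }
```

This is the trade-off the video describes: auto is zero-setup, while the custom selector lets you pin exactly which four models are eligible and when each should be used.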
Anyways, just thought I'd drop that in there, but let's get back to the video. All right, now that you've seen how this agent can choose between those four models, let's look at a different type of example. Down here we have a RAG agent, and this is a really good use case in my mind, because sometimes you're going to be chatting with a knowledge base and it could be a really simple query, like "can you remind me what our shipping policy is?" But if you wanted a comparison or a deep lookup across the knowledge base, you'd probably want a more intelligent model. So we're doing a very similar thing here: this agent chooses the model using a free model, and then it feeds that selection into the dynamic brain for the RAG agent to do its lookup. And down here I put a very simple flow for downloading a file into Supabase, just so you can test out this Supabase RAG agent up here. But let's chat with this thing real quick. Okay, so here's my policy and FAQ document, and here's my Supabase table, where I have these four vectors in the documents table. So we're going to query this agent for things that are in that policy and FAQ document, and we'll see which model it uses based on how complex the query is. If I fire off "what is our shipping policy," we'll see the model selector choose a model and send it over, and now the agent is querying Supabase. It responds with: here's TechHaven's shipping policy, orders are processed within one to two days, standard shipping takes three to seven business days, and so on. If we compare that with the actual documentation, you can see that is exactly what it should have responded with. You'll also notice that in this example we're not logging the outputs, just because I wanted to show a simpler setup.
But we can see the model it chose right here was GPT-4.1 Mini. And if we look inside this agent, you can see we only gave it two options, GPT-4.1 Mini and Anthropic Claude 3.5 Sonnet, because I wanted to show a simple example; you could of course add more models if you'd like. Just to show this working dynamically, I'm going to ask: what's the difference between our privacy policy and our payment policy, and what happens if someone wants to cancel their order or return an item? Hopefully it chooses the Claude model, because this is a bit more complex. It just searched the vector database; we'll see if it has to go back again or if it's writing an answer. Looks like it's writing an answer right now, and we'll see if it's accurate. So, privacy versus payment: privacy focuses on data protection, payment covers accepted payment methods. What happens if someone wants to cancel the order? Orders can be cancelled within 12 hours, and we have a refund policy as well. If we go in here, we can validate that all this information is in the document: this is how you cancel, and this is how you refund. Oh yeah, right here: visit our returns and refunds page. And what it says, here is our return and refund policy, matches exactly what it says down here. Okay, so those are the two flows I wanted to share with you today. I just hope this opens your eyes to the fact that you can have models be dynamic based on the input, which in the long run will save you a lot of tokens across your chat models. If you're serious about building AI agents with something like n8n, then definitely check out my paid community. The link for that is down in the description.
We've got a great community of members who are learning n8n, building with it every day, and sharing insights. We've got a classroom section with deep-dive topics like vector databases, APIs and HTTP requests, a course on building agents, and two new courses launching soon. And we do five live calls per week to make sure you're meeting people in the community and never getting stuck. We've got guest speakers, Q&As, coffee chats, and tech support sessions, so I'd love to see you in those calls. But that's going to do it for this video. If you appreciated it or learned something new, please give it a like; it definitely helps me out a ton. And as always, I appreciate you making it to the end of the video. See you on the next one.