I’m changing how I use AI (Open WebUI + LiteLLM)

Video ID: nQCOTzS5oU0

YouTube URL: https://www.youtube.com/watch?v=nQCOTzS5oU0

Added At: 13-06-25 21:16:41

Processed: No

Sentiment: Positive

Categories: Education, Tech

Tags: AI, Open WebUI, Llama 3, ChatGPT, Claude, Cloud Computing, APIs, Virtual Private Server

Summary

The video creator shows how to access various AI models, including ChatGPT and Claude, without paying for monthly plans. They use Open WebUI, an open-source web interface for AI, which also lets them run self-hosted models like Llama 3. The video covers setting up a virtual private server (VPS) on the cloud provider Hostinger, and discusses APIs as a pay-as-you-go way to access AI models without paying for full plans. Overall, the video is an educational resource for those interested in exploring AI technology.

Transcript

I found a way to access every AI. I'm talking ChatGPT, Claude, Gemini, Grok, from one self-hosted interface, and no, I'm not paying for any of these plans. Get out of here.
I have unlimited usage and I get access
to the newest models as soon as they
come out. No more waiting.
And the best part is that
all my people get to use it.
I can create accounts for my
employees, for my wife, for my kids,
and they can access all the new stuff,
but the best part is that I have control,
for example, my kids.
I don't want them accessing every
AI model so I can restrict that.
I can also restrict what they can ask and what they get help with, so they're not cheating on their homework or letting out NetworkChuck secrets. And I can see all their chats, which, really, you should be looking at your kids' AI chats if you're letting them have AI. And you should let them have AI. Hot take, but I think kids need to learn how to use it, because that's kind of the future.
Right now it's not going anywhere,
but seriously, I love the solution.
It's better security, my data's a bit more safe, and oh my gosh, the amount of features it has. I'm addicted. This might be the better way to use AI. This is Open WebUI.
Now you've probably heard of that.
In fact, I've talked about it,
but this video is going to come
at it in a bit different way.
I'm going to try something new
and if you've never heard of it,
get your coffee ready. I'm going to
have you set up in about five minutes.
Let's go. Okay, Open WebUI. It's an open-source, self-hosted web interface for AI, and it allows you to use whatever LLM, or large language model, you want to use. And it's not just cloud stuff like ChatGPT and Claude, which, by the way, you're probably wondering how we're going to run those. You'll see, it's really awesome. But it's not just those. We can run self-hosted models via Ollama. I'm talking Llama 3 and Mistral and DeepSeek. You can run 'em all, and actually, I often run them side by side. Two, three, sometimes four. One of my favorite features. Speaking of features,
fair warning, there goes your weekend.
There are so many to play with,
it's addicting,
but it's also simple enough for
anyone to start using immediately,
so don't worry, but I will say this
asterisk, this isn't for everyone.
There's one asterisk, one thing that
might scare you away, you'll see,
but I'm still here. I'm still going
to use it. We'll cover that later.
Now what do we need to get this set up?
As I mentioned, this is self-hosted,
which means you yourself are
going to host this somewhere.
You're going to set it up,
you're going to install it,
and for that you really have
two options. Either the cloud,
this is the easiest and fastest
method or you can go on prem,
host it in your house. This could
be on your laptop, on a NAS, on a Raspberry Pi. I'll
show you both options.
Whichever you choose is going to be quick
and easy and you're going to be like,
how was that so fast? And how is
this so amazing? Trust me, you will.
We'll start with the cloud. Don't
blink. It's going to be fast.
And for this option,
we'll be setting up what's called a VPS, or a virtual private server, in the cloud, and we'll be setting it up on Hostinger, the sponsor of this video. So real quick, in the description I have a link, hostinger.com/networkchuck. Go ahead and go there, click on choose your plan, and KVM 2 is my favorite option, because you're essentially getting yourself a very healthy home lab. Hey, NetworkChuck from the future here.
I know what you're probably thinking: six bucks a month? Why don't I just pay for ChatGPT? Hey, I get it, but here's why I still love this. First, it's cheaper than ChatGPT. Second, you're getting your own server that can host your own ChatGPT, which is just cool. And third, you can host more than just Open WebUI. The server's beefy enough to do a lot more things. It's a home lab. I'm telling you, a healthy home lab. Just wanted to add that context. Anyways, back to me. Look at this thing: AMD EPYC CPU, eight gigs of RAM, and NVMe storage.
Plenty of bandwidth and my favorite
feature for all you home labbers,
backups and snapshots because we break
stuff and you're going to need this.
So just know, not only will this puppy run Open WebUI just fine, you'll be able to add more stuff to it. More projects, resume-building moments.
I just started watching
Home improvement again,
so I feel like I need to do
this. Sorry, I couldn't do it. That's embarrassing. Go make some coffee while I deal with that real quick. You try it at home, see if you can do it. Tim "The Tool Man" Taylor. Love that show. That show still hits. Anyways, let's keep building this.
So I'll choose the KVM 2. If you don't already have an account, it'll ask you to make an account. Choose your term. I'm going to do, not 24 months, 12 months sounds pretty good to me. Check this out: coupon code, right over here on the right. Type in NETWORKCHUCK10, apply that sucker. It's now
cheaper. Now pick where you want it to be.
Somewhere close to you
actually, yeah, Phoenix is good.
I think it'll automatically tell
you based on the latency to you.
And then we'll choose our OS. Now for us, because we want to do Open WebUI, we're in luck. We'll actually click on Applications right here and we'll click on show more. I don't see it right now. Where you at, buddy? We're going to be looking for Ollama. Ah, there it is. He's hiding from me. They probably have one of the cutest logos in the industry. We're going to go ahead and select this, because not only will it install Ollama, which is what people use to run local LLMs, it will also install Open WebUI, just like that, and it's going to be on Ubuntu 24.04. So you can add plenty of stuff on top of that. Alright, let's go ahead and click on confirm. Continue. Actually, I lied.
You're going to get logged in right here. Enter all your info. Free malware scanner? Sure. Okay, click continue.
Enter a root password.
This will be the password that
you'll use to log into your VPS.
Click continue and I think we're almost
done. Yeah, finish that up right here.
Go. And it's setting it up. Right now you have a virtual private server being spun up in the cloud, and they're installing Open WebUI along with Ollama, and all you have to do is sip some coffee.
It's pretty cool. For on-prem,
go watch this video right here. I'll
walk you through it. Just pause me.
I'll still be here. Come back
and see me. Alright, it is done.
We'll click on VPS management page,
go look at it and here is mine.
Go ahead and click on Manage over here on the right, and right now Open WebUI is just waiting for us. Click on the manage app button right there. What that will do is launch another tab. Go and click on that. It's essentially your public IP address on port 8080. And here we are. This is your Open WebUI. "Unlock mysteries wherever you are." Sounds like AI made that. Alright,
go and click on Get started at the bottom
there and here we'll create our first
account for Open WebUI. This first
account will be your admin account.
So you have Godlike
Powers over everything.
Click on create admin account
and celebration. We're
here. Okay, let's go. Now,
if you followed along with the Hostinger setup, you'll see by default we've got a nice little AI model to play with: Llama 3.2 1B. As opposed to an OpenAI model like ChatGPT, Llama 3.2 is a local model. It'll use your server's resources instead of OpenAI's. Let's talk to it. Hey, how are you? And it feels like ChatGPT, right? Same kind of familiar interface, except, as you might see, it's slower.
Actually that wasn't too bad and it
wasn't too bad because this is not a very
smart model. It's very small,
which means it's going to be a bit dumber than the other ones, and you won't really be able to run bigger, smarter models unless you have some killer hardware. I'm stacking GPUs, Terry,
but we don't really care about that
right now because we're not done yet.
We're about to add some big
boy models from the cloud.
Now when you want to access AI models like ChatGPT or Claude, you usually have two options. Option one, normie mode: you go out to ChatGPT, you pay a monthly plan, pay a lot if you want access to all the new stuff, and that's it. You're done. It's easy. No shame, I do it.
But then option two is where
things get interesting.
APIs, application programming interfaces, are what developers use to integrate AI like ChatGPT into their apps and programs. So what? We're not writing an app, why do we care? Well, it comes down to how they pay for that access. Normies pay a set price per month. With APIs, you pay as you go, or you pay for what you use. Two reasons why that's amazing. First,
providers normally give API access to all of their models, especially the ones they just released. So think GPT-4.5 that just came out. The people who have access to that on the normie plans are only the $200-a-month people. The $20 Plus users? Sorry, you're out of luck. But if you're using an API, you get access to that right now.
The second cool thing is that
you may end up saving money,
not guaranteed, massive asterisk,
but if people on your team or in your
house aren't really heavy users of ai,
paying for a full plan for them doesn't
make any sense if they're only going to
be using 50 cents a month. Okay,
so what does that look like? Well,
let's get it signed up for it right now.
Let's go out to OpenAI, and instead of going to ChatGPT, we'll go to openai.com/api, and we'll get signed in or create an account.
Whatever you got to do, once you're in,
you'll go to the top right and click
on start building. And here, yeah,
it's going to ask you for a credit card,
but you're not going to
be charged per month.
You're only going to be charged
for what you use initially.
You can add just five bucks.
That's five bucks that will
sit there until you use it.
So I'll go and add a
credit card right now.
I'll top it off with five bucks and then
I'll go and create what's called an API
key.
This will actually unlock all these ChatGPT models for us on the Open WebUI interface. To get that API key, we'll go to settings at the top right,
just click that little gear there.
Once there we'll go to the left and
click on API keys and we'll create a new
secret key. Name it, put
it in the default project,
leave everything else as is and click
on create secret key. There's your key.
Copy it.
Let's go put it inside Open WebUI. Right now, here in Open WebUI, we're going to go to the top right, click on our profile icon, and click on admin panel. From here, we'll click on settings and then connections. Connections are what give us additional functionality, additional LLMs, for Open WebUI. And right there, there's a blank space, baby, sitting right there for us. Sorry, Taylor Swift.
Well, we're going to paste our API key right there and click on save. Now, no fireworks, nothing crazy. What happened? Let's click on the little menu thing on the left to open that up, expand it, and then click on the pencil to start a new chat. And at the top there, we'll change our model from Llama to whatever we... wait, what? Look at all these GPT models. We have access to everything, including that new 4.5 model.
Let's search for it real quick.
Where's it at? There it is.
Let's start chatting with it.
Let's just have fun. And right now, if you followed along, you're using a $200-a-month model for nothing. Well, not for nothing, as we're about to see. Don't get crazy yet. Lemme cover this part.
We've got to talk about how we pay for these AI interactions, and this is the asterisk, the little gotcha you've got to be careful about.
So when you're talking to an AI model, specifically an LLM, a large language model, that's going to be text-based. The way they charge us is by tokens. It's like Chuck E. Cheese, just without the crappy pizza and the scary mouse. Now what's a token? A token is a word, in some cases. So for example, a small word like "you", that's probably going to be one token, while more complex words might be broken up. Actually, let me ask it.
How many tokens was your last response?
It's 15 tokens.
Break that up so I can
see which words were
tokens and which were broken up.
That's so sick. Okay, it's doing my job for me. Even punctuation. It's its own token. What a ripoff. If you want to save money with AI, don't use punctuation.
Okay, words equal tokens.
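Since words only roughly equal tokens, a crude at-home sanity check is to count words as a floor estimate. This is my own sketch, not how the API actually counts; real tokenizers split punctuation and longer words into extra tokens, so the true number is usually a bit higher:

```shell
# Crude proxy: word count loosely tracks token count for plain English.
# Real tokenizers split punctuation and longer words into extra tokens,
# so the true count is usually somewhat higher than this.
text="How many tokens was your last response"
words=$(echo "$text" | wc -w)
words=$((words))   # normalize any whitespace padding wc adds on some systems
echo "$words words, so roughly $words or more tokens"
```

The model's own breakdown (like the 15-token answer above) will always beat this estimate, but it's handy for ballparking a prompt before you send it.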
I still don't understand how
much money we're being charged.
Let's go to the chart. How much you're charged will depend on which model you're using. Certain models are smarter, and they require more resources to answer your questions. And that's on display right here. The o3-mini model, which is a solid model, is going to cost you a dollar and 10 cents per 1 million tokens. So that's a healthy amount of interaction, right? On the other hand, the o1 reasoning model will cost you $15 per million tokens. Think that's scary? You want to see scary? The model we were just using, 4.5, is their most expensive model: $75 per 1 million tokens, and that's just input. Notice they do have an output section too.
I wish I could do that
for people, for my kids,
charge them for talking to me and
then when I give out my wisdom,
make it more expensive. It's genius.
Now I know it's kind of hard to break
down what does a million tokens mean?
How much money am I going to be spending
and am I going to be saving money?
Here's your warning,
right? So a casual user, let's say they have 50 conversations a month, about a thousand tokens each. It could be as low as 50 cents, assuming they're using a model like 4o. Now, if you use AI like me, that's very low usage, but some people are like that.
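If you want to check that math yourself, here's the casual-user estimate as a quick shell calculation. I'm using the o3-mini input price quoted above ($1.10 per million tokens) and counting input tokens only; real bills add output tokens, and prices change, so treat this as a sketch:

```shell
# Rough monthly cost: (tokens per month / 1,000,000) * price per million.
# Input tokens only; output tokens are billed separately at a higher rate.
price_per_million=1.10        # o3-mini input price from the chart above (USD)
conversations=50              # the casual user
tokens_per_conversation=1000
total_tokens=$((conversations * tokens_per_conversation))
cost=$(awk -v t="$total_tokens" -v p="$price_per_million" \
  'BEGIN { printf "%.4f", t / 1000000 * p }')
echo "$total_tokens tokens a month is about \$$cost"
```

On a cheap model, a casual user barely registers; it's the output tokens, billed separately and pricier, that push a real bill toward that 50-cent figure.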
A moderate user might have
200 conversations a month
and this could be anywhere
from five to 10 bucks a month.
Power users, and keep in mind these are all very rough estimates, this can be sky's the limit, right? 20 bucks to infinity. So hey, drew the infinity symbol. Think I'm nailing it? Yep, got it. I can tell you right now, me as a power user, it would not be 20 bucks a month. It'd be a lot more. What impacts that? Well, what models you choose. I talk to the best models a lot: 4.5, yeah, o1, o3, talking all day. And my conversations are long, and that does impact how much it's going to cost. Context:
when you're using Open WebUI, the context of our messages is being sent each time I say something to the API, so that it knows what I'm talking about. So the number of tokens I'm using grows rapidly with the length of my conversation, and sometimes I sit there and talk for a while with an AI to figure stuff out. Now notice, as part of OpenAI's pricing, and this is very specific to OpenAI, they do have cached input, which will help offset a lot of those costs. They will cache your prompts, kind of keep them in memory over time. I think it's like 24 hours by default; they may change that. Don't quote me on that. So I'll say all that as a warning. Be careful.
Can this save you money? Maybe,
but I wouldn't do this with saving money as the primary goal. For me, it's more about wanting to give my family, myself, and my employees access to all the AI, and I don't want to pay for 15 million plans and have to manage all these different things. I want one interface, one place to go, and I want control.
Now if you're worried about this,
I will show you ways we can put in
budgets with a tool I'm about to show you.
It's so cool.
You can put a budget in per person so they don't go over. Like, you're stuck at 20 bucks a month; if you use that 4.5 all day, you're done, buddy. You're talking to Ollama for the rest of the day. Why is Alex's work so crappy after 3:00? I don't know. Let's break this down. What is this scribbly writing? Beautiful, dude. I'm on a roll today.
Let's keep going.
Now we're jumping into a very fun part of this tutorial, and it's to solve kind of a big problem with Open WebUI. Check this out. If I go back to my settings, where I added the OpenAI API key, and my connections, I really only have options for two types of connections: OpenAI API and Ollama API, Ollama being the local option. What about Claude? What about Gemini? What about all these fun ones I want to try? The whole point of this was to try everything. Yeah, that's kind of a problem, because you can't just plug in Claude right here, or Anthropic. It won't happen. This is where a tool I fell in love with comes in. It's called LiteLLM. LiteLLM is a proxy for AI, or a gateway.
If we go to the webpage real quick, they connect to so many AIs. I think they say a hundred plus, right? And that's exactly what we're going to do. So check this out. Open WebUI, all it's going to have to connect to is LiteLLM, and it does that just fine, because LiteLLM has an OpenAI-compatible API. It does great. And then with LiteLLM, we connect everything else: OpenAI, Anthropic (which is Claude), Gemini, Grok, DeepSeek. And no, not the one hosted in China. You can actually access an American-hosted DeepSeek on another service called Groq, with a Q. Very confusing, but very cool.
Now, LiteLLM will be a proxy server that we'll install alongside Open WebUI. It's not scary, trust me. It'll take like three seconds. You ready? Get your coffee. Let's install LiteLLM. So real quick,
we're going to access the same server we installed Open WebUI on, if you followed along with me on the Hostinger side, setting up a VPS. Right here in our portal where we're managing our VPS, we're going to access the terminal, which is super easy for us. There's a button right here, Browser terminal. Go ahead and click on that. For everyone else, just access the terminal of whatever server you want to deploy this on. We'll deploy it via Docker, very similar to how we set up Open WebUI in the other tutorial you watched earlier.
I said earlier too much. Alright,
we're inside the terminal.
I will have the commands below, but the first thing we'll do is use git to clone the LiteLLM proxy server: git clone. So lemme give me some room up here. There we go. git clone, and then the address of LiteLLM. Ready, set, clone. This will clone that repo from GitHub and create a folder for us that we'll jump into here in a moment. Little coffee break... and it is done. Type in ls to see our new folder. There it is. Type in cd litellm to jump into that folder. We're in.
Now we're only two commands away.
First thing we'll do is use nano. Type in nano, the best text editor ever, and we'll edit the hidden file, .env, just like that. And we're going to add two lines of config. First we'll type in LITELLM_MASTER_KEY, all caps, and have that equal, in double quotes, sk- something. Ideally you want it to be a randomly generated key. Actually, I'll just use Dashlane to do that for me right now. I'll just do digits and letters, we'll do 10 of them, and you'll want to copy this down somewhere.
This will be your password to log into the server once we build it.
I just clicked out of my browser terminal. Good thing I copied my password.
Alright, we'll close
out with double quotes.
Hit enter and we'll add
one more line of config.
We'll add LITELLM_SALT_KEY, just like this, and have that equal the same kind of starting point, sk- and then a randomly generated string of characters. This will be used to encrypt and decrypt your LLM API key credentials.
So I'll randomly generate some stuff real quick, copy all of this, put that somewhere safe, then hit Ctrl+X, Y, Enter to save.
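If you'd rather skip the password manager, here's a sketch of generating both secrets straight from the terminal and appending them to that hidden .env file. It assumes openssl is installed, which it is on most Linux servers:

```shell
# Append two randomly generated secrets to .env in the litellm folder.
# sk- is just the conventional prefix; the random part is what matters.
echo "LITELLM_MASTER_KEY=\"sk-$(openssl rand -hex 16)\"" >> .env
echo "LITELLM_SALT_KEY=\"sk-$(openssl rand -hex 16)\"" >> .env
cat .env   # copy both values somewhere safe before moving on
```

Remember, the master key is your admin login for the LiteLLM UI, and the salt key encrypts your stored provider keys, so losing either one is painful.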
And for most scenarios, all we have to do is type in docker compose up -d. Ready, set, go. And this is literally building our server. We don't have to worry about anything else except making sure we sip some coffee while it's happening. Now,
while that's installing,
let's get our API keys. Ready?
First we need our OpenAI API key. Easy for me to say. I normally like to create a new key for every service, so I'll create a new one, call this one litellm, default project. Create it, copy it, get it ready. And the same process you can repeat for Anthropic, for the Claude models, and Gemini, for the Google-based models. I'm just going to do Anthropic for now, and I'll grab Grok too. Grok being xAI, Elon Musk's AI, which is actually pretty amazing. Unfortunately, I don't think Grok 3 is available on the API just yet. But I'll go and create a key, and it's done.
If you see something like this, you're solid. If we type in docker ps, because everything is running through Docker, we'll see all of our healthy containers running.
Now what we'll do is open up a new tab. Actually, I need to grab the IP address of my server here. Where'd it go? There it is. Grab that IP address, and in your address bar, go out to that IP address, port... I think it's 8000, what was it? Oh, it's 4000. Port 4000. There we go. And then we'll click on LiteLLM admin panel UI. Click on that. The username will be admin, and then it'll be that master key you set up in the environment variable, the sk- one. Hit enter.
Now lots of bells and whistles.
All we care about right now
is doing a few things. First,
let's go to models on the left here, and then right here in the top menu,
you'll see the option to click on add
model, and then we'll add our first model.
Let's start with Claude. So I want to click on Anthropic, and we could either choose all models, like just go crazy, select them all, or be very specific. So maybe I only want the 3.7 latest and 2.1, to compare how dumb it is. Then I'll add my API key here and add the model, just like this, at the bottom right. Clicking on all models, you can see them sitting right there: 2.1, 3.7.
And then here's the cool part. This
is where the proxying comes in.
We'll go to the top left
and click on virtual keys.
We're going to create our own virtual
API keys that can control so many things.
Check this out. We'll create a new key. For now, we'll say it's owned by us; we don't need a team or anything. We'll name the key... I don't know, kids. So let's say we're making it for my kids, and we'll say the models they can access are 3.7 and 2.1. Checking out optional settings, you can add a budget, 20 bucks, and this will be a monthly budget, and you can do a lot more. You can expire the key. They have a thing called guardrails, which we're not going to cover right now. But we'll go ahead and create the key, and there's our key.
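By the way, everything that UI does is also an API call. As a sketch, minting the same budgeted key from the server's terminal looks something like this. The /key/generate route and field names are from LiteLLM's proxy docs as I understand them, and the model names are placeholders that must match whatever you registered, so double-check against their current docs:

```shell
# Run on the VPS. $LITELLM_MASTER_KEY is the master key from your .env file.
# Model names below are placeholders; use the names you registered in LiteLLM.
curl -s "http://localhost:4000/key/generate" \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"key_alias": "kids", "models": ["claude-3-7", "claude-2.1"], "max_budget": 20.0}'
# The JSON response includes the new virtual key (it starts with sk-).
```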
We'll copy it, and now we'll add it to Open WebUI. So here we're in Open WebUI, we're at our admin panel, on the connections, and say I want to delete my OpenAI API key. Delete. You don't have to do that, by the way. Now I'm going to add my LiteLLM API key under the OpenAI API key. I feel like I've been saying OpenAI API so much. The base URL will be HTTP colon wack wack localhost, colon 4000, so http://localhost:4000. And then we'll put our API key right in here, just like that, and click on... actually no, we'll test it real quick. Verify connection. Verified. And that works because they're on the same server; localhost is right there.
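If that verify step ever fails for you, you can check the proxy from the server itself. LiteLLM speaks the OpenAI-compatible API, so a quick curl against /v1/models shows which models a given virtual key can see (the key below is a placeholder):

```shell
# Run on the VPS; swap in one of your own virtual keys from LiteLLM.
curl -s "http://localhost:4000/v1/models" \
  -H "Authorization: Bearer sk-your-virtual-key"
```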
Click on save. And now if we go back and try to create a new... oh, there it is. New chat. Claude, sitting right there. Oh, that's so cool. How you doing, Claude? Ah, love it. Check this out.
I'm just going to show it to you
right now. I was going to wait,
but click on add model. We can put Claude 2.1 there as well. Let's do a new chat. Actually, let's add them side by side and say, tell me a riddle, and they'll answer it side by side. How cool is that? Now real quick, I'm not going to make you wait. I'm going to add OpenAI and Grok. Now I added these models, and now I have Grok and 4o and o3-mini.
But no one inside of Open WebUI will have access unless I give those virtual keys access to them. So I can edit my key, go to settings, edit settings, and add additional models. So: o3-mini, Grok, 4o. Save. And then, back in Open WebUI land, I'm going to refresh and see if they show up. I want to do a new chat. There it is: Grok, 4o, o3-mini.
So now I've got four different AIs, and we'll add a Llama in for fun too. How many Rs are in the word strawberry? And now they're all answering, except o3-mini doesn't like it. Claude got it right, Grok got it right, 4o got it right. And Llama's dumb. How cool is this? And over here on the LiteLLM side, you can add as many virtual API keys as you want, and add those in Open WebUI.
Actually, check this out on the LiteLLM side: if I go to usage, it'll show me how much is being spent. It probably needs some time to catch up with the other ones, but this is now my AI hub, and this is where I'll control the budget. And then, back in Open WebUI land,
Just a few things I want to
cover real quick. First, my kids,
let me add my kids to my team here.
I go to settings, admin settings,
and then users. I can create groups.
Let's create a group, call it kids.
I'll go back to overview and create
some users here, kid one and kid two.
I can go to my groups,
add them to the kids group and here I
can say what permissions they have access
to. Can they access models? Can they
access knowledge and prompts and tools,
which is a whole world of things
I can't talk about right now.
This video would be way too long.
I'll click on save and we can also
control who has access to what models.
Let's say I only want them to have access to Claude 3.7. It's the smartest. I can go in here to the model, click on groups, and say the kids have it. Everyone else, sorry, no.
I can also do this. Give it a system
prompt. You are a school helper.
Your job is to help my kids,
help kids with their school,
but you cannot do their work for them.
Never let them cheat.
Never write an essay or solve a problem.
You must guide them.
And you can only talk about school related
subjects, guardrails in place.
Click on save, and I'll just grab this URL real quick, open it up in an incognito window, and log in as my kid, kid1@hotmail.com. Alright,
I've only got access to one model.
Write a paper for me about George Washington. And there we go, it won't write it for me. What is two plus two?
Oh, it gave me the answer. What is
nine times seven divided by four?
Okay, it'll help with math. Let's
ask you something non-school related.
What is the plot of the movie The Matrix? Oh, it's answering. Oh, "film studies class." Okay, got it. This is something my daughter would ask, so it just relates it back to school. That's very cool. I like that.
Now, the best part: getting back to the users, on kid one here, who was just having the conversation, I can click on chats, and there it is. I can jump right in there and see everything that was said. For my employees, I'm not going to monitor that, and I can turn it off. But for my kids? 100%. AI is nuts, and you've got to keep an eye on that kind of stuff.
Now this video is way too long. I'm
sitting here staring at the screen.
Can I talk about that? No. Can I talk
about that? No, it'd be too long.
Let me know if you want me to make
another video covering the ins and outs of
Open Web UI because it has tools,
prompts, functions, pipelines,
image generation. Oh,
it's so addicting and I would love to
hear if you've done anything cool with
this as well. Now, there's one last piece of this I haven't shown you yet, and that's this address here. Right now it's just an IP address. You don't want to give your family and friends an IP address. Like, hey, go out to 185.28.2.24, that's the new AI server. No, that's terrible. I'm going to walk you through how to set up a DNS name.
We'll purchase it on Hostinger. I'm going to walk you through how to set up a friendly domain name for this, but we'll do that in a separate video, right here. That's all I got. Thanks again to Hostinger for sponsoring this video, and I'll catch you guys next time. Get control of yourself.