I Reduced My AI Coding Costs by 50%, and Here’s How You Can Too!

Video ID: jPc2lLU7O44

YouTube URL: https://www.youtube.com/watch?v=jPc2lLU7O44

Added At: 13-06-25 21:16:40

Processed: No

Sentiment: Neutral

Categories: Tech, Education

Tags: AI costs, Claude API, DeepSeek, Gemini Flash 2.0, Cursor, prompt override, programming tutorial

Summary

The creator shares their experience reducing AI coding costs, specifically with the Claude API. They discuss using DeepSeek, Gemini Flash 2.0, and Cursor to lower costs. The video also covers overriding system prompts for better results.

Transcript

If you're like me, you spend way too much on AI. Maybe not this much; mine may be an extreme case. But when you're coding, the fees rack up really quickly, especially when you're using the Claude API. In December 2024 I spent $375. When I saw that, I knew I needed to get my costs down, and you can't just do that right away; I had to figure out a plan to strategically lower them. You may have followed some of my videos where I've talked about Google Gemini 2.0 Flash being the value king; that has helped me substantially. You may have also seen the video where I overrode the system prompt for Roo Code to save money; that has helped a ton. So by the end of February 2025, I spent about $75 in API costs. The big helpers were using DeepSeek and Gemini Flash 2.0 more, because those were my big cost savers for the month, and adding Cursor, which has offloaded a lot of my API cost now that I'm moving a lot of my demand into it.

Cursor is so good that there's no way the price stays at $20 given the value they're providing. I have a feeling it's going to be $100 a month in the future, I really do, so take advantage of Cursor at $20 a month while you can. I have no investment in them and no sponsorship from them, but the value they provide right now is incredible. If they keep it at $20, I'll be a fan and use it forever, but I don't see how they do that after using the API as much as I have.
Grok is my other big cost here: $40 a month. I paid for an entire year, so I think I actually got it at about $38 a month. I'm stuck with that for a year, but honestly I really like it. I use it for image generation and ideation, like getting ideas for YouTube titles, because I suck at that stuff; I really enjoy what they've built there. Then I have my regular subscriptions with Anthropic, Google, OpenAI, and GitHub Copilot. GitHub Copilot is another really good value. So I need to get this lower, my goal being $100 a month if I can, and that's where this comes in.
Recently I posted that video on the shortened prompt, which I'll talk about a little more below, and that has substantially cut my API costs. I've been monitoring it, and I want to say it's about 50%. I believe I can get my API cost down to less than $50 for the month of April with the help of DeepSeek, Gemini Flash 2.0, and my smaller prompt, all while staying as effective as I've always been. I still have to pay for Grok; technically I've already paid for it, but I'm still extrapolating that out month by month. Cursor I'm going to keep; as long as the price doesn't go up substantially, I'll keep paying for it, because it's just too good. GitHub Copilot is also a substantial value, and you should use it while you can, especially the agent version, because you get access to Claude there too. So I'm going to be able to save a substantial amount of money, not quite down to the $100, but it would be if I weren't paying the $40 a month for the X Premium account.
Why did I choose Requesty? Actually, I had never heard of Requesty; I had mainly used OpenRouter before. But I saw this, and it struck me because I love companies that listen to their users, and I thought it was cool that they made a change so quickly: within a day of me releasing that prompt, they added it so their users could use it to save money. That doesn't benefit the company; they want more tokens going through. But they've added a built-in way to use my small coder prompt to save their customers money, and I love it. So I started digging into it, started talking to them, and actually signed up myself. They gave me a $1 welcome credit, which you can barely see here, and a $5 bonus on my first $5. I think I put $10 in; in general I got the six free dollars and then put some money in to get the free credit. So at a minimum you could try it out to get the free credits. Now, I'm not sponsored by them at all, but when I was talking to them and told them I was using it, they did give me a few credits; they popped about $40 onto my credit balance, which I'm very appreciative of, because any free credits are amazing. They just wanted me to try it out and give them feedback.

The second reason I'm going with Requesty, the first being the community and customer engagement, is that I'm a data nerd. I love being able to come in and see how I'm using these models. You can see here that some of this is from tests I just ran for this video (which I'm going to show you the results of) that were actually more expensive than they should have been. Anyway, this is an amazing breakdown of where you're actually spending your money. For example, I did a lot with Qwen QwQ 32B, a little over a million tokens sent, and it was about 45 cents. So I highly recommend checking it out; I won't talk much more about it, but it's worth getting those free credits at a minimum.
Some of you might ask: why don't I just use Cursor? Well, check this out. One day in (I don't know if this is a 30-day or 31-day month), I'm already at 96 out of 500 of my fast credits. I use way too much to run Cursor full-time. I would if I could, but I can't, so I have to balance my workload across GitHub Copilot, Cursor, and the API; I just have to with the amount of code I'm doing.
Now let me go through how I actually code on a day-to-day basis. The thing I use a lot is the Architect or Ask mode in Roo Code. If you're not familiar with Roo Code, it's a VS Code extension that you can install from the Extension Marketplace (there's apparently an update I need to do), and it's really easy to get going. I'll show you a bit of how to set it up, but that's not the purpose of this video; I have other videos that go through that.

This is one of my key tools whenever I'm doing something big. In this case (don't shoot me here, I'm actually kind of embarrassed by this file) the file has gotten too big. Nobody should have a 2,000-line file, but it's been built up with cruft over time, and I need to break it up; it's just gotten unruly. So here I have DeepSeek R1 go through and build a plan of the things that need to change. At that point I can choose: do I want to switch to Claude for the implementation and pay more, or do I want to go to DeepSeek V3, which is what I would normally do, for a first pass? DeepSeek V3 is very cheap and very good, so I can do this for pennies: probably 30 cents to have all of this implemented with the DeepSeek model, whereas with Claude it's probably going to be $2. You have to make that choice for yourself: do you want to spend that money, is it so complicated that you need the best model, or can you do it with DeepSeek V3? In this case I think the plan is so good that I could send it through DeepSeek V3 and it'd be totally fine. If it fails, I've maybe lost 30 to 50 cents, and I can always come back and use Claude if I need to. But if Claude fails, you're out dollars, and I know that doesn't sound like a lot, but it adds up on a monthly basis when you're burning dollars an hour.
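The plan-then-implement split can be sketched as a small script. This is a minimal illustration, not the creator's actual tooling: the router URL and model identifiers are placeholders for whatever OpenAI-compatible endpoint and models you use.

```python
import json
import os
import urllib.request

ROUTER_URL = "https://router.example.com/v1/chat/completions"  # placeholder endpoint

def plan_prompt(task: str) -> str:
    """Prompt for the cheap reasoning model (e.g. DeepSeek R1)."""
    return f"Write a step-by-step refactoring plan, no code yet, for:\n{task}"

def implement_prompt(plan: str) -> str:
    """Prompt for the cheap coder (e.g. DeepSeek V3); only escalate to a
    pricier model like Claude if this first pass fails."""
    return f"Implement this plan exactly as written:\n{plan}"

def chat(model: str, prompt: str) -> str:
    """One OpenAI-style chat completion against the router."""
    body = json.dumps({"model": model,
                       "messages": [{"role": "user", "content": prompt}]}).encode()
    req = urllib.request.Request(ROUTER_URL, data=body, headers={
        "Authorization": f"Bearer {os.environ.get('ROUTER_API_KEY', '')}",
        "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    task = "Split a 2,000-line module into cohesive files."
    plan = chat("deepseek-r1", plan_prompt(task))        # pennies to plan
    code = chat("deepseek-v3", implement_prompt(plan))   # pennies to implement
    print(code)
```

The point of the structure is the escalation path: the cheap pass goes first, and the expensive model is only a fallback, not the default.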
Let me give you a couple of other examples, side by side. On the left, I used Claude 3.7 to both plan and implement a fully functional 3D flight simulator. The output was broken, and it cost $3.34 to do that. On the right, I used DeepSeek R1 to plan and then Claude 3.7 to implement, and it was also broken, but the code output for both is very close. They're slightly different, and I feel like with a few iterations I could get both of them working. So the question is: is it worth paying $3.34 or $1.12 to get a basically equally broken app?
This is where things get a little funny, because the next thing I'm going to show you is this one: it used DeepSeek R1 to plan that same prompt and then Claude 3.7 to implement, but with my overridden prompt that I call Coder Short Rules. Remember the prices of the other two: $1.12 broken, $3.34 broken. Now, I'm not saying my prompt makes things better, but in this particular case, for 49 cents we got pretty much all the same files we'd expect, and it's the most functional version of the game. I actually have a cool-looking aircraft with the heading and everything working. I don't think the speed is working (I was trying to get that going), but I could iterate on that; the other two didn't even load. This is incredible. I cannot tell you enough how important it is to really get control of your AI agents to save money. These agents are built as if money is no object, but the fact of the matter is that when you're using them every day, you can't spend dollars an hour. So having these overridden prompts helps, and what I would love to see happen is people building a library of them. Maybe we should start a GitHub repository of the prompts we're making that work with these particular tools, because I think we could save each other a lot of headaches and a lot of money.
To show you how to actually override your prompts: go into your modes. Over here on the left are the modes. Code, Architect, Ask, and Debug come with Roo Code when you install it (I'm pretty sure Debug does; I don't think I added that one). I've added the Coder Short Rules mode, plus the LM Ask and LM Studio Code ones, which I've made specifically for local models and am still tuning; I haven't released those yet because I don't feel they're great yet, and I have another one in progress for Gemma 3. Anyway, if we go into Coder Short Rules: I hit Edit, select Coder Short Rules, scroll down, expand "Advanced: Override System Prompt", and click the little link right there. That's how you override it. Just to verify it's working, you can hit Preview to see how it actually looks. This is still a large prompt, but it's significantly smaller than the full one, something like one-fifteenth the size; it's a crazy difference. And you can tell by the quality of the code we got for 49 cents. Now, this run used DeepSeek R1 to plan and Claude 3.7 to implement, so Claude 3.7 didn't have to do the planning, and I save tokens that way too, because I don't have a good shortened Ask mode yet.
The other thing I want to talk about is the different configurations, and this is something you should pay attention to as well: you should have many configurations and switch between them based on the work you're doing. I could very well have had DeepSeek V3 implement this and maybe gotten something 80 or 90 percent as good, and paid even less than the 49 cents, but in this case I used Claude just to show an apples-to-apples comparison. Your configurations should be the models you switch between the most. I have Claude 3.7 as one of my main models; I use it for anything complex, but it is expensive. I have it routed through Requesty rather than OpenRouter, which helps me get past their limits, but it also helps me unify my monthly spend with one provider.
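The habit of keeping several named configurations and routing each task to the cheapest capable one can be sketched like this. The model names come from the video; the per-million-token prices are illustrative assumptions, not quotes from any provider.

```python
# Named model configurations; prices are illustrative assumptions, not provider quotes.
PROFILES = {
    "complex":  {"model": "claude-3.7-sonnet", "usd_in": 3.00, "usd_out": 15.00},
    "planning": {"model": "deepseek-r1",       "usd_in": 0.55, "usd_out": 2.19},
    "default":  {"model": "deepseek-v3",       "usd_in": 0.27, "usd_out": 1.10},
    "cheap":    {"model": "gemini-2.0-flash",  "usd_in": 0.10, "usd_out": 0.40},
    "local":    {"model": "qwen2.5-coder",     "usd_in": 0.00, "usd_out": 0.00},
}

def pick(task_kind: str) -> str:
    """Route a task to a named tier; fall back to the cheap-but-good default."""
    return PROFILES.get(task_kind, PROFILES["default"])["model"]
```

The exact table doesn't matter; the habit does. Name the tiers once, then switch deliberately instead of sending everything to the most expensive model.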
Then I have my local models. I only use these when something is very simple, like when I want to write a particular unit test and don't want to pay any money for it; these don't cost me anything. I wish they were better, because I'd love to move more of my workload local, but I can do some of my stuff with them, and typically that entails copying and pasting code from Roo Code into the editor, because I turn off the tool editing to save tokens. I also use Qwen 2.5 VL a lot, because it has vision capabilities. A lot of times I'll use it to pull information from something like a mockup, then feed that into something else to implement it. DeepSeek V3 doesn't have vision capabilities, but I can use Qwen 2.5 VL to take an image, describe the parts that need to be built, and then put that into DeepSeek V3 to implement. Is it perfect? No, it's not going to be like Claude 3.5, but it does a pretty good job when you combine those two things together.
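That two-stage trick, where a vision model describes and a text-only model implements, amounts to building two requests. Here is a sketch of the message payloads, assuming an OpenAI-style chat API with `image_url` content parts; the prompt wording is illustrative.

```python
import base64

def describe_mockup_messages(image_path: str) -> list:
    """Stage 1: ask a vision model (e.g. Qwen 2.5 VL) to describe the mockup,
    passing the image as an OpenAI-style base64 data-URL content part."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return [{"role": "user", "content": [
        {"type": "text",
         "text": "List every UI element in this mockup, with layout and labels."},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{b64}"}},
    ]}]

def implement_messages(description: str) -> list:
    """Stage 2: hand the text description to a cheap text-only coder
    (e.g. DeepSeek V3), which cannot see the image itself."""
    return [{"role": "user",
             "content": f"Build this UI exactly as described:\n{description}"}]
```

Stage 1's text output is the only thing stage 2 ever sees, which is exactly why the vision prompt asks for an exhaustive element list rather than a summary.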
One of my favorite models is Gemini Flash 2.0; this is the value king. I'm just going to click on it to show you my configuration: 10 cents per million input tokens, 40 cents per million output tokens. Again, I'm setting all of this up through Requesty, because my goal by the end of this month is to unify everything through there, and unless I have some major problems with them, I don't see myself switching.
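At those rates the arithmetic is simple, which is exactly why Flash earns the "value king" label. A quick sanity-check function (the example token counts are made up for illustration):

```python
def cost_usd(tokens_in: int, tokens_out: int,
             usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Cost of one request given per-million-token prices."""
    return tokens_in / 1e6 * usd_per_m_in + tokens_out / 1e6 * usd_per_m_out

# Gemini Flash 2.0 at $0.10/M input and $0.40/M output:
# a 200k-token-in, 20k-token-out coding session costs $0.02 + $0.008 = $0.028.
session = cost_usd(200_000, 20_000, 0.10, 0.40)
```

Running the same hypothetical session through a premium model at dollars per million tokens is what turns "pennies" into the dollars-per-hour problem described earlier.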
A couple of other ones to touch on: I do keep some miscellaneous configurations. For example, I wanted to test Gemma 3 27B; they had a free version on OpenRouter, and I switch the model on this one constantly, so it's not a defined model configuration, it's basically my free floater. I do the same thing with Requesty; I was using the QwQ 32-billion-parameter model here. (And I keep misspelling Requesty as "Requestly", so if you ever see me add the extra letter, it's not "Requestly", it is Requesty, just to be very clear. I think I've got all my spelling corrected at this point.) Then of course I have my DeepSeek R1, which I absolutely love; it's a killer model, especially for planning, architecting, that type of stuff. And DeepSeek V3 honestly gives Claude 3.5 a run for its money; its real issue is that it doesn't support images, so I don't like to use it for mockups unless I combine it with that Qwen 2.5 VL model.
The final model I'll touch on is Google's Gemini 2.0 Pro, because sometimes there are free models you can use. If you really want to be smart with money, use some of these free ones, the experimental ones that they want people to use and aren't charging for yet. They will be highly rate limited, but you can configure for that down below in Roo Code: you can set a rate limit, the minimum time between API requests, so I can say I want something like 10 seconds between them. You can use these free models, not pay anything, and get some pretty good results. The one I've been playing with most lately is Gemini 2.0 Pro Experimental; it's a pretty good model, though it's so rate limited that I've had a hard time staying with it. But you can do that with some of these models; you can use the free ones to save on cost.
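That "minimum time between API requests" setting is just client-side pacing, and you can do the same in your own scripts when you lean on a free, rate-limited model. A minimal sketch; the 10-second figure mirrors the example above, shortened here so the demo runs quickly:

```python
import time

class MinIntervalLimiter:
    """Client-side version of a 'minimum time between API requests' setting:
    sleep so that consecutive calls are at least `interval` seconds apart."""
    def __init__(self, interval: float = 10.0):
        self.interval = interval
        self._last = 0.0

    def wait(self) -> float:
        """Block until the interval has elapsed; return seconds slept."""
        now = time.monotonic()
        delay = max(0.0, self._last + self.interval - now)
        if delay:
            time.sleep(delay)
        self._last = time.monotonic()
        return delay

# Demo with a short interval; use ~10.0 for heavily rate-limited free models.
limiter = MinIntervalLimiter(interval=0.1)
first = limiter.wait()   # no earlier call to space from, so returns 0.0
second = limiter.wait()  # sleeps up to 0.1 s before returning
```

Call `limiter.wait()` immediately before each request; spacing the calls yourself beats retrying after 429 errors, since failed attempts can still count against the quota.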
Now, hopefully what I've gone over has been helpful for you. Basically, my tips are these. First, switch models while coding; do not just stay on a single model. Even if you're using Cursor, if you're doing something simple, switch to a non-premium model to save those credits. Second, override your system prompts in Roo Code and use those smaller prompts, especially if you don't need all the other functionality. Just note that if you take my prompt, it removes all of the MCP stuff; there's a lot put into that prompt that's just not needed if you only want to write code. And you saw how much it saved: we're talking $3.34 down to 49 cents, and we got a more functional version of the game. Third, cancel subscriptions and start consolidating into really one LLM router service, like Requesty.

The last tool I'll cover in this video is Open WebUI.
you can visit GitHub and grab open web
UI this is probably the best open-source
way to consolidate your models it's
incredibly easy to run either with
python or Docker and ultimately I have
it set up here and running the way you
can configure this to work with requeste
is you go to settings you go to admin
settings you go to connections and
basically put in the requeste URL here
and your API key and you're done I've
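For reference, the two install paths mentioned (Python and Docker) look roughly like this; check the Open WebUI README for the current image tag and flags, as these may have changed:

```shell
# Option 1: run with Python (the project ships a CLI entry point)
pip install open-webui
open-webui serve

# Option 2: run with Docker, persisting data in a named volume
docker run -d -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
# The UI is then served at http://localhost:3000
```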
I've been testing it out to use in place of claude.ai or chatgpt.com, and you get access to all the models; they're all available. I even have my Ollama ones hooked in here too. This is GPT-4.5, just to show you how it works: I've selected the OpenAI one here, all running through the Requesty router. That's it. I'm going to be using Open WebUI as part of that consolidation, and I'm actually going to try to deploy it. I've got a computer behind me over here that needs to be upgraded, but I'm going to try to deploy it there and open it up to my entire house, so my kids can use it and my wife can use it. I want to get my wife using AI; if I can do that, we know we're winning.
Anyway, I hope this has been helpful, and if it has, let me know in the comments below. If there's something you do differently to save money while coding, I would love to know that as well. Also check out Requesty; you can get those six free credits. Again, I'm not sponsored by them, although, full transparency, they did give me some credits just to try out the service, but I've really come to like it. Help me out with the algorithm: give me a like and subscribe. Otherwise, I appreciate you all. We've been having so much fun making these videos, and the Discord channel is just amazing; I love conversing with all of you, sharing ideas and projects, and I've learned a lot from you guys too, so jump in if you want to join that community. Otherwise, I'll see you in the next one. Peace out.