Video ID: jPc2lLU7O44
YouTube URL: https://www.youtube.com/watch?v=jPc2lLU7O44
Added At: 13-06-25 21:16:40
Processed: No
Sentiment: Neutral
Categories: Tech, Education
Tags: AI costs, Claude API, DeepSeek, Gemini Flash 2.0, Cursor, prompt override, programming tutorial
Summary
The creator shares their experience with reducing AI costs, specifically with the Claude API. They discuss using DeepSeek, Gemini Flash 2.0, and Cursor to lower costs. The video also covers overriding the Roo Code system prompt with a shorter one to cut token usage.
Transcript
If you're like me, you spend way too much on AI. Maybe not this much, maybe mine is an extreme case, but when you're coding, the fees rack up really quickly, especially when you're using the Claude API. In December 2024 I spent $375. When I saw that, I was like, I need to get my costs down. You can't just do that right away, though, because I needed to figure out a plan to strategically lower my costs.

You may have followed some of my videos where I've talked about Google Gemini 2.0 Flash being the value king. That has helped me substantially. You may have seen the video where I overrode the prompt for Roo Code to save money. That has helped me a ton. So, ending February 2025, this is what I spent: about $75 in API costs. The big helpers here were using DeepSeek more and Gemini Flash 2.0 more, because those were my big cost savings this month. Then I added Cursor, which has also offloaded a lot of my API costs, because now I'm moving a lot of my demand into Cursor.

Cursor is so good there's no way that price is going to stay $20, because of the value they're providing. I have a feeling that in the future it's going to be $100 a month, I really do. So take advantage of Cursor at $20 a month while you can. I have no investment in them and no sponsorship from them, but the value they provide right now is incredible, and if they keep it at $20 I'll be a fan and I'll use it forever. I just don't see how they do that, after using the API as much as I have.

Now, Grok is my other big cost here: 40 bucks a month. I did pay for an entire year, so I think I actually got it for about $38 a month. I'm stuck with that for a year, but honestly I really like it. I use it for image generation and ideation, getting ideas for YouTube titles, because I suck at that stuff. I really enjoy what they've built there. And then I have my regular subscriptions with Anthropic, Google, OpenAI, and GitHub Copilot. GitHub Copilot is also another one that's
a really good value. So I need to get this lower, my goal being $100 a month if I can, and that's where this comes in. Recently I posted that video on the shortened prompt, which I'll talk about a little bit more, and that has substantially cut my API costs. I've been monitoring it, and I want to say it's about 50%. I believe I can get my API costs down to less than $50 for the month of April with the help of DeepSeek, Gemini Flash 2.0, and my smaller prompt, all while still being as effective as I've always been.

I still have to pay for Grok. I've already paid for Grok, technically, but I'm still extrapolating that out month by month. Cursor I'm going to keep; as long as that price doesn't go up substantially, I will be paying for Cursor. It's just too good. GitHub Copilot is also a substantial value, and you should use that while you can too, especially the agent version, because you do get access to Claude there. I'm going to actually be able to save a substantial amount of money. Not quite down to the $100, but it would be if I wasn't paying the $40 a month for the Twitter Premium, or X Premium, account.

And why did I choose Requesty? Well, actually, I had never heard of Requesty. I had mainly used OpenRouter before. But I saw this, and it struck me, because I love companies that listen to their users, and I thought it was kind of cool that they made a change so quickly. Within a day of me releasing that prompt, they added it so that their users can use it to save money. That doesn't benefit this company. They want more tokens to go through. But what they've done is add a built-in way to use my small coder prompt to save their customers money. I love it.

So I started digging into it a bit, then I started talking to them, and I actually went and signed up myself. They gave me a $1 welcome credit, which you can kind of barely see here, and then a $5 bonus on my
first $5. I think I put $10 in there, maybe. In general I got the six free dollars, and then I put some money in to get some free credit. So at a minimum you could try it out to get the free credits. Now, I'm not sponsored by them at all, but when I was talking to them and told them I was using it, they did give me a few credits. I think they popped about $40 onto my credit balance, which I am very appreciative of, by the way, because any free credits are amazing. They just wanted me to try it out and give them feedback on it.

The second reason I'm going with Requesty (the first being that I love the community engagement, the customer engagement, there) is that I'm a data nerd. I love being able to come in and see how I'm using these models. You can see here that some of this is from tests I just ran for this video, which were more expensive than they should have been, and which I'm going to show you the results of. Anyway, this is an amazing breakdown to show you where you're actually spending your money. For example, I did a lot with Qwen QwQ 32B, a little over a million tokens sent out on that, and it was 45 cents. So I highly recommend checking it out. I won't talk too much more about it, but it's worth getting those free credits at a minimum.

Some of you might ask, why don't I just use Cursor? Well, check this out. One day in (I don't know if this was a 30-day or a 31-day month) I'm already at 96 out of 500 of my fast credits. I use way too much to use Cursor full-time. I would if I could, but I can't. So I have to balance my workload across GitHub Copilot, Cursor, and API usage. I just have to, with the amount of code that I'm doing.

Now I want to go through how I actually code on a day-to-day basis. The thing I use a lot is the Architect or Ask mode in Roo Code. If you're not familiar with Roo Code, it is a VS Code extension that you can install from the extension marketplace. There's apparently an update I need
to actually do. It's really easy to get going, and I'll show you how to set it up a little bit, but that's not the purpose of this video. I have other videos that go through that.

So this is one of my key things I use whenever I'm doing something big. In this case, oh, don't shoot me here, I'm actually kind of embarrassed by this file. This file has gotten too big. Nobody should have a 2,000-line file, but it's just been built up, with crud put into it over time. I need to break this file up. It's just gotten unruly.

So here I have DeepSeek R1 go through and build a plan of the things that need to change. At this point I can choose: do I want to switch to Claude to do the implementation and pay more, or do I want to go to DeepSeek V3, which is what I would normally do for a first pass, because DeepSeek V3 is very cheap and very good? I can do this for pennies. For probably 30 cents I could have all of this implemented with the DeepSeek model, whereas if I were to do this with Claude, it's probably going to be $2. You have to make that choice for yourself. Do you want to spend that money on it? Is it so complicated that you need the best model on it, or can you do it with DeepSeek V3? In this case I think the plan is so good that I could send this through DeepSeek V3 and it'd be totally fine. If it fails, I've lost maybe 50 cents, maybe 30 cents, and I could always come back and use Claude if I need to. But if Claude fails, you're out dollars. I know it doesn't sound like a lot, but it adds up on a monthly basis when you're spending dollars per hour.

So let me give you a couple of other examples. Side by side: on the left side I used Claude 3.7 to plan and implement a fully functional 3D flight simulator. The output was broken. $3.34 to do that, and it was broken. On the right side I used DeepSeek R1 to plan and then Claude 3.7 to implement, and it was also broken,
but the code output for both is very close. They're slightly different, and I feel like with a few iterations I could get both of them working. So the question is, is it worth paying $3.34 or $1.12 to get a basically equally broken app?

This is where things get a little funny, because the next thing I'm going to show you is this one. This one used DeepSeek R1 to plan that same prompt, and then I used Claude 3.7 to implement, but with my overridden prompt that I call Coder Short Rules. Remember the price of the other two: $1.12, broken; $3.34, broken. Now, I'm not saying my prompt makes things better, but in this particular case: 49 cents. We got pretty much all the same files we'd expect, for 49 cents, and it is the most functional version of the game. I actually have a cool-looking aircraft. It's got the heading and everything working. I don't think the speed's working; I was trying to get that working, but I could iterate on that. The other two didn't even load.

This is incredible. I cannot tell you enough how important it is to really get control of your AI agents to save you money. These agents are built as if money is no limit, but the fact of the matter is, when you're using it every day, I can't spend dollars an hour on it. So having these overridden prompts helps, and what I would love to see happen is people having a library of these. Maybe we should start a GitHub repository of the prompts we're making that work with these particular tools, because I think we could save each other a lot of headaches and a lot of money.

To show you how to actually override your prompts: go into your modes. Over here on the left are the modes. Code, Architect, Ask, and Debug come with Roo Code when you install it. I'm pretty sure Debug does; I don't think I added that one. I've added the Coder Short Rules one, and I've added the LM Ask and the LM Studio Code ones, because these are ones that I've made specifically for local models that I'm still tuning. I haven't
released those yet because I don't feel like they're great yet, and I have another one I'm working on for Gemma 3.

Anyway, if we go into Coder Short Rules, I'm going to hit edit here and select Coder Short Rules. If you scroll down, you can expand this Advanced: Override System Prompt section and click on this little link right here, and this is how you override it. Just to verify it's working, you can hit preview to see how this actually looks. This is still a large prompt, but it is significantly smaller than the full prompt. I'm talking like 1/15th the size. It is a crazy difference in size, and you can tell that by the quality of the code we got and the cost of 49 cents. Now, this used DeepSeek R1 to plan and Claude 3.7 to actually implement, so Claude 3.7 didn't have to do the plan. I save tokens that way, because I do not have a good shortened Ask mode.

The other thing I want to talk about is the different configurations, and this is what I think you should pay attention to as well. You should have many configurations, and you should be switching between them based on the work you're doing. I could very well have had DeepSeek V3 implement this and maybe gotten something 80% as good, 90% as good, and I would have paid even less than the 49 cents. But in this case I used Claude just to show the apples-to-apples comparison.

Your configurations should be the models you switch between the most. So I have Claude 3.7, one of my main models. I use this for anything complex, but it is expensive. I have that routed through Requesty rather than OpenRouter. That helps me get past the limits that they have, but it also helps me unify my monthly spend with one provider.

Then I have my local models. I only use these when it's something very simple. Maybe I want to write a particular unit test and I don't want to pay any money for it. These don't cost me anything. I wish these were better, because I
would love to move more of my workload to local, but I can do some of my stuff there. Typically what this entails is me copying and pasting code from Roo Code into the editor, because I just turn off the tool editing to save tokens.

Then I use this model a lot, this Qwen 2.5 VL, because it has vision capabilities. A lot of times I'll use this to pull information from a mockup, and then I'll feed that into something to implement it. DeepSeek V3 doesn't have vision capabilities, but I could use Qwen 2.5 VL to take an image, describe the parts of it that need to be built, and then put that into DeepSeek V3 to implement. Is it perfect? No, it's not going to be like Claude 3.5, but it does a pretty good job when you combine those two things together.

One of my favorite models is this Gemini Flash 2.0. This is the value king. I'm just going to click on this to show you my configuration: 10 cents per million input tokens, 40 cents per million output tokens. Again, I'm setting all this up through Requesty, because my goal is to unify it all through there by the end of this month, and unless I have some major problems with them, I don't see myself switching.

A couple of other ones to touch on: I do keep these miscellaneous ones. For example, I wanted to test this Gemma 3 27B one. They had a free version on OpenRouter, and I switch the model on this constantly, so it's not a defined model configuration. It's basically my free floater. I do the same thing with Requesty. I was using the QwQ 32-billion-parameter model here. And I keep misspelling Requesty as Requestly, so if you ever see me have the "TL", it is not "TL", it is Requesty, just to be very clear. I think I've got all my spelling corrected at this point.

And then of course I have my DeepSeek R1. I absolutely love R1. It is a killer model, especially for planning, architecting, that type of stuff. And DeepSeek V3 honestly gives Claude 3.5 a run for its money. Its issues
really are that it doesn't support images, so I don't like to use it for mockups unless I combine it with that Qwen 2.5 VL model.

The final model I'll touch on is this Google Pro 2.0, because sometimes there are free models you can use. If you really want to be smart with money, use some of these free ones, the experimental ones that they want people to use and aren't charging for. Now, they will be highly rate limited, and you can configure for that down below in Roo Code: you can set the rate limit, the minimum time between API requests. So I can say I want it to be 10 seconds. You can use these free models, not pay anything, and get some pretty good results. The one that I've been playing around with most lately is the Google 2.0 Pro Experimental. Pretty good model. Again, it's so rate limited that I've had a hard time staying with it, but you can use these free ones to save on cost.

So I've gone on a lot now. Hopefully what I've gone over has been helpful for you. Basically, my tips for you are these. First, switch models while coding. Do not just stay on a single model. Even if you're using Cursor, if you're doing something simple, switch to a non-premium model to save those credits. The second thing is to override your system prompts in Roo Code. Use those smaller prompts, especially if you don't need all the other functionality. Just note that if you take my prompt, it removes all of the MCP stuff. There's a lot of stuff put into that prompt that's just not needed if you just want to write code. And you saw how much it saved: we're talking $3.34 down to 49 cents, and we got a more functional version of the game. The next thing is to cancel subscriptions and start consolidating into really one LLM router service, Requesty. [...] on this video, which is Open WebUI. You can visit GitHub and grab Open WebUI. This is probably the best
open-source way to consolidate your models. It's incredibly easy to run, either with Python or Docker, and I have it set up and running here. The way you configure this to work with Requesty is: go to Settings, go to Admin Settings, go to Connections, put in the Requesty URL and your API key, and you're done. I've been testing it out in place of claude.ai or chatgpt.com, and you get access to all the models. They're all available. I even have my Ollama ones hooked in here too. This is ChatGPT 4.5, just to show you how this works. I've selected the OpenAI one here, all running through Requesty. That's it.

I'm going to be using Open WebUI as part of that consolidation, and I'm actually going to try to deploy it. I've got a computer behind me over here that needs to be upgraded, but I'm going to try to deploy it there and open it up to my entire house, so my kids can use it and my wife can use it. I want to get my wife using AI. If I can do that, we know we're winning.

Anyway, hopefully it's been helpful, and if it has, let me know in the comments below. If there's something that you do differently to save money while coding, I would love to know that as well. Also check out Requesty. You can get those six free credits. Again, I'm not sponsored by them, although, full transparency, they did give me some credits just to try out the service, and I've really come to like it. Help me out with the algorithm: give me a like and subscribe. Otherwise, guys, I appreciate you all. We've been having so much fun making videos for this, and the Discord channel is just amazing. I love conversing with all of you, sharing ideas, sharing projects. I've learned a lot from you guys too, so jump in if you want to join that community. Otherwise, I'll see you in the next one. Peace out.
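
Editor's note: the per-token pricing quoted in the video for Gemini Flash 2.0 (10 cents per million input tokens, 40 cents per million output tokens) and the before/after figures ($3.34 versus 49 cents for the flight simulator, $375 versus about $75 per month) can be checked with a little arithmetic. The Python below is only a rough sketch of that arithmetic. The names `ModelPrice`, `estimate_cost`, and `savings_percent` are made up for illustration, and the token counts in the example workload are hypothetical, not numbers from the video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD price per one million tokens (hypothetical helper type)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a request or a batch of requests."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


def savings_percent(before: float, after: float) -> float:
    """Return how much cheaper `after` is than `before`, as a percentage."""
    return (before - after) / before * 100


# Pricing as quoted in the video for Gemini Flash 2.0.
GEMINI_FLASH_2_0 = ModelPrice(input_per_million=0.10, output_per_million=0.40)

# Hypothetical workload: 800k input tokens and 200k output tokens.
workload_cost = estimate_cost(GEMINI_FLASH_2_0, 800_000, 200_000)
print(f"Example workload: ${workload_cost:.2f}")

# Figures quoted in the video.
print(f"Short prompt vs. full prompt: {savings_percent(3.34, 0.49):.1f}% cheaper")
print(f"February vs. December spend: {savings_percent(375, 75):.1f}% cheaper")
```

Swapping in another model's published prices gives a quick way to compare a cheap first pass against a premium model before committing to either.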