We just had a realization during a demo call the other day:
The companies that are entirely AI-dependent may need to raise prices dramatically as AI prices go up. Not being dependent on LLMs for your fundamental product’s value will be a major advantage, at least in pricing.
andersmurphy 6 hours ago [-]
Yup. Also regardless of price they need to spend more and more as the project collapses under the inevitable incidental complexity of 30k lines of code a day.
It's similar to how if you know what you're doing you can manage a simple VPS and scale a lot more cost effectively than something like vercel.
In a saturated market margins are everything. You can't necessarily afford to be giving all your margins to anthropic and vercel.
prox 3 hours ago [-]
I also can’t wait for the time when few know how to code. Just like how many folks don’t know html from css when the homebrew website went away.
Their might always be llms, but the dependence is an interesting topic.
Cthulhu_ 1 hours ago [-]
Look no further to be honest; look at older generation programming languages like COBOL and how sought-after good developers for that language are.
But I'm also afraid / certain that LLMs are able to figure out legacy code (as long as enough fits in their context window), so it's tenuous at best.
Also, funny you mentioned HTML / CSS because for a while (...in the 90's / 2000's) it looked like nobody needed to actually learn those because of tools like Dreamweaver / Frontpage.
zozbot234 4 hours ago [-]
> The companies that are entirely AI-dependent may need to raise prices dramatically as AI prices go up.
It's not that clear. Sure, hardware prices are going up due to the extremely tight supply, but AI models are also improving quickly to the point where a cheap mid-level model today does what the frontier model did a year ago. For the very largest models, I think the latter effect dominates quite easily.
lelanthran 10 minutes ago [-]
>> The companies that are entirely AI-dependent may need to raise prices dramatically as AI prices go up.
> It's not that clear. Sure, hardware prices are going up due to the extremely tight supply, but AI models are also improving quickly to the point where a cheap mid-level model today does what the frontier model did a year ago.
I agree; I got some coding value out of Qwen for $10/m (unlimited tokens); a nice harness (and some tight coding practices) lowers the distance between SOTA and 6mo second-tier models.
If I can get 80% of the way to Anthropic's or OpenAI's SOTA models using 10$/m with unlimited tokens, guess what I am going to do...
bcjdjsndon 2 hours ago [-]
There's only so far engineers can optimise the underlying transformer technique, which is and always has been doing all the heavy lifting in the recent ai boom. It's going to take another genius to move this forward. We might see improvements here and there but the magnitudes of the data and vram requirements I don't think will change significantly
zozbot234 2 hours ago [-]
State space models are already being combined with transformers to form new hybrid models. The state-space part of the architecture is weaker in retrieving information from context (can't find a needle in the haystack as context gets longer, the details effectively get compressed away as everything has to fit in a fixed size) but computationally it's quite strong, O(N) not O(N^2).
chewz 59 minutes ago [-]
We are processing same data for the last 2 years.
Inference prices droped like 90 percent in that time (a combination of cheaper models, implicit caching, service levels, different providers and other optimizations).
Quality went up. Quantity of results went up. Speed went up.
Service level that we provide to our clients went up massively and justfied better deals. Headcount went down.
What's not to like?
oeitho 38 minutes ago [-]
The decline of independent thoughts for one. As people become reliant on LLMs to do their thinking for them and solve all problems that they stumble upon, they become a shell of their previous self.
Sadly, this is already happening.
WarmWash 31 minutes ago [-]
We'll need to do faux mental work like how we do faux labor work.
accrual 2 hours ago [-]
> Not being dependent on LLMs for your fundamental product’s value
I think more specifically not being dependent on someone else's LLM hardware. IMO having OSS models on dedicated hardware could still be plenty viable for many businesses, granted it'll be some time before future OSS reaches today's SOTA models in performance.
Cthulhu_ 1 hours ago [-]
That'll be (part of) the big market correction, but also speaking broadly; as investor money dries up and said investors want to see results, many new businesses or products will realise they're not financially viable.
On a small scale that's a tragedy, but there's plenty of analysts that predict an economic crash and recession because there's trillions invested in this technology.
michaelbuckbee 5 hours ago [-]
What's weird though is the bifurcation in pricing in the market: aka if your app can function on a non-frontier level AI you can use last years model at a fraction of the cost.
bjornroberg 1 hours ago [-]
I wonder if it could be that they won't because the real mechanism is that AI wrapper pricing power is weak (switching costs near zero) but state of the art models makes it difficult to lower prices due to higher cost.
onion2k 2 hours ago [-]
The companies that are entirely AI-dependent may need to raise prices dramatically as AI prices go up
Or they'll price the true cost in from the start, and make massive profits until the VC subsidies end... I know which one I'd do.
andersmurphy 9 minutes ago [-]
We don't know what anthropic's true costs are. So pricing that in is at best a guess.
sevenzero 2 hours ago [-]
This was as clear as the sky when the first llm based businesses popped up. How did you realize this only now?
And I don't really mean new businesses that are entirely built around LLMs, rather existing ones that pivoted to be LLM-dependent – yet still have non-LLM-dependent competitors.
sevenzero 2 hours ago [-]
Yea that would've been extremely short sighted from your competitors. Thanks for linking the response!
bdangubic 2 hours ago [-]
same as Uber… in the beginning everyone pretty much new that the cost of rides cannot possibly be that cheap and that it is subsudized. once you corner the market etc people just got used to “real” prices to the poibt that now there are often cheaper alternatives than Uber but people still Uber…
sevenzero 2 hours ago [-]
Its also quite interesting to read about Uber exploits their drivers and discriminating algorithms. Cory Doctorow mentioned it in his latest book, sadly cant link the direct sources.
muppetman 5 hours ago [-]
No shit. People are just figuring this out now?
This is the “Building my entire livelihood on Facebook, oh no what?” all over again.
Oh no sorry I forgot, your laptops LLM can draw a potato, let me invest in you.
lioeters 4 hours ago [-]
Indeed, it was clear from the beginning, "AI" companies want to become infrastructure and a critical dependency for businesses, so they can capture the market and charge whatever they want. They will have all the capital and data needed to eventually swallow those businesses too, or more likely sell it to anyone who wants the competitive advantage.
rybosworld 51 minutes ago [-]
Seriously.
> We just had a realization during a demo call the other day
These tools have been around for years now. As they've improved, dependency on them has grown. How is any organization only just realizing this?
That's like only noticing the rising water level once it starts flooding the second floor of the house.
finaard 5 hours ago [-]
How is that surprising? We've been taking that into account for any LLM related tooling for over a year now that we either can drop it, or have it designed in a way that we can switch to a selfhosted model when throwing money at hardware would pay for itself quickly.
It's just another instance of cloud dependency, and people should've learned something from that over the last two decades.
keiferski 3 hours ago [-]
Not so much that it was surprising, rather that we looked at a competitor’s site and noticed that a) their prices went way up and b) their branding changed to be heavily AI-first.
So we thought, hmm, “wonder if they are increasing prices to deal with AI costs,” and then projected that into a future where costs go up.
We don’t have this dependence ourselves, so this seems to be a competitive advantage for us on pricing.
strife25 4 hours ago [-]
Marginal costs matter in this world.
anonyfox 3 hours ago [-]
in fact I am betting opposite. frontier models are getting not THAT much better anymore at all, for common business needs at least. but the OSS models keep closing the gap. which means if trajectories hold there will be a near future moment probably where the big provider costs suddenly drop shaerply once the first viable local models consistently can take over tasks normally on reasonable hardware. Right now probably frontier providers rush for as much money as they possible can before LLMs become a true commodity for the 80% usecases outside of deep expert areas they will have an edge over as specialist juggernauts (iE a cybersecurity premium model).
So its all a house of cards now, and the moment the bubble bursts is when local open inference has closed the gap. looks like chinese and smaller players already go hard into this direction.
zozbot234 2 hours ago [-]
Local open inference can address hardware scarcity by repurposing the existing hardware that users need anyway for their other purposes. But since that hardware is a lot weaker than a proper datacenter setup, it will mostly be useful for running non-time-critical inference as a batch task.
Many users will also seek to go local as insurance against rug pulls from the proprietary models side (We're not quite sure if the third-party inference market will grow enough to provide robust competition), but ultimately if you want to make good utilization of your hardware as a single user you'll also be pushed towards mostly running long batch tasks, not realtime chat (except tiny models) or human-assisted coding.
michaelje 5 hours ago [-]
Absolutely. Pricing exposure is the quiet story under all the waves of AI hype. Build for convenience → subsidise for dependence → meter for margin is a well-worn playbook, and AI-dependent companies are about to find out what phase three feels like.
Hyperscalers are spending a fortune so we think AI = API, but renting intelligence is a business model, not a technical inevitability.
Not really, the next move is to establish standards groups requiring the use of AI in product development. A mix of industry and governmental mandates. What you view are viewing as COGS instead becomes instead a barrier to entry.
dmazin 17 hours ago [-]
Constraints can lead to innovation. Just two things that I think will get dramatically better now that companies have incentive to focus on them:
* harness design
* small models (both local and not)
I think there is tremendous low hanging fruit in both areas still.
com2kid 17 hours ago [-]
China already operates like this. Low cost specialized models are the name of the game. Cheaper to train, easy to deploy.
The US has a problem of too much money leading to wasteful spending.
If we go back to the 80s/90s, remember OS/2 vs Windows. OS/2 had more resources, more money behind it, more developers, and they built a bigger system that took more resources to run.
Mac vs Lisa. Mac team had constraints, Lisa team didn't.
Unlimited budgets are dangerous.
tasoeur 7 hours ago [-]
Though I do agree with you, I just came back from a trip to China (Shanghai more specifically) and while attending a couple AI events, the overwhelming majority of people there were using VPNs to access Claude code and codex :-/
coldtea 5 hours ago [-]
Parent's point was about deployment, not agentic coding.
busfahrer 5 hours ago [-]
> Low cost specialized models
Can you elaborate on this? Is this something that companies would train themselves?
phist_mcgee 8 hours ago [-]
Perhaps its because american hyperscalers want unlimited upside for their capital?
aldanor 2 hours ago [-]
Yep.
As a recent example in AI space itself. China had scarce GPU resources, quite obvious why => DeepSeek training team had to invent some wheels and jump through some hoops => some of those methods have since become 'industry standard' and adopted by western labs who are now jumping through the same hoops despite enjoying massive computeresources, for the sake of added efficiency.
cesarvarela 17 hours ago [-]
Harness is a big one, Claude Code still has trouble editing files with tabs. I wonder how many tokens per day are wasted on Claude attempting multiple times to edit a file.
lpcvoid 8 hours ago [-]
The future is now, I guess
drra 7 hours ago [-]
Absolutely. Anyone working on inference token level knows how wasteful it all is especially in multimodal tokens.
dataviz1000 16 hours ago [-]
What do you mean by harness here?
Ifkaluva 16 hours ago [-]
When you go to the command line and type “Claude”, there is an LLM, and everything else is the harness
dataviz1000 16 hours ago [-]
I'm having an hard time getting my mind to see this.
> Users should re-tune their prompts and harnesses accordingly.
I read this in the press release and my mind thought it meant test harness. Then there was a blog post about long running harnesses with a section about testing which lead me to a little more confusion.
Yes, the word 'harness' is consistently used in the context as a wrapper around the LLM model not as 'test harness'.
dboreham 7 hours ago [-]
This field is chock full of people using terms incorrectly, defining new words for things that already had well known names, overloading terms already in use. E.g. shard vs partition. TUI which already meant "telephony user interface ". "Client" to mean "server" in blockchain.
16 hours ago [-]
ElFitz 6 hours ago [-]
It’s the tool that calls the model, give it access to the local file system, calls the actual tools and commands for the model, etc, and provide the initial system prompt.
Basically a clever wrapper around the Anthropic / OpenAI / whatever provider api or local inference calls.
codybontecou 16 hours ago [-]
pi vs. claude code vs. codex
These are all agent harnesses which run a model (in pi's case, any model) with a system prompt and their own default set of tools.
christkv 8 hours ago [-]
Could not agree more, this will spur innovation in all aspects of local models is my hunch.
KaiserPro 4 hours ago [-]
one graph, One graph and the author is pinning an entire theory on it?
Infra is always limited, even at hyper scalers. This leads to a bunch of tools dfofr caching, profiling and generally getting performance up, not to mention binpacking and all sorts of other "obvious" things.
sph 15 minutes ago [-]
One graph, about 100 words, AI in the title: Hacker News front page.
Not bad for a coffee break of effort.
losvedir 1 hours ago [-]
> Infra is always limited, even at hyper scalers
I think maybe infra is limited only at hyperscalers. For the rest of us it's just how much capacity to we want to rent from the hyperscalars.
It's kind of a recent cloud-native mindset, since back in the day when you ran your own hardware scaling and capacity was always top of mind. Looks like AI compute might be like that again, for the time being.
malshe 3 hours ago [-]
On X I had seen him mostly posting memes so this post seems par for the course
0xbadcafebee 22 minutes ago [-]
This isn't the first time they've dealt with scarcity, there's been supply chain scarcity four times since 2000. Post-dotcom boom, CDMA scarcity, HDD/flash scarcity, Pandemic scarcity.
The scarcity isn't long-term. Like all manufactured products, they'll ramp up production and flood the market with hardware, people will buy too much, market will drop. Boom and bust.
We're also still in the bubble. Eventually markets will no longer bear the lack of productivity/profit (as AI isn't really that useful) and there will be divestment and more hardware on the market as companies implode. Nobody is making 10x more from AI, they are just investing in it hoping for those profits which so far I don't think anyone has seen, other than in the companies selling the AI to other companies.
But more importantly, the models and inference keeps getting more efficient, so less hardware will do more in the future. We already have multiple models good enough for on-device small-scale work. In 5 years consumer chips and model inference will be so good you won't need a server for SOTA. When that happens, most of the billions invested in SOTA companies will disappear overnight, which'll leave a sizeable hole in the market.
siliconc0w 52 minutes ago [-]
Definitely feeling this - the subsidized subscription plans are already starting to buckle.
wg0 16 hours ago [-]
There's other side to it too.
Whoever running and selling their own models with inference is invested into the last dime available in the market.
Those valuations are already ridiculously high be it Anthropic or OpenAI to the tune of couple of trillion dollars easily if combind.
All that investment is seeking return. Correct me if I'm wrong.
Developers and software companies are the only serious users because they (mostly) review output of these models out of both culture and necessity.
Anywhere else? Other fields? There these models aren't any useful or as useful while revenue from software companies by no means going to bring returns to the trillion dollar valuations. Correct me if I'm wrong.
To make the matter worst, there's a hole in the bucket in form of open weight models. When squeezed further, software companies would either deploy open weight models or would resort to writing code by hand because that's a very skilled and hardworking tribe they've been doing this all their lives, whole careers are built on that. Correct me if I'm wrong.
Eventually - ROI might not be what VCs expect and constant losses might lead to bankruptcies and all that build out of data centers all of sudden would be looking for someone to rent that compute capacity result of which would be dime a dozen open weight model providers with generous usage tiers to capitalize on that available compute capacity owners of which have gone bankrupt and can't use it any more wanting to liquidate it as much as possible to recoup as much investment as possible.
EDIT: Typos
solenoid0937 9 hours ago [-]
OpenAI has an absurdly high valuation given their cash burn vs RRR.
Anthropic's is far more reasonable.
It makes no sense to lump these two companies together when talking about valuation. They have completely different financial dynamics
wg0 8 hours ago [-]
No matter how low and reasonably Anthropic is valued, don't think $200 Max plans are going to recoup the investment + some return on top because size of the software industry is not that huge and profit margins for AI inference aren't very high either.
ElFitz 6 hours ago [-]
> because size of the software industry is not that huge
I onboarded marketing on a premium team Claude seat yesterday. And one of our sales vibecoded an internal tool in the last three weeks using Claude Code that they now use every day. I wouldn’t have imagined it a month ago. We still had to take care of deployment for him, but things are moving fast.
solenoid0937 8 hours ago [-]
Pro and Max plans are probably a drop in the bucket for them.
drra 7 hours ago [-]
Seems like everybody an their mothers are using max plans these days. I wouldn't be surprised if LTV of each customer was big enough to justify spending.
wg0 6 hours ago [-]
Assuming there are 10 million developers and everyone is at $200 max plan, that would be $2 billion/month or $24 billion/year maximum.
Note - this is just the revenue not the profit. No salaries, no compute paid for. Just plain revenue. Profit would be way less.
But even that - if we take it to $24 billion/year and we take a 10x multiple, the company is barely valued at $240 billon dollar, lets be generous and make it double at $480 billion and then round it up to $500 billion for a nice round number.
Far far from the $800 billion valuation Anthropic is looking at.
Only a matter of time.
EDIT: Fixed math
steveklabnik 2 hours ago [-]
Companies are spending far more than $200/month/developer. The $200 Max plan is a great value but you hit limits far too soon, and it also doesn't cover any of the other styles of integrations and tools that you can build and use to help your developers, like code review suggestions, which at the very least would come from additional Max plans, and not from the individual developers' plans.
solenoid0937 31 minutes ago [-]
Pro and Max plans are a tiny fraction of their revenue. Many businesses are spending thousands of dollars per head per month.
billziss 6 hours ago [-]
While I agree with you that AI companies are overvalued, I think 10 million developers at $200 per month makes 2 billion.
>>> f"{10_000_000 * 200:_}"
'2_000_000_000'
wg0 5 hours ago [-]
Thanks for pointing out. I updated the comment.
classified 2 hours ago [-]
> would resort to writing code by hand because that's a very skilled and hardworking tribe they've been doing this all their lives
Shush, don't tell that to the AI coding acolytes.
christkv 8 hours ago [-]
It feels like a repeat of the dot com infrastructure buildup that spurred the whole 2005 explosion in affordable hosting and new companies. This will probably leave us massive access to affordable compute in a couple of years.
sdevonoes 5 hours ago [-]
It’s time to be AI-independent. It’s like AWS, for most of us, it’s not worth it.
latentframe 2 hours ago [-]
This isn’t really looking like AI scarcity it’s more like compute becoming the bottleneck : when the access depends on chips energy and capital it stops being a pure software game and the winners are often whoever can secure capacity first
mystraline 2 minutes ago [-]
And folks are just now realizing the SaaS token provider rug-pull?
How convenient, especially since everything has some LLM slop interaction.
But that rug isnt going to pull itself!
frigg 2 hours ago [-]
The models have already plateaued, you don't need latest and greatest.
2001zhaozhao 15 hours ago [-]
AKA, the beginning of big companies being able to roll over small companies with moar money
(note: I don't expect this to actually happen until the AI gets good enough to either nearly entirely replace humans or solve cooperation, but the long term trend of scarce AI will go towards that direction)
ttul 9 hours ago [-]
Energy scarcity will drive more innovation in local silicon and local inference. Apple will be the unexpected beneficiary of this reality.
henry2023 17 hours ago [-]
The US is bound by energy and China is bound by compute power. The one who solves its limitation first will end this “Scarcity Era”.
jakeinspace 17 hours ago [-]
China is installing something like 500 GW of wind and solar per year now. Even if they're only able to build and otherwise access chips that have half the SoTA performance per watt, they will win.
odo1242 16 hours ago [-]
Performance per dollar may be more important than performance per watt here, though
thelastgallon 9 hours ago [-]
A dollar is an entirely fictional unit and trillions of it can be manufactured at no cost, while watts are constrained by the laws of physics, photons/electrons, supply chain of electricity and all that fun stuff in the real world.
jerf 43 minutes ago [-]
A dollar is still a useful unit as "the fraction of the economy that can be controlled by currency". It's true that printing a huge pile of it and throwing it at GPUs wouldn't instantly convert into more GPUs, but it would meaningfully represent that other things are being squeezed out to allocate more resources to GPU production even so. That such reallocation is inefficient, arguably immoral, and highly questionable in the long term versus other options wouldn't stop that from being ture.
ElFitz 6 hours ago [-]
> A dollar is an entirely fictional unit and trillions of it can be manufactured at no cost
It’s still a useful proxy for resources allocation and viability.
tucnak 3 hours ago [-]
..unless you're actually reasoning at nation-scale where OP's points apply
ElFitz 2 hours ago [-]
I wouldn’t agree. Even at national scale, these projects cost resources. And the resources of all agents (org, countries) are constrained.
While we could reason in "performance / watt" and "performance / people", "performance / whatever other resource involved", and "performance / opportunity cost of allocating these resources to this use case and not another", "performance / whatever unit of stable-ish currency" is a convenient and often "good enough" approximation that somewhat encapsulates them all.
A simplification, like any model, but still useful.
thelastgallon 9 hours ago [-]
US energy is constrained by the utility monopolies/oligopolies which have to extract more rents, specifically by increasing costs. Their profit is a percentage of cost, these perverse incentives + oligopolies will make it increasingly expensive to make anything (including AI) in US.
hvb2 7 hours ago [-]
Or simply by the fact that increasing production takes time? Any power plant takes years to build?
Years, is like a lifetime for AI at this point...
dyauspitr 17 minutes ago [-]
Not solar. China and to a lesser extent India are pumping out huge solar farms in months.
thelastgallon 6 hours ago [-]
> increasing production takes time?
This is true of nearly everything (except money). I'm not sure of the point you are trying to make.
Miraste 16 hours ago [-]
China's domestic chips are increasingly close to state-of-the-art. The US electrical grid is... not.
CuriouslyC 16 hours ago [-]
The dynamics vastly favor China, part of the reason the US sprinting towards "ASI" isn't totally boneheaded is that the US and its industry needs a hail mary play to "win" the game, if they play it safe they lose for sure.
leptons 16 hours ago [-]
I'd be fine with a world without AI, honestly. Nobody really wins this race except the very wealthy. And I don't think it's really going to play out the way the wealthy think it will. It's more like a dog catching a car than it is a race.
odo1242 16 hours ago [-]
> It's more like a dog catching a car than it is a race.
What does this mean? I didn't understand the analogy.
digitalsushi 14 hours ago [-]
A car caught by a dog has no purpose. The activity concludes with no output.
leptons 9 hours ago [-]
"The dog that caught the car" refers to how dogs sometimes chase cars. Suppose the car stops and the dog catches up - what is it going to do? It has no plan, it has no purpose, it isn't going to bite the car, it isn't going to get anything out of catching the car. The car may even run it over. I intended it basically as "play stupid games, win stupid prizes", or "be careful what you wish for".
thelastgallon 9 hours ago [-]
My observation is that the dog sniffs all the tires, picks one tire, lifts one leg and does the deed. I don't know if its a way of marking territory or domination. We need a dogatologist to explain what it means.
ElFitz 6 hours ago [-]
That was quite the unexpected anticlimactic ending. I’m sure Terry Pratchett would be proud.
1828838383 3 hours ago [-]
We did it reddit!
utopiah 7 hours ago [-]
Initially I thought "Well... good for AI companies because they can then charge more" but IMHO that's a very tricky position because it means the cheap wave is behind us.
It's one thing to "sell" free or symbolically cheap stuff, it's another to have an actual client who will do the math and compare expenditure vs actually delivered value.
classified 2 hours ago [-]
> and compare expenditure vs actually delivered value
Which means that the hype production will be driven up another few notches to make people doubt their rational findings and keep them in irrational territory just a tad longer. Every minute converts to dollars spent on tokens.
chatmasta 3 hours ago [-]
Why is written with an assumption that we have finite hardware production capacity? Industrial processes can scale up, new factories can come online… it will take a while but the whole point of economics is that supply will scale to meet demand. The shortage is a temporary, point-in-time metric.
And that’s not considering the software innovation that can happen in the meantime.
Bengalilol 3 hours ago [-]
The economic hypothesis that has dominated the past hundred years is that economic growth is infinite because resources are infinite and (almost) free. We all know this is unrealistic and disconnected from our human condition.
Regarding "innovation", I agree with your idea. I even think that the major innovation will be to transpose models locally, using reduced infrastructures that will still be sufficient for the majority of use cases.
tim333 6 hours ago [-]
>For the first time since the 2000s, technology companies are confronting the limits of their supply chain.
I thought there'd been a shortage of cheap GPUs since ChatGPT took off and also before that in various crypto booms. I'm not sure it's a new thing.
the_gipsy 5 hours ago [-]
But that concerned mostly only gamers and cryptominers. AI is supposed to be replacing traditional software development, which affects everything.
bcjdjsndon 2 hours ago [-]
Neither is this the first time nor are they really confronting it
vessenes 17 hours ago [-]
It seems very possible that we have at least five years of real limitations on compute coming up. Maybe ten, depending on ASML. I wonder what an overshoot looks like. I also wonder if there might be room for new entrants in a compute-scarce environment.
For instance, at some point, could Coreweave field a frontier team as it holds back 10% of its allocations over time? Pretty unusual situation.
dist-epoch 16 hours ago [-]
Jensen just said that if the signal/commitments are there, ASML can scale in 2-3 years.
vessenes 12 hours ago [-]
With Anthropic buying compute in dark alleys I’d assume that day is coming..
com2kid 17 hours ago [-]
To bang on the same damn drum:
Open Weight models are 6 months to a year behind SOTA. If you were building a company a year ago based on what AI could do then, you can build a company today with models that run locally on a user's computer. Yes that may mean requiring your customers to buy Macbooks or desktops with Nvidia GPUs, but if your product actually improves productivity by any reasonable amount, that purchase cost is quickly made up for.
I'll argue that for anything short of full computer control or writing code, the latest Qwen model will do fine. Heck you can get a customer service voice chat bot running in 8GB of VRAM + a couple gigs more for the ASR and TTS engine, and it'll be more powerful than the hundreds of millions spent on chat bots that were powered by GPT 4.x.
This is like arguing the age of personal computing was over because there weren't enough mainframes for people to telnet into.
It misses the point. Yes deployment and management of personal PCs was a lot harder than dumb terminal + mainframe, but the future was obvious.
16 hours ago [-]
space_fountain 16 hours ago [-]
I've seen this claimed, but I'm not sure it's been true for my use cases? I should try a more involved analysis but so far open models seem much less even in their skills. I think this makes sense if a lot of them are built based on distillations of larger models. It seems likely that with task specific fine tuning this is true?
rstuart4133 12 hours ago [-]
> I've seen this claimed, but I'm not sure it's been true for my use cases?
I'd be surprised if it isn't true for your use cases. If you give GLM-5.1 and Optus 4.6 the same coding task, they will both produce code that passes all the tests. In both cases the code will be crap, as no model I've seen produces good code. GLM-5.1 is actually slightly better at following instructions exactly than Optus 4.6 (but maybe not 4.7 - as that's an area they addressed).
I've asked GLM-5.1 and Opus 4.6 to find a bug caused by a subtle race condition (the race condition leads to a number being 15172580 instead of 15172579 after about 3 months of CPU time). Both found it, in a similar amount of time. Several senior engineers had stared at the code for literally days and didn't find it.
There is no doubt the models do vary in performance at various tasks, but we are talking the difference between Ferrari vs Mercedes in F1. While the differences are undeniable, this isn't the F1. Things take a year to change there. The performance of the models from Anthropic and OpenAI literally change day by day, often not due to the model itself but because of the horsepower those companies choose to give them on the day, or them tweaking their own system prompts. You can find no end of posts here from people screaming in frustration the thing that worked yesterday doesn't work today, or suddenly they find themselves running out of tokens, or their favoured tool is blocked. It's not at all obvious the differences between the open-source models and the proprietary ones are worse than those day to day ones the proprietary companies inflict on us.
frodowtf2 9 hours ago [-]
> In both cases the code will be crap, as no model I've seen produces good code.
I'm wondering if you have actually used claude code because results are not so catastrophic as you describe them.
rstuart4133 8 hours ago [-]
I used LLMs to write what seems like far too many lines of code now. This is an example Opus 4.6 running at maximum wrote in C:
If you don't know C, in older versions that can be a catastrophic failure. (The issue is so serious in modern C `free(NULL)` is a no-op.) If it's difficult to get a `FOO == NULL` without extensive mocking (this is often the case) most programmers won't do it, so it won't be caught by unit tests. The LLMs almost never get unit test coverage up high enough to catch issues like this without heavy prompting.
But that's the least of it. The models (all of them) are absolutely hopeless at DRY'ing out the code, and when they do turn it into spaghetti because they seem almost oblivious to isolation boundaries, even when they are spelt out to them.
None of this is a problem if you are vibe coding, but you can only do that when you're targeting a pretty low quality level. That's entirely appropriate in some cases of course, but when it isn't you need heavy reviews from skilled programmers. No senior engineer is going to stomach the repeated stretches of almost the "same but not quite" code they churn out.
You don't have to take my word for it. Try asking Google "do llm's produce verbose code".
random_human_ 7 hours ago [-]
Is foo a pointer in your example? Is free(NULL) not a valid operation?
rstuart4133 7 hours ago [-]
Yes `foo` is a pointer.
`free(NULL)` is harmless in C89 onwards. As I said, programmers freeing NULL caused so many issues they changed the API. It doesn't help that `malloc(0)` returns NULL on some platforms.
If you are writing code for an embedded platform with some random C compiler, all bets on what `free(NULL)` does are off. That means a cautious C programmer who doesn't know who will be using their code never allows NULL to be passed to `free()`.
In general, most good C programmers are good because they suffer a sort of PTSD from the injuries the language has inflicted on them in the past. If they aren't avoiding passing NULL to `free()`, they haven't suffered long enough to be good.
lelanthran 4 hours ago [-]
> That means a cautious C programmer who doesn't know who will be using their code never allows NULL to be passed to `free()`.
If your compiler chokes on `free(NULL)` you have bigger problems that no LLM (or human) can solve for you: you are using a compiler that was last maintained in the 80s!
If your C compiler doesn't adhere to the very first C standard published, the problem is not the quality of the code that is written.
> If they aren't avoiding passing NULL to `free()`, they haven't suffered long enough to be good.
I dunno; I've "suffered" since the mid-90s, and I will free NULL, because it is legal in the standard, and because I have not come across a compiler that does the wrong thing on `free(NULL)`.
random_human_ 6 hours ago [-]
So what would be the best practice in a situation like that? I would (naively?) imagine that a null pointer would mostly result from a malloc() or some other parts of the program failing, in which case would you not expect to see errors elsewhere?
rstuart4133 3 hours ago [-]
> imagine that a null pointer would mostly result from a malloc() or some other parts of the program failing, in which case would you not expect to see errors elsewhere?
Oh yes, you probably will see errors elsewhere. If you are lucky it will happen immediately. But often enough millions of executed instructions later, in some unrelated routine that had its memory smashed. It's not "fun" figuring out what happened. It could be nothing - bit flips are a thing, and once you get the error rate low enough the frequency of bit flips and bugs starts to converge. You could waste days of your time chasing an alpha particle.
I saw the author of curl post some of this code here a while back. I immediately recognised the symptoms. Things like:
if (NULL == foo) { ... }
Every 2nd line was code like that. If you are wondering, he wrote `(NULL == foo)` in case he dropped an `=`, so it became `(NULL = foo)`. The second version is a syntax error, whereas `(foo = NULL)` is a runtime disaster. Most of it was unjustified, but he could not help himself. After years of dealing with C, he wrote code defensively - even if it wasn't needed. C is so fast and the compilers so good the coding style imposes little overhead.
Rust is popular because it gives you a similar result to C, but you don't need to have been beaten by 10 years of pain in order to produce safe Rust code. Sadly, it has other issues. Despite them, it's still the best C we have right now.
incrudible 7 hours ago [-]
C is fundamentally a bad target for LLMs. Humans get C wrong all the time, so we can not hope the nascent LLM, which has been trained on 95% code that does automatic memory management, to excel here.
I always found myself writing verbose copypasta code first, then compress it down based on the emerging commonalities. I think doing it the other way around is likely to lead to a worse design. Can you not tell the LLM to do the same? Honest question.
rstuart4133 6 hours ago [-]
> I always found myself writing verbose copypasta code first, then compress it down based on the emerging commonalities. I think doing it the other way around is likely to lead to a worse design.
I do pretty much the same thing, which is to say I "write code using a brain dump", "look for commonalities that tickle the neurons", then "refactor". Lather, rinse, and repeat until I'm happy.
> Can you not tell the LLM to do the same?
You can tell them until you're blue in the face. They ignore you.
I'm sure this is a temporary phase. Once they solve the problem, coding will suffer the same fate as blacksmiths making nails. [0] To solve it they need to satisfy two conflicting goals - DRY the code out, while keeping interconnections between modules to a minimum. That isn't easy. In fact it's so hard people who do it well and can do it across scales are called senior software engineers. Once models master that trick, they won't be needed any more.
By "they" I mean "me".
[0] Blacksmiths could produce 1,000 or so a day, but it must have been a mind-numbing day even if it paid the bills. Then automation came along, and produced them at over a nail per second.
lelanthran 4 hours ago [-]
> C is fundamentally a bad target for LLMs.
I found it exceptionally good, because:
a) The agent doesn't need to read the implementation of anything - you can stuff the entire projects headers into the context and the LLM can have a better birds-eye view of what is there and what is not, and what goes where, etc.
and
b) Enforcing Parse, don't Validate using opaque types - the LLM writing a function that uses a user-defined composite datatype has no knowledge of the implementation, because it read only headers.
com2kid 16 hours ago [-]
What are you trying to do?
Write code? No. Use frontier models. They are subsidized and amazing and they get noticably better ever few months.
Literally anything else? Smaller models are fine. Classifiers, sentiment analysis, editing blog posts, tool calling, whatever. They go can through documents and extract information, summarize, etc. When making a voice chat system awhile back I used a cheap open weight model and just asked it "is the user done speaking yet" by passing transcripts of what had been spoken so far, and this was 2 years ago and a crappy cheap low weight model. Be creative.
I wouldn't trust them to do math, but you can tool call out to a calculator for that.
They are perfectly fine at holding conversations. Their weights aren't large enough to have every book ever written contained in them, or the details of every movie ever made, but unless you need that depth and breadth of knowledge, you'll be fine.
space_fountain 15 hours ago [-]
I just mean is the claim that the open source models where the closed models were 12 to 6 months ago true? They do seem to be for some specific tasks which is cool, but they seem even more uneven in skills than the frontier model. They're definitely useful tools, but I'm not sure if they're a match for frontier models from a year ago?
com2kid 14 hours ago [-]
Frontier models from a year ago had issues with consistent tool calling, instruction following was pretty good but could still go off the rails from time to time.
Open weight models have those same issues. They are otherwise fine.
You can hook them up to a vector DB and build a RAG system. They can answer simple questions and converse back and forth. They have thinking modes that solve more complex problems.
They aren't going to discover new math theorems but they'll control a smart home and manage your calendar.
dyauspitr 13 minutes ago [-]
That’s nonsense. Local models don’t have any of the nuance in text responses. I find them more akin to GPT 3.5 than even 4.x
dist-epoch 16 hours ago [-]
Buy new Macs from where? There is a shortage of RAM, SSD, GPUs, and the CPU shortage just started.
ethan_smith 16 minutes ago [-]
[dead]
Bengalilol 3 hours ago [-]
... and I have this little idea in the back of my mind: when companies can no longer keep up with demand and people have (albeit more limited and reduced) local capacity, minds will start focusing on techniques (more humble and modest ones) to keep part of the system running locally, without dependency.
I know it may sound ridiculous, but it could actually become a way to break away from the business models that have been developed over the past few decades. Broadly speaking, this even amounts to saying that the biggest victims of AI could be the companies that bet on AI as a service.
Yet I know my vision is way too idealistic but I'm coming to imagine that a human brain, although less efficient in the long run, remains a reliable way to control the resulting costs and could even turn out to be more advantageous and more readily available than its silicon-based counterpart.
20after4 2 hours ago [-]
The human brain is incredibly efficient (Approximately 20W of energy consumption¹). These AI systems use many orders of magnitude more energy than human equivalents.
Well it's in the books. O(n^2) algorithms are bad in the long run, transformers algorithm has such complexity, so not a big surprise we hit the limits.
stupefy 17 hours ago [-]
What limits LLM inference accelerators? I heard about Groq (https://groq.com/) not sure how much it pushes away the problem.
vessenes 17 hours ago [-]
ASML only makes a certain number of machines a year that can do extreme ultra-violet lithography.
Also - turbine blades limit power, according to Elon.
Between them - we cannot chip fabs past a certain rate, and we cannot stand up the datacenter to run these desired chips past a certain rate. Different people believe one or the other is the 'true' current bottleneck. The turbine supply chain scaling looks much more tractable -- EUV is essentially the most complicated production process humans have ever devised.
utopiah 7 hours ago [-]
Is ASML really the bottleneck? Do you believe anybody but TSMC and few fabs could really use and acquire those machines? I don't know the throughput of a EUV device from ASML but I imagine you need :
- clean room, itself needing the infrastructure for it (size, airCo, filtering, electricity) and the staff to run and maintain that basically empty space
- wafers to "print" on, so that's a lot of water and logistic to manipulate them (so infrastructure for clean water and all chemicals) also with dedicated staff
- finally staff who would be able to design something significantly better than NVIDIA, Intel, Broadcom, IBM, etc while (and arguably that's the trickiest part IMHO) being able to get it good enough as at a scale that can be manufactured from their own fab.
so I'm wondering who can afford this kind of setup that can only then make use of ASML machines.
Marazan 7 hours ago [-]
> (so infrastructure for clean water and all chemicals)
Fabs are some of the most complex chemical engineering sites (dealing with some of the most dangerous substances) in the world. So don't underestimate the complexity of this part.
utopiah 3 hours ago [-]
Well that was part of my point, not everybody is TSMC. It's not "just" getting an ASML machine and voila, you're good to go.
17 hours ago [-]
andai 16 hours ago [-]
Is global compute bottlenecked by one company?
Tanjreeve 8 hours ago [-]
Yes. At least, the manufacturing of compute is. And a lot of the chain has been bitten hard by increasing capacity prematurely in the past so they're reticent to increase bandwidth at vast cost.
ls612 17 hours ago [-]
Presumably ASML can increase production if demand is high enough the question is over what time frame. 5 years seems plausible to me but I honestly don't know what that number is.
vessenes 17 hours ago [-]
It's ... really long, according to Dylan Patel on the Dwarkesh Podcast. The supply chain is extremely deep and complex.
juliansimioni 17 hours ago [-]
Yes. And the fab companies and their suppliers are deliberately and wisely slow to scale up production to meet short term changes in demand. They've seen the history of the semiconductor industry, it's constant boom and bust cycles. But they have the highest op-ex costs of anyone. So when the party's over they are the ones who pay for it the most.
Miraste 16 hours ago [-]
If only there were some form of cheap, widely manufactured power generation technology that didn't use turbines... Are they really going to wait until 2030 to get more turbines rather than invest in solar?
mattas 17 hours ago [-]
This notion that "we don't have enough compute" does not cleanly reconcile with the fact that labs are burning cash faster than any cohort of companies in history.
If I am a grocery store that pays $1 for oranges and sells them for $0.50, I can't say, "I don't have enough oranges."
deepseasquid 1 hours ago [-]
The grocery store analogy works if compute is the orange.
But labs arent buying oranges — theyre buying the only orchard on the island, hoping it yields a fruit no ones grown yet. Burning $1B to net $500M isnt "I have too few oranges." Its "Im betting the farm Ill find a new one."
Both can be irrational. Theyre irrational in different ways.
FloorEgg 17 hours ago [-]
There is a major logic flaw in what you're saying.
'If I am a grocery store that pays $1 for oranges and sells them for $0.50, I can't say, "I don't have enough oranges."'
How about 'if I'm a grocery store and I see no limit on demand for oranges at $.50 but they are currently $1, I can say 'if oranges were cheaper I could sell orders of magnitude more of them'.
Buying oranges for $1 and selling for $0.5 is an investment into acquiring market share and customer relationships and a gamble on the price of oranges falling in the future.
0x3f 17 hours ago [-]
> acquiring market share and customer relationships
The whole setup rests on this, and it seems mythical to me. These guys have basically equivalent products at this point.
16 hours ago [-]
eloisant 2 hours ago [-]
Selling below cost is also called "predatory pricing". Sadly it's legal in US but it's something wealthy companies do to kill competitors and end up with captive customers.
lelanthran 4 hours ago [-]
> Buying oranges for $1 and selling for $0.5 is an investment into acquiring market share and customer relationships
It's a delusion that customers are going to remain with the behemoths when a Qwen model run by an independent is $10/m, unlimited usage.
This is not a market that can be locked-in with network effects, and the current highly-invested players have no moat.
TeMPOraL 16 hours ago [-]
You can if you're exhausting the global production of oranges.
earthnail 17 hours ago [-]
If there were more oranges you’d pay less to buy them and your economics would work out.
0x3f 17 hours ago [-]
Not sure if this is a joke or not, but competitive pressure still exists. This only really holds if you're the only orange seller.
vessenes 17 hours ago [-]
You misunderstand.
"I built a ship to go to the Indies and bring back tea."
"Bro, the ship cost 100,000 pounds sterling and only brought back 50,000 pounds of tea. I don't care if you paid 12,500 pounds for the tea itself, you're losing money."
There is a very rational reason labs are spending everything they can get for more compute right now. The tea (inference) pays 60%+ margins. And that is rising. And that number is AFTER hyper scalars make their margins. There is an immense amount of profit floating around this system, and strategics at the edge believing they can build and control the demand through combined spend on training and inference in the proper ratios.
SpicyLemonZest 17 hours ago [-]
60%+ margins according to numbers which are not published publicly and have not AFAICT been audited.
Could they be accurate? Sure, I think people who claim this is impossible are overconfident. But I would encourage anyone who assumes they must be right to read a history of the Worldcom scandal. It's really quite easy for a person who wants to be making money (or an LLM who's been instructed to "run the accounts make no mistakes"!) to incorrectly categorize costs as capital investments when nobody's watching carefully.
czk 17 hours ago [-]
"adaptive" thinking
itmitica 17 hours ago [-]
The current inference system is on a down slope.
It remains to be seen what new wave of AI system or systems will replace it, making the whole current architecture obsolete.
Meanwhile, they are milking it, in the name of scarcity.
byyoung3 17 hours ago [-]
distillation is an equalizing force
eloisant 2 hours ago [-]
Distillation doesn't give you an equivalent model.
yalogin 16 hours ago [-]
Does this also mean ram prices are not coming down anytime soon?
i_think_so 9 hours ago [-]
> Does this also mean ram prices are not coming down anytime soon?
One person replies "yes". Another replies "no".
This concludes our press conference.
<3 HN
stronglikedan 16 hours ago [-]
they already are
dist-epoch 16 hours ago [-]
yes, and it will keep increasing
isawczuk 17 hours ago [-]
It's artificial scarcity.
LLM inference will soon be commodity as cloud.
There is a 2-3years still before ASIC LLM inferences will catch up.
observationist 17 hours ago [-]
The problem with this idea is that someone can, and likely will, come up with the next best architecture that leapfrogs the current frontier models at least once a year, likely faster, for the foreseeable future. This means by the time you've manufactured your LLM on an ASIC, it's 4-5 generations behind, and probably much less efficient than current SOTA model at scale.
It won't make sense for ASIC LLMs to manifest until things start to plateau, otherwise it'll be cheaper to get smarter tokens on the cloud for almost all use cases.
That said, a 10 trillion parameter model on a bespoke compute platform overcomes a lot of efficiency and FOOM aspects of the market fit, so the angle is "when will models that can be run on an asic be good enough that people will still want them for various things even if the frontier models are 10x smarter and more efficient"
I think we're probably a decade of iteration on LLMs out, at least, and the entire market could pivot if the right breakthrough happens - some GPT-2 moment demonstrating some novel architecture that convinces the industry to make the move could happen any time now.
vessenes 17 hours ago [-]
I don't think so. GB200 prices are GOING UP. A100s are still expensive. This implies massive utilization and demand, no? These machines are not sitting idle, or prices would drop in the very competitive hyperscaler environment.
Morromist 16 hours ago [-]
Hard to say at this point. I'm sure you can run your LLM chips 24/7 for training and for the public to make weird thirst-trap videos about Judy Hopps but how real is the utilization and demand, really? Maybe very real, maybe not, I don't think we can know yet.
Its like being back in 1850 and you build the world's first amusement park where the rides are free or very cheap. People are like Amusement parks are the next big thing since Steam Boats! And tons of other rich people start to build huge amusement parks everywhere. The people who are skilled at making amusement park rides will increase their prices, and since the first amusement parks are free so they can get the public going to them demand will be huge.
But how sustainable is that? - well obviously we know from history that amusement parks did, in fact, take over the world and most people spent virtually all their time and money at amusement parks - I think the Crimean War was even fought over some religious-based theme park in Israel - until moving pictures came out, so it worked out for them, but for AI?
LogicFailsMe 4 hours ago [-]
so much for all that hardware that was going to be obsolete in 3 years...
throwaway290 6 hours ago [-]
wasnt ai supposed to get us post-scarcity?
PessimalDecimal 3 hours ago [-]
That worked out, for the founders of frontier labs at least.
17 hours ago [-]
paulddraper 17 hours ago [-]
This is wrong along multiple axes.
1. Supply can scale. You can point to COVID/supply-chain shocks, but the problem there is temporary changes. No one spins up a whole fab to address a 3 month spike. Whereas AI is not a temporary demand change.
2. Models are getting more efficient. DeepSeek V3 was 1/10th the cost of contemporary ChatGPT. Open weight models get more runnable or smarter every month. Cutting edge is always cutting edge, but if scarcity is real, model selection will adjust to fit it.
Lapalux 17 hours ago [-]
"The first hit is free....."
hemangjoshi37a 4 hours ago [-]
[dead]
SadErn 17 hours ago [-]
[dead]
Rendered at 14:35:27 GMT+0000 (Coordinated Universal Time) with Vercel.
The companies that are entirely AI-dependent may need to raise prices dramatically as AI prices go up. Not being dependent on LLMs for your fundamental product’s value will be a major advantage, at least in pricing.
It's similar to how if you know what you're doing you can manage a simple VPS and scale a lot more cost effectively than something like vercel.
In a saturated market margins are everything. You can't necessarily afford to be giving all your margins to anthropic and vercel.
Their might always be llms, but the dependence is an interesting topic.
But I'm also afraid / certain that LLMs are able to figure out legacy code (as long as enough fits in their context window), so it's tenuous at best.
Also, funny you mentioned HTML / CSS because for a while (...in the 90's / 2000's) it looked like nobody needed to actually learn those because of tools like Dreamweaver / Frontpage.
It's not that clear. Sure, hardware prices are going up due to the extremely tight supply, but AI models are also improving quickly to the point where a cheap mid-level model today does what the frontier model did a year ago. For the very largest models, I think the latter effect dominates quite easily.
> It's not that clear. Sure, hardware prices are going up due to the extremely tight supply, but AI models are also improving quickly to the point where a cheap mid-level model today does what the frontier model did a year ago.
I agree; I got some coding value out of Qwen for $10/m (unlimited tokens); a nice harness (and some tight coding practices) lowers the distance between SOTA and 6mo second-tier models.
If I can get 80% of the way to Anthropic's or OpenAI's SOTA models using 10$/m with unlimited tokens, guess what I am going to do...
Inference prices droped like 90 percent in that time (a combination of cheaper models, implicit caching, service levels, different providers and other optimizations).
Quality went up. Quantity of results went up. Speed went up.
Service level that we provide to our clients went up massively and justfied better deals. Headcount went down.
What's not to like?
Sadly, this is already happening.
I think more specifically not being dependent on someone else's LLM hardware. IMO having OSS models on dedicated hardware could still be plenty viable for many businesses, granted it'll be some time before future OSS reaches today's SOTA models in performance.
On a small scale that's a tragedy, but there's plenty of analysts that predict an economic crash and recession because there's trillions invested in this technology.
Or they'll price the true cost in from the start, and make massive profits until the VC subsidies end... I know which one I'd do.
And I don't really mean new businesses that are entirely built around LLMs, rather existing ones that pivoted to be LLM-dependent – yet still have non-LLM-dependent competitors.
This is the “Building my entire livelihood on Facebook, oh no what?” all over again.
Oh no sorry I forgot, your laptops LLM can draw a potato, let me invest in you.
> We just had a realization during a demo call the other day
These tools have been around for years now. As they've improved, dependency on them has grown. How is any organization only just realizing this?
That's like only noticing the rising water level once it starts flooding the second floor of the house.
It's just another instance of cloud dependency, and people should've learned something from that over the last two decades.
So we thought, hmm, “wonder if they are increasing prices to deal with AI costs,” and then projected that into a future where costs go up.
We don’t have this dependence ourselves, so this seems to be a competitive advantage for us on pricing.
So its all a house of cards now, and the moment the bubble bursts is when local open inference has closed the gap. looks like chinese and smaller players already go hard into this direction.
Many users will also seek to go local as insurance against rug pulls from the proprietary models side (We're not quite sure if the third-party inference market will grow enough to provide robust competition), but ultimately if you want to make good utilization of your hardware as a single user you'll also be pushed towards mostly running long batch tasks, not realtime chat (except tiny models) or human-assisted coding.
Hyperscalers are spending a fortune so we think AI = API, but renting intelligence is a business model, not a technical inevitability.
Shameless link to my post on this: https://mjeggleton.com/blog/AIs-mainframe-moment
* harness design
* small models (both local and not)
I think there is tremendous low hanging fruit in both areas still.
The US has a problem of too much money leading to wasteful spending.
If we go back to the 80s/90s, remember OS/2 vs Windows. OS/2 had more resources, more money behind it, more developers, and they built a bigger system that took more resources to run.
Mac vs Lisa. Mac team had constraints, Lisa team didn't.
Unlimited budgets are dangerous.
Can you elaborate on this? Is this something that companies would train themselves?
As a recent example in AI space itself. China had scarce GPU resources, quite obvious why => DeepSeek training team had to invent some wheels and jump through some hoops => some of those methods have since become 'industry standard' and adopted by western labs who are now jumping through the same hoops despite enjoying massive computeresources, for the sake of added efficiency.
> Users should re-tune their prompts and harnesses accordingly.
I read this in the press release and my mind thought it meant test harness. Then there was a blog post about long running harnesses with a section about testing which lead me to a little more confusion.
Yes, the word 'harness' is consistently used in the context as a wrapper around the LLM model not as 'test harness'.
Basically a clever wrapper around the Anthropic / OpenAI / whatever provider api or local inference calls.
Infra is always limited, even at hyper scalers. This leads to a bunch of tools dfofr caching, profiling and generally getting performance up, not to mention binpacking and all sorts of other "obvious" things.
Not bad for a coffee break of effort.
I think maybe infra is limited only at hyperscalers. For the rest of us it's just how much capacity to we want to rent from the hyperscalars.
It's kind of a recent cloud-native mindset, since back in the day when you ran your own hardware scaling and capacity was always top of mind. Looks like AI compute might be like that again, for the time being.
The scarcity isn't long-term. Like all manufactured products, they'll ramp up production and flood the market with hardware, people will buy too much, market will drop. Boom and bust.
We're also still in the bubble. Eventually markets will no longer bear the lack of productivity/profit (as AI isn't really that useful) and there will be divestment and more hardware on the market as companies implode. Nobody is making 10x more from AI, they are just investing in it hoping for those profits which so far I don't think anyone has seen, other than in the companies selling the AI to other companies.
But more importantly, the models and inference keeps getting more efficient, so less hardware will do more in the future. We already have multiple models good enough for on-device small-scale work. In 5 years consumer chips and model inference will be so good you won't need a server for SOTA. When that happens, most of the billions invested in SOTA companies will disappear overnight, which'll leave a sizeable hole in the market.
Whoever running and selling their own models with inference is invested into the last dime available in the market.
Those valuations are already ridiculously high be it Anthropic or OpenAI to the tune of couple of trillion dollars easily if combind.
All that investment is seeking return. Correct me if I'm wrong.
Developers and software companies are the only serious users because they (mostly) review output of these models out of both culture and necessity.
Anywhere else? Other fields? There these models aren't any useful or as useful while revenue from software companies by no means going to bring returns to the trillion dollar valuations. Correct me if I'm wrong.
To make the matter worst, there's a hole in the bucket in form of open weight models. When squeezed further, software companies would either deploy open weight models or would resort to writing code by hand because that's a very skilled and hardworking tribe they've been doing this all their lives, whole careers are built on that. Correct me if I'm wrong.
Eventually - ROI might not be what VCs expect and constant losses might lead to bankruptcies and all that build out of data centers all of sudden would be looking for someone to rent that compute capacity result of which would be dime a dozen open weight model providers with generous usage tiers to capitalize on that available compute capacity owners of which have gone bankrupt and can't use it any more wanting to liquidate it as much as possible to recoup as much investment as possible.
EDIT: Typos
Anthropic's is far more reasonable.
It makes no sense to lump these two companies together when talking about valuation. They have completely different financial dynamics
I onboarded marketing on a premium team Claude seat yesterday. And one of our sales vibecoded an internal tool in the last three weeks using Claude Code that they now use every day. I wouldn’t have imagined it a month ago. We still had to take care of deployment for him, but things are moving fast.
Note - this is just the revenue not the profit. No salaries, no compute paid for. Just plain revenue. Profit would be way less.
But even that - if we take it to $24 billion/year and we take a 10x multiple, the company is barely valued at $240 billon dollar, lets be generous and make it double at $480 billion and then round it up to $500 billion for a nice round number.
Far far from the $800 billion valuation Anthropic is looking at.
Only a matter of time.
EDIT: Fixed math
Shush, don't tell that to the AI coding acolytes.
How convenient, especially since everything has some LLM slop interaction.
But that rug isnt going to pull itself!
(note: I don't expect this to actually happen until the AI gets good enough to either nearly entirely replace humans or solve cooperation, but the long term trend of scarce AI will go towards that direction)
It’s still a useful proxy for resources allocation and viability.
While we could reason in "performance / watt" and "performance / people", "performance / whatever other resource involved", and "performance / opportunity cost of allocating these resources to this use case and not another", "performance / whatever unit of stable-ish currency" is a convenient and often "good enough" approximation that somewhat encapsulates them all.
A simplification, like any model, but still useful.
Years, is like a lifetime for AI at this point...
This is true of nearly everything (except money). I'm not sure of the point you are trying to make.
What does this mean? I didn't understand the analogy.
It's one thing to "sell" free or symbolically cheap stuff, it's another to have an actual client who will do the math and compare expenditure vs actually delivered value.
Which means that the hype production will be driven up another few notches to make people doubt their rational findings and keep them in irrational territory just a tad longer. Every minute converts to dollars spent on tokens.
And that’s not considering the software innovation that can happen in the meantime.
Regarding "innovation", I agree with your idea. I even think that the major innovation will be to transpose models locally, using reduced infrastructures that will still be sufficient for the majority of use cases.
I thought there'd been a shortage of cheap GPUs since ChatGPT took off and also before that in various crypto booms. I'm not sure it's a new thing.
For instance, at some point, could Coreweave field a frontier team as it holds back 10% of its allocations over time? Pretty unusual situation.
Open Weight models are 6 months to a year behind SOTA. If you were building a company a year ago based on what AI could do then, you can build a company today with models that run locally on a user's computer. Yes that may mean requiring your customers to buy Macbooks or desktops with Nvidia GPUs, but if your product actually improves productivity by any reasonable amount, that purchase cost is quickly made up for.
I'll argue that for anything short of full computer control or writing code, the latest Qwen model will do fine. Heck you can get a customer service voice chat bot running in 8GB of VRAM + a couple gigs more for the ASR and TTS engine, and it'll be more powerful than the hundreds of millions spent on chat bots that were powered by GPT 4.x.
This is like arguing the age of personal computing was over because there weren't enough mainframes for people to telnet into.
It misses the point. Yes deployment and management of personal PCs was a lot harder than dumb terminal + mainframe, but the future was obvious.
I'd be surprised if it isn't true for your use cases. If you give GLM-5.1 and Optus 4.6 the same coding task, they will both produce code that passes all the tests. In both cases the code will be crap, as no model I've seen produces good code. GLM-5.1 is actually slightly better at following instructions exactly than Optus 4.6 (but maybe not 4.7 - as that's an area they addressed).
I've asked GLM-5.1 and Opus 4.6 to find a bug caused by a subtle race condition (the race condition leads to a number being 15172580 instead of 15172579 after about 3 months of CPU time). Both found it, in a similar amount of time. Several senior engineers had stared at the code for literally days and didn't find it.
There is no doubt the models do vary in performance at various tasks, but we are talking the difference between Ferrari vs Mercedes in F1. While the differences are undeniable, this isn't the F1. Things take a year to change there. The performance of the models from Anthropic and OpenAI literally change day by day, often not due to the model itself but because of the horsepower those companies choose to give them on the day, or them tweaking their own system prompts. You can find no end of posts here from people screaming in frustration the thing that worked yesterday doesn't work today, or suddenly they find themselves running out of tokens, or their favoured tool is blocked. It's not at all obvious the differences between the open-source models and the proprietary ones are worse than those day to day ones the proprietary companies inflict on us.
I'm wondering if you have actually used claude code because results are not so catastrophic as you describe them.
But that's the least of it. The models (all of them) are absolutely hopeless at DRY'ing out the code, and when they do turn it into spaghetti because they seem almost oblivious to isolation boundaries, even when they are spelt out to them.
None of this is a problem if you are vibe coding, but you can only do that when you're targeting a pretty low quality level. That's entirely appropriate in some cases of course, but when it isn't you need heavy reviews from skilled programmers. No senior engineer is going to stomach the repeated stretches of almost the "same but not quite" code they churn out.
You don't have to take my word for it. Try asking Google "do llm's produce verbose code".
`free(NULL)` is harmless in C89 onwards. As I said, programmers freeing NULL caused so many issues they changed the API. It doesn't help that `malloc(0)` returns NULL on some platforms.
If you are writing code for an embedded platform with some random C compiler, all bets on what `free(NULL)` does are off. That means a cautious C programmer who doesn't know who will be using their code never allows NULL to be passed to `free()`.
In general, most good C programmers are good because they suffer a sort of PTSD from the injuries the language has inflicted on them in the past. If they aren't avoiding passing NULL to `free()`, they haven't suffered long enough to be good.
If your compiler chokes on `free(NULL)` you have bigger problems that no LLM (or human) can solve for you: you are using a compiler that was last maintained in the 80s!
If your C compiler doesn't adhere to the very first C standard published, the problem is not the quality of the code that is written.
> If they aren't avoiding passing NULL to `free()`, they haven't suffered long enough to be good.
I dunno; I've "suffered" since the mid-90s, and I will free NULL, because it is legal in the standard, and because I have not come across a compiler that does the wrong thing on `free(NULL)`.
Oh yes, you probably will see errors elsewhere. If you are lucky it will happen immediately. But often enough millions of executed instructions later, in some unrelated routine that had its memory smashed. It's not "fun" figuring out what happened. It could be nothing - bit flips are a thing, and once you get the error rate low enough the frequency of bit flips and bugs starts to converge. You could waste days of your time chasing an alpha particle.
I saw the author of curl post some of this code here a while back. I immediately recognised the symptoms. Things like:
Every 2nd line was code like that. If you are wondering, he wrote `(NULL == foo)` in case he dropped an `=`, so it became `(NULL = foo)`. The second version is a syntax error, whereas `(foo = NULL)` is a runtime disaster. Most of it was unjustified, but he could not help himself. After years of dealing with C, he wrote code defensively - even if it wasn't needed. C is so fast and the compilers so good the coding style imposes little overhead.Rust is popular because it gives you a similar result to C, but you don't need to have been beaten by 10 years of pain in order to produce safe Rust code. Sadly, it has other issues. Despite them, it's still the best C we have right now.
I always found myself writing verbose copypasta code first, then compress it down based on the emerging commonalities. I think doing it the other way around is likely to lead to a worse design. Can you not tell the LLM to do the same? Honest question.
I do pretty much the same thing, which is to say I "write code using a brain dump", "look for commonalities that tickle the neurons", then "refactor". Lather, rinse, and repeat until I'm happy.
> Can you not tell the LLM to do the same?
You can tell them until you're blue in the face. They ignore you.
I'm sure this is a temporary phase. Once they solve the problem, coding will suffer the same fate as blacksmiths making nails. [0] To solve it they need to satisfy two conflicting goals - DRY the code out, while keeping interconnections between modules to a minimum. That isn't easy. In fact it's so hard people who do it well and can do it across scales are called senior software engineers. Once models master that trick, they won't be needed any more.
By "they" I mean "me".
[0] Blacksmiths could produce 1,000 or so a day, but it must have been a mind-numbing day even if it paid the bills. Then automation came along, and produced them at over a nail per second.
I found it exceptionally good, because:
a) The agent doesn't need to read the implementation of anything - you can stuff the entire projects headers into the context and the LLM can have a better birds-eye view of what is there and what is not, and what goes where, etc.
and
b) Enforcing Parse, don't Validate using opaque types - the LLM writing a function that uses a user-defined composite datatype has no knowledge of the implementation, because it read only headers.
Write code? No. Use frontier models. They are subsidized and amazing and they get noticably better ever few months.
Literally anything else? Smaller models are fine. Classifiers, sentiment analysis, editing blog posts, tool calling, whatever. They go can through documents and extract information, summarize, etc. When making a voice chat system awhile back I used a cheap open weight model and just asked it "is the user done speaking yet" by passing transcripts of what had been spoken so far, and this was 2 years ago and a crappy cheap low weight model. Be creative.
I wouldn't trust them to do math, but you can tool call out to a calculator for that.
They are perfectly fine at holding conversations. Their weights aren't large enough to have every book ever written contained in them, or the details of every movie ever made, but unless you need that depth and breadth of knowledge, you'll be fine.
Open weight models have those same issues. They are otherwise fine.
You can hook them up to a vector DB and build a RAG system. They can answer simple questions and converse back and forth. They have thinking modes that solve more complex problems.
They aren't going to discover new math theorems but they'll control a smart home and manage your calendar.
I know it may sound ridiculous, but it could actually become a way to break away from the business models that have been developed over the past few decades. Broadly speaking, this even amounts to saying that the biggest victims of AI could be the companies that bet on AI as a service.
Yet I know my vision is way too idealistic but I'm coming to imagine that a human brain, although less efficient in the long run, remains a reliable way to control the resulting costs and could even turn out to be more advantageous and more readily available than its silicon-based counterpart.
1. https://pmc.ncbi.nlm.nih.gov/articles/PMC8364152/
Also - turbine blades limit power, according to Elon.
Between them - we cannot chip fabs past a certain rate, and we cannot stand up the datacenter to run these desired chips past a certain rate. Different people believe one or the other is the 'true' current bottleneck. The turbine supply chain scaling looks much more tractable -- EUV is essentially the most complicated production process humans have ever devised.
- clean room, itself needing the infrastructure for it (size, airCo, filtering, electricity) and the staff to run and maintain that basically empty space - wafers to "print" on, so that's a lot of water and logistic to manipulate them (so infrastructure for clean water and all chemicals) also with dedicated staff - finally staff who would be able to design something significantly better than NVIDIA, Intel, Broadcom, IBM, etc while (and arguably that's the trickiest part IMHO) being able to get it good enough as at a scale that can be manufactured from their own fab.
so I'm wondering who can afford this kind of setup that can only then make use of ASML machines.
Fabs are some of the most complex chemical engineering sites (dealing with some of the most dangerous substances) in the world. So don't underestimate the complexity of this part.
If I am a grocery store that pays $1 for oranges and sells them for $0.50, I can't say, "I don't have enough oranges."
But labs arent buying oranges — theyre buying the only orchard on the island, hoping it yields a fruit no ones grown yet. Burning $1B to net $500M isnt "I have too few oranges." Its "Im betting the farm Ill find a new one."
Both can be irrational. Theyre irrational in different ways.
'If I am a grocery store that pays $1 for oranges and sells them for $0.50, I can't say, "I don't have enough oranges."'
How about 'if I'm a grocery store and I see no limit on demand for oranges at $.50 but they are currently $1, I can say 'if oranges were cheaper I could sell orders of magnitude more of them'.
Buying oranges for $1 and selling for $0.5 is an investment into acquiring market share and customer relationships and a gamble on the price of oranges falling in the future.
The whole setup rests on this, and it seems mythical to me. These guys have basically equivalent products at this point.
It's a delusion that customers are going to remain with the behemoths when a Qwen model run by an independent is $10/m, unlimited usage.
This is not a market that can be locked-in with network effects, and the current highly-invested players have no moat.
"I built a ship to go to the Indies and bring back tea."
"Bro, the ship cost 100,000 pounds sterling and only brought back 50,000 pounds of tea. I don't care if you paid 12,500 pounds for the tea itself, you're losing money."
There is a very rational reason labs are spending everything they can get for more compute right now. The tea (inference) pays 60%+ margins. And that is rising. And that number is AFTER hyper scalars make their margins. There is an immense amount of profit floating around this system, and strategics at the edge believing they can build and control the demand through combined spend on training and inference in the proper ratios.
Could they be accurate? Sure, I think people who claim this is impossible are overconfident. But I would encourage anyone who assumes they must be right to read a history of the Worldcom scandal. It's really quite easy for a person who wants to be making money (or an LLM who's been instructed to "run the accounts make no mistakes"!) to incorrectly categorize costs as capital investments when nobody's watching carefully.
It remains to be seen what new wave of AI system or systems will replace it, making the whole current architecture obsolete.
Meanwhile, they are milking it, in the name of scarcity.
One person replies "yes". Another replies "no".
This concludes our press conference.
<3 HN
There is a 2-3years still before ASIC LLM inferences will catch up.
It won't make sense for ASIC LLMs to manifest until things start to plateau, otherwise it'll be cheaper to get smarter tokens on the cloud for almost all use cases.
That said, a 10 trillion parameter model on a bespoke compute platform overcomes a lot of efficiency and FOOM aspects of the market fit, so the angle is "when will models that can be run on an asic be good enough that people will still want them for various things even if the frontier models are 10x smarter and more efficient"
I think we're probably a decade of iteration on LLMs out, at least, and the entire market could pivot if the right breakthrough happens - some GPT-2 moment demonstrating some novel architecture that convinces the industry to make the move could happen any time now.
Its like being back in 1850 and you build the world's first amusement park where the rides are free or very cheap. People are like Amusement parks are the next big thing since Steam Boats! And tons of other rich people start to build huge amusement parks everywhere. The people who are skilled at making amusement park rides will increase their prices, and since the first amusement parks are free so they can get the public going to them demand will be huge.
But how sustainable is that? - well obviously we know from history that amusement parks did, in fact, take over the world and most people spent virtually all their time and money at amusement parks - I think the Crimean War was even fought over some religious-based theme park in Israel - until moving pictures came out, so it worked out for them, but for AI?
1. Supply can scale. You can point to COVID/supply-chain shocks, but the problem there is temporary changes. No one spins up a whole fab to address a 3 month spike. Whereas AI is not a temporary demand change.
2. Models are getting more efficient. DeepSeek V3 was 1/10th the cost of contemporary ChatGPT. Open weight models get more runnable or smarter every month. Cutting edge is always cutting edge, but if scarcity is real, model selection will adjust to fit it.