LW - Unit economics of LLM APIs by dschwarz
Archived series ("inactive feed" status)
When? This feed was archived on October 23, 2024 10:10. Last successful fetch was on September 22, 2024 16:12.
Why? Inactive feed status. Our servers were unable to retrieve a working podcast feed for a sustained period.
What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check that the publisher's feed link below is valid, then contact support to request that the feed be restored, or with any other concerns.
Manage episode 436680723 series 3337129
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Unit economics of LLM APIs, published by dschwarz on August 28, 2024 on LessWrong.
Disclaimer 1: Our calculations are rough in places; information is sparse, guesstimates abound.
Disclaimer 2: This post draws from public info on FutureSearch as well as a paywalled report. If you want the paywalled numbers, email dan@futuresearch.ai with your LW account name and we'll send you the report for free.
Here's our view of the unit economics of OpenAI's API. Note: this considers GPT-4-class models only, not audio or image APIs, and only direct API traffic, not usage in ChatGPT products.
As of June 2024, OpenAI's API was very likely profitable, with surprisingly high margins. Our median estimate for gross margin (not including model training costs or employee salaries) was 75%.
Once all traffic switches over to the new August GPT-4o model and pricing, OpenAI plausibly still will have a healthy profit margin. Our median estimate for the profit margin is 55%.
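As a rough illustration of how such a margin estimate works, here is a minimal sketch. The revenue and cost figures below are hypothetical placeholders for illustration, not FutureSearch's actual inputs:

```python
# Hypothetical illustration of a gross-margin calculation for an LLM API.
# The input figures are made up; they are NOT the report's numbers.

def gross_margin(revenue: float, serving_cost: float) -> float:
    """Gross margin = (revenue - cost of serving) / revenue."""
    return (revenue - serving_cost) / revenue

# e.g. $100M of annualized API revenue against $25M of inference compute
margin = gross_margin(100e6, 25e6)
print(f"{margin:.0%}")  # 75%
```

Note that, as the post says, this "gross margin" excludes model training costs and salaries, so it is an upper bound on true profitability.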
The Information implied that OpenAI rents ~60k A100-equivalents from Microsoft for non-ChatGPT inference. If this is true, OpenAI is massively overprovisioned for the API, even when we account for the need to rent many extra GPUs to account for traffic spikes and future growth (arguably creating something of a mystery).
We provide an explicit, simplified first-principles calculation of inference costs for the original GPT-4, and find significantly lower throughput & higher costs than Benjamin Todd's result (which drew from Semianalysis).
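A calculation in this first-principles spirit can be sketched as follows. Every parameter here (active parameter count, hardware peak FLOPS, utilization, GPU rental price) is an assumption chosen for illustration, not a figure from the post or the paywalled report:

```python
# Simplified first-principles sketch of LLM inference cost per token.
# All numeric assumptions below are illustrative, not the post's figures.

def cost_per_million_tokens(
    active_params: float,      # parameters used per token (for MoE, active only)
    gpu_flops: float,          # peak FLOP/s of one GPU
    utilization: float,        # realistic fraction of peak actually achieved
    gpu_cost_per_hour: float,  # rental cost per GPU-hour
) -> float:
    flops_per_token = 2 * active_params                # ~2 FLOPs per param per token
    tokens_per_sec = gpu_flops * utilization / flops_per_token
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1e6   # $ per million tokens

# Assumed inputs: ~280B active parameters, A100 peak ~312 TFLOPS (BF16),
# ~30% utilization, ~$1.50 per GPU-hour.
print(round(cost_per_million_tokens(280e9, 312e12, 0.30, 1.50), 2))
```

The utilization term is where such estimates diverge most: assuming a lower achieved fraction of peak FLOPS, as the post does relative to Benjamin Todd's Semianalysis-derived numbers, directly lowers throughput and raises cost per token.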
Summary chart:
What does this imply? With any numbers, we see two major scenarios:
Scenario one: competition intensifies. With Llama, Gemini, and Claude all comparable and cheap, OpenAI will be forced to halve its prices again. (With the margins FutureSearch calculates, it can do this without running at a loss.) LLM APIs become like cloud computing: huge revenue, but not very profitable.
Scenario two: one LLM pulls away in quality. GPT-5 and Claude-3.5-opus might come out soon at huge quality improvements. If only one LLM is good enough for important workflows (like agents), it may be able to sustain a high price and huge margins. Profits will flow to this one winner.
Our numbers update us, in either scenario, towards:
An increased likelihood of more significant price drops for GPT-4-class models.
A (weak) update that frontier labs are facing less pressure today to race to more capable models.
If you thought that GPT-4o (and Claude, Gemini, and hosted versions of llama-405b) were already running at cost in the API, or even at a loss, you would predict that the providers are strongly motivated to release new models to find profit. If our numbers are approximately correct, these businesses may instead feel there is plenty of margin left, and profit to be had, even if GPT-5 and Claude-3.5-opus etc. do not come out for many months.
More info at https://futuresearch.ai/openai-api-profit.
Feedback welcome and appreciated - we'll update our estimates accordingly.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org