Plan for cheaper LLM inference: OpenAI and Broadcom unveil Jalapeno
OpenAI and Broadcom have unveiled Jalapeno, OpenAI's first custom inference chip for LLM workloads. The useful signal for builders is not instant access, but OpenAI's move toward owning more of the stack that controls latency, availability, and inference cost.
Confidence level: confirmed for the announcement, early for real-world performance, because OpenAI says final measurements and a technical report are still pending. Builders cannot use the chip directly today, but the move matters for anyone watching API latency, Codex throughput, model availability, and long-term inference pricing.

What changed
OpenAI and Broadcom unveiled Jalapeno on June 24, 2026. OpenAI describes it as an "Intelligence Processor" built around LLM inference rather than a general-purpose accelerator adapted from older machine-learning workloads.
The company says engineering samples are already running machine-learning workloads, including GPT-5.3-Codex-Spark, in the lab at production target frequency and power. OpenAI also says early testing points to substantially better performance per watt than current state-of-the-art options, but it has not published final benchmarks.
Jalapeno is the first chip in a multi-generation compute platform. OpenAI says Broadcom contributes silicon implementation and networking, while Celestica helps with board, rack, and system integration. Deployment is planned to begin by the end of 2026 and expand over later generations.
Key takeaways
- Jalapeno is aimed at inference: the work of running models for ChatGPT, Codex, API calls, and future interactive products.
- OpenAI says the chip was developed from design to production tape-out in nine months, with OpenAI models helping parts of the design process.
- The first practical impact is likely capacity and efficiency, not a new model that users can select today.
- Published performance claims are still early. OpenAI has not released detailed benchmarks, pricing changes, or public availability terms.
- Independent coverage from TechCrunch and The Verge frames the chip as part of OpenAI's push to reduce dependence on Nvidia GPUs and control more of its AI stack.
| Signal | What it means | What is still unknown |
|---|---|---|
| Custom inference chip | OpenAI wants hardware tuned for ChatGPT, Codex, API, and agent workloads | Whether developers see lower prices or higher rate limits |
| Broadcom partnership | OpenAI gets custom silicon and networking support from an experienced ASIC partner | How much supply OpenAI can secure in 2026 and 2027 |
| Lab samples running workloads | The project is beyond a paper design | Production yield, final performance, and reliability at scale |
| Multi-generation roadmap | This is an infrastructure strategy, not a one-off chip demo | Whether future chips also target training workloads |
Availability and access
There is no direct developer access to Jalapeno. OpenAI presents the chip as internal infrastructure for making advanced AI faster, more reliable, and more affordable over time.
OpenAI says initial deployment is planned by the end of 2026 with data center partners, including Microsoft. It has not announced an API SKU, customer opt-in, region list, committed rate-limit increase, or model-specific pricing change tied to Jalapeno.
Practical LinkLoot angle
For builders, Jalapeno is a planning signal. If your product depends on high-volume LLM calls, long-running Codex tasks, or agent loops with many tool calls, inference economics matter as much as model quality.
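To make "inference economics" concrete, here is a minimal back-of-the-envelope sketch of a monthly spend estimate. Every number in it is a hypothetical placeholder, not an OpenAI price or a Jalapeno-related figure, and the function name `monthly_cost_usd` is invented for illustration.

```python
def monthly_cost_usd(
    calls_per_day: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    usd_per_million_input: float,   # hypothetical price, not a published rate
    usd_per_million_output: float,  # hypothetical price, not a published rate
    days: int = 30,
) -> float:
    """Estimate monthly spend for a fixed-shape LLM workload."""
    # Cost of one call, expressed in "USD per million tokens" units.
    per_call_scaled = (
        input_tokens_per_call * usd_per_million_input
        + output_tokens_per_call * usd_per_million_output
    )
    # Multiply by call volume first, then convert to USD.
    return calls_per_day * days * per_call_scaled / 1_000_000


# Example with made-up numbers: 10,000 calls per day,
# 2,000 input and 500 output tokens per call.
estimate = monthly_cost_usd(10_000, 2_000, 500, 1.0, 4.0)
print(f"Estimated monthly spend: ${estimate:,.2f}")
```

Rerunning the estimate with your own volumes and the prices on the current pricing page shows how sensitive a product is to per-token cost, which is the number any hardware efficiency gain would eventually have to move.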
Do not assume instant cheaper tokens. Watch for concrete changes: new OpenAI API pricing, higher rate limits, more stable latency during demand spikes, Codex task throughput, and any public technical report comparing Jalapeno against current accelerators.
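One way to watch for "more stable latency" is to log request durations yourself and compare tail percentiles over time. The sketch below is an assumption about how a builder might do that, using a simple nearest-rank percentile; the helper names and the 1.2 tolerance are hypothetical, and nothing here reads data from OpenAI.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples: Sequence[float]) -> Dict[str, float]:
    """Summarize a window of request latencies (seconds)."""
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "max": max(samples),
    }


def tail_regressed(
    baseline: Sequence[float],
    current: Sequence[float],
    tolerance: float = 1.2,  # hypothetical threshold: flag a 20% worse p95
) -> bool:
    """Return True if the current p95 is notably worse than the baseline p95."""
    return percentile(current, 95) > tolerance * percentile(baseline, 95)
```

Comparing a quiet-hours window against a demand-spike window with `tail_regressed` gives a repeatable check, rather than relying on impressions of whether the API feels faster.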
This also strengthens the case for designing AI workflows that can route work by cost and latency. Keep expensive reasoning steps narrow, cache repeatable outputs, and split high-volume tasks from premium reasoning tasks. LinkLoot's /guides/ai-workflow-automation hub is the right place to connect that infrastructure shift to practical workflow design.
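The routing and caching advice above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the tier names, prices, and latencies are invented placeholders, `fake_model` stands in for a real API client, and the in-memory dictionary is only a toy cache.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_1k_tokens: float   # hypothetical price
    typical_latency_s: float   # hypothetical latency


# Placeholder tiers; swap in whatever models and prices you actually use.
TIERS: Dict[str, Tier] = {
    "bulk": Tier("bulk", 0.0005, 0.5),
    "premium": Tier("premium", 0.01, 4.0),
}


def choose_tier(needs_reasoning: bool, latency_budget_s: float) -> Tier:
    """Use the premium tier only when reasoning is needed and time allows."""
    premium = TIERS["premium"]
    if needs_reasoning and latency_budget_s >= premium.typical_latency_s:
        return premium
    return TIERS["bulk"]


class CachedRouter:
    """Route prompts to a tier and reuse results for repeated prompts."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._cache: Dict[str, str] = {}
        self.calls = 0  # number of real (uncached) model calls

    def run(self, prompt: str, needs_reasoning: bool = False,
            latency_budget_s: float = 1.0) -> str:
        tier = choose_tier(needs_reasoning, latency_budget_s)
        # Key on tier and prompt so different tiers never share an answer.
        key = hashlib.sha256(f"{tier.name}\n{prompt}".encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._call_model(tier.name, prompt)
        return self._cache[key]


def fake_model(tier_name: str, prompt: str) -> str:
    """Stand-in for a real model call, used only for this example."""
    return f"[{tier_name}] {prompt.upper()}"


router = CachedRouter(fake_model)
router.run("tag this ticket")            # real call on the bulk tier
router.run("tag this ticket")            # served from cache
router.run("draft a migration plan", needs_reasoning=True, latency_budget_s=10.0)
print(router.calls)
```

Keeping the routing decision in one small function means that if pricing or latency does shift later, only the tier table and the rule in `choose_tier` need to change.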
What to verify before you act
- Check whether OpenAI publishes the promised technical report with final performance-per-watt numbers.
- Watch the OpenAI API pricing page and changelog for model-specific cost or rate-limit changes.
- Confirm whether Codex, ChatGPT, or API workloads actually move to Jalapeno-backed infrastructure.
- Track whether deployment begins by the end of 2026 as stated, and whether it is region-limited.
- Separate inference impact from training impact; OpenAI has described Jalapeno as an inference processor, not a training chip.
Source check
Confirmed by OpenAI: Jalapeno is OpenAI's first LLM-optimized inference processor, co-developed with Broadcom, with Celestica involved in system integration. OpenAI says lab samples are running workloads, a technical report is coming, and initial deployment is planned by the end of 2026.
Independent context: TechCrunch corroborates that Jalapeno is OpenAI's first custom-built inference processor and ties it to OpenAI's effort to reduce dependence on Nvidia GPUs. The Verge independently frames the chip as OpenAI's first AI processor and part of a broader full-stack infrastructure push.
Early or unverified: final benchmarks, direct user-facing pricing impact, rate-limit changes, production yield, and whether the chip becomes available outside OpenAI-operated workloads.
Not directly available to developers: OpenAI has announced Jalapeno as internal inference infrastructure, not a public API option or developer-selectable chip.
