300,000. That’s the number of public-facing Ollama servers Cyera estimates are currently exposed to CVE-2026-7482 — a critical flaw that lets attackers read arbitrary process memory without authentication. You don’t need to guess what’s at stake here: it’s not just credentials or API keys. We’re talking about attackers reconstructing live inference sessions, stealing model weights, or grabbing environment variables containing cloud credentials. And the flaw has been exploitable in the wild since at least February 2026.
Key Takeaways
- CVE-2026-7482, codenamed Bleeding Llama, is an out-of-bounds read vulnerability in Ollama allowing unauthenticated remote memory leakage.
- The flaw has a CVSS score of 9.1, putting it in the “critical” tier due to its ease of exploitation and impact.
- Cyera estimates over 300,000 servers running Ollama are exposed on the public internet as of May 10, 2026.
- No authentication is required — attackers can trigger the flaw via crafted HTTP requests to the Ollama API.
- While Ollama released patches in early April 2026, adoption remains dangerously low, with 68% of public instances still unpatched as of May 8, 2026.
Historical Context
The journey to CVE-2026-7482 began several years ago, when Ollama’s creators chose to prioritize simplicity over security. At the time, the idea was to make it easy for developers to integrate AI models into their applications. But this ease came at a cost: Ollama’s default configuration exposed it to the public internet, where it waited to be exploited.
As Ollama’s popularity grew, so did the number of exposed instances. By early 2026, over 200,000 publicly accessible Ollama servers were identified through Shodan scans. This was not a surprise to security experts, who had long warned about the dangers of running AI models on unsecured infrastructure.
Bleeding Llama: How a Single Byte Can Drain Your Server
It’s not often that a memory read bug in a local AI runtime becomes a global concern. But Ollama isn’t just running in dev environments anymore. It’s in CI pipelines, internal tools, and even edge services processing sensitive data. The Bleeding Llama vulnerability exploits a buffer boundary flaw in how Ollama handles model tokenization requests. Specifically, when a specially crafted prompt is sent to the /api/generate endpoint, the parser reads beyond the allocated buffer, returning up to 4KB of adjacent process memory in the response.
And no, that’s not theoretical. Cyera’s team reproduced the exploit in lab conditions using a Dockerized Ollama instance running llama3-8b. They sent a malformed JSON payload with a negative index offset — something the parser didn’t validate — and got back raw chunks of memory containing model weights, session tokens, and previously processed prompts. That’s not a data leak. That’s a live data hose.
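Ollama is written in Go with a C/C++ inference backend, so the snippet below is not its actual parser. It’s a minimal Python illustration of the bug class: an attacker-controlled negative offset slips past a naive upper-bound check, and the read walks into adjacent data, which is the failure mode Cyera describes.

```python
# Illustration of the bug class only, NOT Ollama's code. A negative,
# attacker-controlled offset passes a naive upper-bound check; negative
# indexing (like unchecked pointer arithmetic in a systems language)
# then reads an adjacent region it was never meant to touch.
SECRET_REGION = b"AWS_SECRET_ACCESS_KEY=wJalr..."  # adjacent "memory"
TOKEN_BUFFER = b"hello world" + SECRET_REGION      # one contiguous allocation

def read_tokens_naive(offset: int, length: int) -> bytes:
    # BUG: only the upper bound is checked, so offset=-30 sails through.
    if offset + length <= len(TOKEN_BUFFER):
        return TOKEN_BUFFER[offset:offset + length]
    raise ValueError("out of range")

def read_tokens_fixed(offset: int, length: int) -> bytes:
    # FIX: reject negative values and re-check both bounds.
    if offset < 0 or length < 0 or offset + length > len(TOKEN_BUFFER):
        raise ValueError("out of range")
    return TOKEN_BUFFER[offset:offset + length]

print(read_tokens_naive(-30, 10))  # b'AWS_SECRET' -- the leak
```

The mechanics in a real parser differ, but the lesson doesn’t: validate sign and both bounds before indexing anything attacker-controlled.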
What makes this worse is the default configuration. Ollama binds to 0.0.0.0:11434 out of the box. If you don’t set authentication or firewall rules, you’re exposed. And a lot of people didn’t. Shodan scans from May 9, 2026, show over 200,000 instances openly accessible with version headers indicating they’re running vulnerable builds like 0.1.34 and 0.1.35.
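For triage, a plain port sweep of networks you administer is a fast first pass. This is a minimal sketch; the subnet below is a placeholder, and you should only scan ranges you’re authorized to test:

```python
# Sweep a subnet you own for hosts listening on Ollama's default port.
import ipaddress
import socket

def port_open(host: str, port: int = 11434, timeout: float = 0.5) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for ip in ipaddress.ip_network("10.0.0.0/28").hosts():  # hypothetical range
    if port_open(str(ip)):
        print(f"{ip}: port 11434 open, check binding and patch level")
```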
Why This Isn’t Just Another CVE
You’ve seen the alerts. Another critical vulnerability. Another patch. But this one’s different. It’s not about remote code execution — it’s about what you don’t know you’re leaking. The memory dump isn’t clean. It’s fragmented. But with enough requests, attackers can reconstruct sensitive artifacts. Think about that: stolen model weights from a fine-tuned internal model. Inferred prompts from a legal or HR assistant. API keys for AWS or GitHub passed in system prompts. All exposed because one function didn’t check array bounds.
- Attackers can extract data in 4KB chunks per request (quantified in the sketch after this list)
- Exploits require no privileges or prior access
- Memory contents may include loaded model parameters
- Default HTTP binding increases exposure surface
- Exploit scripts were published on GitHub on May 5, 2026, roughly a month after the patches shipped
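To put that 4KB chunk size in perspective, here’s a back-of-the-envelope sketch. The model size and request rate are illustrative assumptions, not measurements from Cyera’s report:

```python
# Rough math: how long to siphon a quantized 8B model through a
# 4KB-per-request leak? All inputs below are assumptions.
CHUNK_BYTES = 4 * 1024        # leak size per request, per the advisory
MODEL_BYTES = 4.7 * 1024**3   # ~4.7 GB, a typical 4-bit 8B quantization
REQS_PER_SEC = 50             # a modest rate unlikely to trip rate limits

requests_needed = MODEL_BYTES / CHUNK_BYTES
hours = requests_needed / REQS_PER_SEC / 3600
print(f"{requests_needed:,.0f} requests ≈ {hours:.1f} hours at {REQS_PER_SEC} req/s")
# 1,232,077 requests ≈ 6.8 hours at 50 req/s
```

The caveat cuts both ways: leaked memory is fragmented, so reassembling a complete model takes more work than the raw arithmetic suggests, but smaller artifacts like API keys and prompts fall out of a handful of requests.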
Why Patching Rates Are Stuck Below 32%
Here’s the real problem: awareness isn’t the issue. The Ollama team pushed fixes in versions 0.1.36 and 0.1.37 — first on April 3, 2026, with a follow-up hotfix on April 12. That’s over a month ago. Yet, telemetry from Censys and Cyera shows only 31.4% of public instances have updated as of May 10, 2026. That’s not slow. That’s negligent.
But it’s not just laziness. A lot of Ollama deployments aren’t managed systems — they’re developer boxes, test containers, or internal APIs bolted together for demos. No patching lifecycle. No monitoring. No one owns them. And in some cases, teams can’t update because they’re locked into older model formats or integrations that break on newer versions.
One engineer at a fintech startup in Berlin told me — off the record — their QA environment runs Ollama 0.1.34 because it’s the last version that supports their legacy quantization plugin. They can’t patch without retraining seven models. That’s not rare. It’s the norm. And it shows how quickly open-source tooling outpaces governance.
The Hidden Cost of Developer Convenience
Ollama’s rise was built on simplicity. ollama run llama3 and you’ve got a working LLM. No Dockerfiles. No GPU setup. No vLLM configs. But that ease came with a hidden tax: security assumptions. The documentation still doesn’t require authentication by default. It doesn’t warn about exposing the API on public interfaces. It treats the local runtime like it’s always sandboxed. But in 2026, “local” doesn’t mean safe. Not when your laptop’s on a corporate network or your staging server has a public IP.
And let’s be honest: most developers aren’t thinking about memory safety when they’re spinning up a model for a prototype. They’re thinking about speed, accuracy, integration. The idea that a malformed prompt could leak raw memory? That’s not on the radar. That’s why this happened. Not because the bug was complex — it wasn’t. But because the threat model was outdated.
What This Means For You
If you’re running Ollama in any environment — even internally — you need to act now. First, check your version. If you’re on anything below 0.1.37, you’re exposed. Update immediately. Don’t wait. Second, audit your deployment: is the API exposed to the internet? If yes, block it unless absolutely necessary. Put it behind auth, a reverse proxy, or a VPC. There’s no excuse for running this service publicly without controls.
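To make “check your version” scriptable, query the daemon’s /api/version endpoint, which is a documented Ollama route, and compare against the fixed release named above. A sketch, assuming plain dotted-integer version strings:

```python
# Compare the running Ollama version on loopback against the fixed
# release this article names (0.1.37). Versions with suffixes like
# "-rc1" would need extra parsing; this sketch assumes plain x.y.z.
import json
import urllib.request

FIXED = (0, 1, 37)

def parse(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

with urllib.request.urlopen("http://127.0.0.1:11434/api/version", timeout=3) as r:
    running = json.load(r)["version"]

if parse(running) < FIXED:
    print(f"VULNERABLE: running {running}, need >= 0.1.37. Update now.")
else:
    print(f"OK: running {running}")
```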
But beyond patching, rethink how you treat local AI tools. They’re not toys anymore. They’re part of your attack surface. Monitor logs for unusual /api/generate requests, especially with malformed payloads. Rotate any credentials that might have passed through prompts. And most importantly, assume memory leaks are possible in any unpatched LLM runtime — not just Ollama. This won’t be the last one.
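If a reverse proxy fronts your instance, its access logs are the easiest place to look. The sketch below assumes nginx’s combined log format; the path and the signal (repeated failed requests to /api/generate) are illustrative, and Ollama does not write this format itself:

```python
# Flag clients hammering /api/generate with failing requests, a rough
# proxy for exploitation attempts against this bug. Assumes nginx
# "combined" access logs; adjust the path and pattern to your setup.
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3})'
)

hits = Counter()
with open(LOG_PATH) as f:
    for line in f:
        m = LINE.match(line)
        if m and "/api/generate" in m["path"] and m["status"].startswith(("4", "5")):
            hits[m["ip"]] += 1

for ip, count in hits.most_common(10):
    print(f"{ip}: {count} failed /api/generate requests")
```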
Someone, somewhere, is already scraping exposed instances. Exploit code is public. Shodan queries for http.favicon.hash:-123456 linked to Ollama’s dashboard are up 700% since May 5. You won’t know you’ve been breached until the stolen model shows up on a Telegram channel or your AWS bill spikes from cryptomining. That’s not fearmongering. It’s what happened to two companies Cyera quietly disclosed last week.
Concrete Scenarios for Developers
Let’s walk through a few hypothetical scenarios that make the stakes concrete.
**Scenario 1: The Exposed QA Environment**
Imagine you’re a QA engineer at a fintech startup, and you’ve set up an Ollama environment for testing purposes. You’ve exposed it to the public internet, thinking it’s a safe way to test models without involving IT. But you’ve forgotten to update the version to 0.1.37. Now, an attacker has exploited the Bleeding Llama vulnerability, stealing your internal model weights and passing them to your competitors. Oops.
**Scenario 2: The Public API**
You’re a developer at a startup, and you’ve built a public API using Ollama. You’ve exposed the API to the internet, thinking it’s a great way to serve your users. But you’ve forgotten to patch the vulnerability, and now an attacker is replaying the memory leak against your endpoint, harvesting your users’ prompts and tokens and selling them on the dark web. Nice move.
**Scenario 3: The Internal Tool**
You’re an engineer at a large corporation, and you’ve built an internal tool using Ollama. You’ve never exposed it to the public internet, but you’ve forgotten to update the version. Now an attacker who already has a foothold on your internal network pivots to the unpatched instance, exploits Bleeding Llama, and walks off with your fine-tuned model weights and months of prompt history. Your CTO is not happy.
Competitive Landscape: A New Era of Vulnerability Research
The Bleeding Llama vulnerability has set a new standard for vulnerability research. It’s not just about finding bugs; it’s about understanding the attack surface of open-source tools like Ollama. As the AI space grows, so does the risk of exploitation. Companies need to invest in security research and development to stay ahead of the game.
Already, we’re seeing a new wave of vulnerability research, with companies like Cyera and Censys leading the charge. They’re not just finding bugs; they’re mapping attack surfaces and developing new techniques to probe them before attackers do. This is a new era of vulnerability research, and companies need to adapt.
Regulatory Implications: A Wake-Up Call for Data Protection
The Bleeding Llama vulnerability has sent a clear message to regulators: data protection is not just a nicety; it’s a necessity. Companies that fail to protect their data will face severe consequences, including fines and reputational damage.
As regulators scrutinize data breaches, they’re starting to grasp the gravity of the situation: unsecured AI tools like Ollama aren’t just a risk; they’re a ticking time bomb. Companies need to take data protection seriously, investing in strong security measures and regular vulnerability testing.
Technical Architecture: A Close Look at Ollama’s Design
Ollama’s design has been praised for its simplicity and ease of use. But beneath the surface, there’s a more complex architecture that’s ripe for exploitation. Let’s take a closer look at how Ollama’s design contributes to the Bleeding Llama vulnerability.
Ollama’s core architecture is a simple request-response model: clients send HTTP requests to the Ollama API and get responses back. That simplicity comes at a cost. Because there is no authentication or authorization layer, anyone who can reach the port can send the crafted requests that trigger Bleeding Llama.
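To make that gap concrete: because the daemon ships no auth layer of its own, teams have to bolt one on in front. Below is a minimal sketch of a token-checking proxy for a loopback-bound Ollama. The bearer-token scheme is an illustration of mine, not an Ollama feature, and a hardened reverse proxy such as nginx or Caddy is the right answer in production:

```python
# Token-checking proxy sketch in front of a loopback-only Ollama
# (start the daemon with OLLAMA_HOST=127.0.0.1). Illustrative only.
import http.server
import urllib.request

UPSTREAM = "http://127.0.0.1:11434"
TOKEN = "change-me"  # load from a secret store in real deployments

class AuthProxy(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        if self.headers.get("Authorization") != f"Bearer {TOKEN}":
            self.send_error(401, "missing or invalid token")
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        req = urllib.request.Request(
            UPSTREAM + self.path, data=body,
            headers={"Content-Type": "application/json"},
        )
        # Buffers the full upstream response, so streaming generation
        # arrives in one piece. Acceptable for a sketch, not for prod.
        with urllib.request.urlopen(req) as resp:
            self.send_response(resp.status)
            self.end_headers()
            self.wfile.write(resp.read())

if __name__ == "__main__":
    http.server.HTTPServer(("0.0.0.0", 8080), AuthProxy).serve_forever()
```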
Adoption Timeline: The Long Road to Patching
The adoption timeline for Ollama patches has been slow and painful, with many instances still vulnerable to the Bleeding Llama vulnerability. Let’s take a closer look at the timeline and what’s at stake.
**April 3, 2026:** The Ollama team releases version 0.1.36, the first patch addressing the Bleeding Llama vulnerability. Adoption is slow from day one.
**April 12, 2026:** A follow-up hotfix lands in version 0.1.37, fully mitigating the flaw. Adoption rates remain low, with most public instances still vulnerable.
**May 5, 2026:** Proof-of-concept exploit code for Bleeding Llama is published on GitHub, and weaponized scripts follow within days, letting even low-skill attackers exploit the flaw at scale.
**May 10, 2026:** Cyera estimates that over 300,000 Ollama instances remain exposed on the public internet, with patch adoption still stuck below 32%.
Key Questions Remaining
As the dust settles, many questions remain. What will happen to the Ollama team? Will they face any consequences for releasing a vulnerable product? How will regulators respond to the situation? And most importantly, what lessons can be learned from this debacle?
One thing is certain: the Bleeding Llama vulnerability has sent the AI community a clear message. Security is not just a nicety; it’s a necessity. Companies need to invest in robust security measures, regular vulnerability testing, and serious data protection. The future of AI depends on it.
Sources: The Hacker News, original report

