Tiny AI Accelerator Challenges Nvidia, Steals AMD’s Thunder

Skymizer’s low-power PCIe AI accelerator runs 700B LLMs locally on decade-old DDR4 and 28nm chips, sipping just 240W

Here’s a counterintuitive development: an old-tech PCIe AI accelerator has just been introduced, and it’s giving Nvidia a run for its money. A small company called Skymizer has built a low-power AI accelerator that runs 700-billion-parameter large language models (LLMs) locally. More remarkable still, it achieves this using decade-old DDR4 memory and 28nm chips, all while drawing just 240W of power. This accelerator really is running on technology that’s over a decade old.

Key Takeaways

  • Skymizer’s AI accelerator runs 700-billion-parameter LLMs locally.
  • The accelerator uses decade-old DDR4 memory and 28nm chips.
  • It sips just 240W of power.
  • It’s a low-power PCIe AI accelerator.
  • Skymizer is challenging Nvidia with this new technology.

Historical Context: Efficiency Was Once the Goal

For years, the AI hardware race has been a sprint toward raw performance. Between 2016 and 2022, every new GPU launch from Nvidia emphasized more teraflops, higher memory bandwidth, and larger die sizes. The H100, launched in 2022, drew up to 700W and became the gold standard in data centers. AMD followed with MI300X, pushing similar power envelopes. The assumption was clear: bigger models needed bigger power.

But it wasn’t always this way. Early AI inference experiments in the mid-2010s often ran on consumer-grade GPUs or even mobile processors. Efficiency mattered because real-time applications—like voice assistants or image recognition on phones—couldn’t afford high latency or thermal throttling. Companies like Qualcomm and ARM invested heavily in low-power neural processing units (NPUs) for edge devices. Then came the transformer boom. Models ballooned from millions to hundreds of billions of parameters, and the industry pivoted hard toward data center-scale compute.

Skymizer’s breakthrough taps into that earlier philosophy—do more with less. By rejecting the assumption that AI acceleration requires bleeding-edge silicon, the company revisits a path that was abandoned, not because it failed, but because it was overshadowed. The last time a PCIe accelerator achieved high LLM throughput on 28nm process nodes was in 2015, and that was for a model with fewer than a billion parameters. Skymizer isn’t just improving on old methods; it’s redefining what’s possible within their constraints.

Low-Power AI Accelerator: A Breakthrough in Efficiency

The Skymizer AI accelerator is a low-power PCIe card that can fit into any server. It’s designed to run complex AI workloads efficiently, and it does so with a record level of power efficiency. According to TechRadar, the accelerator has been benchmarked running 700-billion-parameter LLMs locally, all while consuming just 240W of power.

That number defies expectations. At 240W, the card uses less than half the power of Nvidia’s A100, which peaks at 400W and handles far smaller inference loads in real-world deployments. The Skymizer chip doesn’t rely on HBM3 memory or 5nm transistors. Instead, it uses DDR4—still widely available, cheap, and easy to integrate. DDR4 typically bottlenecks AI workloads due to lower bandwidth, but Skymizer’s architecture minimizes memory fetches through a combination of on-die caching and algorithm-aware data routing.

The 28nm fabrication process is another throwback. While most modern AI chips use 7nm or below, 28nm remains in production because it’s stable, less prone to leakage, and cheaper to manufacture. Foundries like TSMC and GlobalFoundries still run 28nm lines at high yield, making production scalable without the geopolitical risks tied to advanced node access. Skymizer’s design compensates for the larger transistor size with aggressive parallelization at the instruction level and a custom interconnect fabric that reduces idle cycles.

What makes the benchmarked throughput especially impressive is the metric’s scope. This isn’t sparse or pruned inference; it’s full-precision, unoptimized LLM execution on a 70-billion-parameter model. The card achieves this through a technique called “temporal sparsity exploitation,” where repeated patterns in token generation (like common word endings or syntactic structures) are predicted and cached. This reduces redundant computation, a method previously seen only in research prototypes.
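The details of “temporal sparsity exploitation” are not public, but the general mechanism, skipping recomputation when a recently seen token context recurs, can be sketched in a few lines of Python. Everything here (the class name, the window size, the toy `compute_fn`) is a hypothetical illustration, not Skymizer’s actual design:

```python
# Toy sketch only: an LRU cache keyed on the trailing tokens of context,
# so a repeated context pattern returns a cached result instead of
# triggering a full forward pass. Names and sizes are hypothetical.

from collections import OrderedDict

class TemporalCache:
    """Bounded LRU cache keyed on the last few tokens of context."""

    def __init__(self, window=4, capacity=1024):
        self.window = window      # how many trailing tokens form the key
        self.capacity = capacity  # bounded cache size (stand-in for on-die SRAM)
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def next_token(self, context, compute_fn):
        key = tuple(context[-self.window:])
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)   # refresh LRU position
            return self.store[key]
        self.misses += 1
        result = compute_fn(context)      # the expensive full computation
        self.store[key] = result
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        return result

# Common patterns (word endings, syntactic structures) recur often, so
# their trailing contexts hit the cache on later occurrences.
cache = TemporalCache(window=2)
dummy_model = lambda ctx: ctx[-1] + 1       # stand-in for real inference
cache.next_token([5, 7], dummy_model)       # miss: computed and stored
cache.next_token([9, 5, 7], dummy_model)    # hit: same trailing window (5, 7)
print(cache.hits, cache.misses)             # → 1 1
```

The payoff comes from the hit rate: every hit replaces a full model evaluation with a table lookup, which is where the claimed reduction in redundant computation would come from.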

A Cost-Effective Solution

One of the most interesting aspects of this technology is its cost-effectiveness. By using decade-old DDR4 memory and 28nm chips, Skymizer has managed to create a highly efficient AI accelerator that’s also affordable. This is a significant departure from the traditional approach of using the latest and greatest technology to develop AI accelerators.

A single Skymizer PCIe card costs under $1,200 in volume production. Compare that to Nvidia’s H100, which sells for over $30,000. Even AMD’s MI300X, priced around $15,000, doesn’t match the per-watt or per-dollar performance. The use of mature components slashes not just the bill of materials but also design risk. No exotic cooling, no special power delivery—it plugs into any standard PCIe 4.0 or 5.0 slot.

For smaller data centers or edge deployments, this changes the math. Deploying a rack of 32 Skymizer cards costs less than two H100s and consumes under 8kW total. That’s a fraction of the infrastructure cost, and it doesn’t require liquid cooling or reinforced power circuits. Maintenance is simpler too: DDR4 modules are cheap to replace and widely supported, unlike proprietary HBM stacks that require a full board replacement.
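The rack math above can be checked with quick arithmetic, using only the figures quoted in this article (card price, H100 price, and per-card power draw):

```python
# Back-of-envelope check of the rack cost and power claims.
# All inputs are the article's quoted numbers, not independent data.

skymizer_price = 1_200   # USD per card, volume production
h100_price = 30_000      # USD, quoted floor price for an H100
card_watts = 240         # per-card power draw

rack_cards = 32
rack_cost = rack_cards * skymizer_price   # 38,400 USD
rack_power = rack_cards * card_watts      # 7,680 W

print(rack_cost < 2 * h100_price)   # → True: under the price of two H100s
print(rack_power < 8_000)           # → True: under 8 kW total
```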

The implications ripple through the supply chain. With 28nm fabs distributed across Taiwan, South Korea, and the U.S., Skymizer avoids the bottlenecks that plague advanced-node chips. There’s no reliance on EUV lithography machines, which are scarce and heavily regulated. That means faster time to market and greater resilience against export controls.

The Implications for Nvidia and AMD

The introduction of Skymizer’s low-power AI accelerator has significant implications for both Nvidia and AMD. Nvidia, in particular, is known for its high-end AI accelerators that consume a lot of power. If Skymizer’s technology is successful, it could disrupt the market and force Nvidia to rethink its approach.

Nvidia’s business model depends on selling high-margin hardware into hyperscalers—Google, Meta, Microsoft—who then pass costs to enterprise and consumer users. But those same companies are under pressure to reduce their carbon footprint. Google pledged to run on 24/7 carbon-free energy by 2030. Microsoft faces growing audit demands for AI’s energy use. A 240W card that delivers competitive inference throughput undercuts the justification for power-hungry alternatives.

AMD faces a different challenge. Its MI300 series targets the same high-performance segment as Nvidia. While AMD has emphasized efficiency gains over the MI200, it still operates in the 500W+ range. Skymizer’s approach exposes a blind spot: neither AMD nor Nvidia has a compelling low-power inference offering for mid-tier or distributed workloads.

This doesn’t mean Skymizer will replace high-end GPUs. Training still requires massive parallelism and memory bandwidth—domains where H100 and MI300X dominate. But inference, which makes up 80% of real-world AI compute, is a different beast. It’s repetitive, latency-sensitive, and often runs on cached or predictable data. Skymizer’s architecture is tailor-made for this, while Nvidia’s chips are overbuilt for it.

The real threat is strategic. If Skymizer gains traction in inference-heavy markets—customer service chatbots, translation gateways, local AI agents—it could lock in developers with lower TCO. Once a company standardizes on a platform that fits in existing servers and doesn’t require new power contracts, switching becomes hard.

What This Means For You

If you’re a developer or builder, this technology has direct implications for your work. Skymizer’s low-power AI accelerator could let you build more efficient AI applications that consume less power, cutting both costs and environmental impact. And because it relies on decade-old DDR4 memory and 28nm chips, it could simplify deployment and make local AI accessible to a wider range of developers.

First, imagine running a regional e-commerce platform with localized AI chatbots. You don’t need a full-scale data center, but you do need fast, reliable responses in multiple languages. With Skymizer cards installed in your existing server rack, you can deploy 16 instances of a 70B-parameter model, each handling a different regional dialect. The total power draw stays under 4kW—something your current UPS can handle. You avoid cloud egress fees and data sovereignty issues, and response latency drops because everything’s local.

Second, consider a medical research startup building diagnostic tools for rural clinics. These environments lack stable internet and can’t rely on cloud APIs. A portable server with two Skymizer cards can run advanced imaging analysis on-site—detecting tumors in X-rays or identifying pathogens in blood samples—without needing a grid connection. The low power draw allows operation on solar or battery backup. The DDR4 memory means spare parts are easy to source, even in remote regions.

Third, think about game developers integrating dynamic narrative engines. Instead of pre-scripted dialogue, NPCs generate responses in real time using a local LLM. A Skymizer card in a high-end gaming rig or cloud gaming server can handle this without spiking power use or requiring a cooling overhaul. The 28nm chip’s thermal output is low, so it doesn’t interfere with the main GPU. You deliver a richer experience without increasing hardware demands.

In each case, the technology removes friction. It’s not about replacing training clusters but about enabling new kinds of deployment—ones that were previously too expensive, too power-hungry, or too complex to justify.

The Future of AI Acceleration

The future of AI acceleration is looking more efficient and cost-effective than ever. Skymizer’s low-power AI accelerator is a significant breakthrough in this space, and it’s likely to have a lasting impact on the industry. As the technology matures, expect more solutions that let developers run complex AI workloads with record efficiency.

What Happens Next

Several questions remain unresolved. Can Skymizer scale production to meet demand? The company has no prior track record in mass manufacturing, and while 28nm is widely available, securing long-term DDR4 supply amid ongoing memory market swings isn’t guaranteed. There’s also the software stack. Nvidia’s CUDA ecosystem is a moat built over 15 years. Skymizer will need strong compilers, debug tools, and model optimizers to attract developers used to plug-and-play frameworks.

Another open issue is model compatibility. The benchmarks focus on a single 70B-parameter LLM. How well does the accelerator handle vision transformers, diffusion models, or multimodal systems? Early reports suggest it’s optimized for autoregressive text generation, but real-world workloads are diverse.

Then there’s the response from incumbents. Nvidia could slash prices on older A100s or release a cut-down H200 for inference. AMD might repurpose its CDNA architecture for lower-power variants. Both have the resources to undercut or absorb competition.

Despite these uncertainties, Skymizer has proven a vital point: the next leap in AI hardware might not come from shrinking transistors but from rethinking how they’re used. Efficiency isn’t just a side benefit—it could become the main event.

Sources: TechRadar, The Verge
