On May 23, 2026, the National Transportation Safety Board (NTSB) took the rare step of temporarily shutting down public access to its accident docket system — not because of a cyberattack or system failure, but because someone had used AI to resurrect the voices of pilots who died in a UPS plane crash last year. That’s not science fiction. It’s what happened when a spectrogram, a visual representation of sound data, was pulled from the docket for UPS Flight 2976 and fed into AI tools alongside the official transcript. The result? Audio approximations of the cockpit voice recorder — something federal law explicitly prohibits from being released.
Key Takeaways
- The NTSB removed public access to its docket on May 23, 2026, after AI-generated audio of deceased pilots circulated online.
- The source wasn’t a leaked recording — it was a spectrogram image from the cockpit voice recorder, legally posted in the docket.
- YouTuber Scott Manley flagged that audio could be reverse-engineered from the spectrogram, prompting public experimentation.
- People used AI tools like Codex to reconstruct speech, showing how open data can become an unintended backdoor.
- This incident exposes a legal and technical blind spot: visual representations of audio aren’t protected like audio itself.
AI Voice Reconstruction Just Broke a Federal Barrier
It’s not illegal to publish a spectrogram. In fact, the NTSB has been doing it for years. The agency is barred by federal law from releasing actual cockpit voice recordings — the raw audio of pilots in their final moments. But spectrograms, which convert sound waves into colored pixel patterns across time and frequency, aren’t considered audio. They’re data. And until May 23, 2026, they were sitting in plain sight in the public docket.
That changed when Scott Manley, a physicist and YouTuber known for dissecting aerospace tech through simulations and commentary, posted on X: “You could probably reconstruct the audio from that spectrogram. It’s just math.” He wasn’t issuing a challenge. He was stating a technical fact. But the internet treats facts like invitations.
Within days, developers and AI hobbyists did exactly that. Using machine learning models trained on voice patterns, speech cadence, and phoneme mapping, they reverse-engineered the visual data. They paired it with the official transcript, which was also public, fed sequences into tools like Codex, refined the results with prosody models, and generated audio that mimicked what the pilots said. Not perfectly. Not with studio fidelity. But close enough to be chilling.
The NTSB didn’t confirm the accuracy of the reconstructions. It didn’t need to. The mere existence of audio labeled as “CVR from UPS 2976” — a recording that, by law, should never exist outside a secure facility — was enough to trigger a shutdown. Public trust in the agency’s control over sensitive data was already thin. This cracked it.
The Spectrogram Was Never Meant to Be a Backdoor
For decades, spectrograms have been a forensic tool — a way for analysts to visually inspect anomalies in audio, like clipped words or overlapping voices. They’re standard in NTSB reports because they’re descriptive, not experiential. A graph doesn’t evoke emotion like a scream in a cockpit would.
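But in 2026, that line has collapsed. The same AI models that can turn text into lifelike speech can also go backward, from image to sound. And spectrograms, with their precise frequency-time mapping, are ideal inputs. They’re not degraded like old analog tapes. They’re high-resolution and machine-readable. A spectrogram image records only how loud each frequency is at each moment, not its phase, so the inversion is an approximation rather than an exact reversal, but the approximation can be close enough to carry intelligible speech.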
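To make the idea of a frequency-time mapping concrete, here is a minimal sketch in Python with NumPy. It is illustrative only: the test tone, the 16 kHz sample rate, the 512-sample window, and the 128-sample hop are all assumptions chosen for the example, not parameters taken from the NTSB docket. The sketch computes a short-time Fourier transform and splits it into the magnitude, which is what a spectrogram image encodes, and the phase, which the image discards.

```python
import numpy as np

N_FFT = 512   # assumed analysis window length in samples
HOP = 128     # assumed step between successive windows

def stft(signal, n_fft=N_FFT, hop=HOP):
    """Short-time Fourier transform: one complex spectrum per windowed frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Rows are frequency bins, columns are time frames.
    return np.fft.rfft(frames, axis=1).T

sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate     # one second of sample times
tone = np.sin(2 * np.pi * 440.0 * t)         # synthetic 440 Hz test tone

complex_spec = stft(tone)
magnitude = np.abs(complex_spec)   # what a spectrogram picture shows
phase = np.angle(complex_spec)     # what the picture leaves out

# The brightest row of the "image" sits at the bin closest to the tone.
peak_bin = int(magnitude.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / N_FFT
print(magnitude.shape, peak_hz)
```

The magnitude array is all an image can carry, so anyone inverting a published spectrogram has to estimate the phase by other means.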
What’s worse: the docket didn’t just include one spectrogram. It included megabytes of them — frame after frame of visualized audio. That’s not a data leak. It’s a data buffet.
The NTSB has published spectrograms in past reports, including for Asiana Flight 214 in 2013 and Colgan Air 3407 in 2009, without incident. Back then, reconstructing convincing speech from an image wasn’t realistic. The computational cost was too high. The models didn’t exist. The idea was more theoretical than practical. But between 2020 and 2025, open-source voice synthesis tools improved dramatically. Projects like Mozilla TTS, Coqui, and Meta’s Voicebox made speech generation accessible to anyone with a GPU. Reverse engineering audio from visual representations went from impractical to trivial in less than five years.
In 2014, researchers from MIT, Microsoft, and Adobe recovered audio from the vibrations of a potato chip bag filmed through soundproof glass. In 2023, researchers at Carnegie Mellon demonstrated a prototype that could extract low-fidelity speech from silent video using lip movements. These experiments were confined to academic circles. But they proved a principle: if sound leaves a trace, any trace, AI can exploit it.
The NTSB wasn’t monitoring these developments. It didn’t have to. Its mandate is safety, not digital forensics. But that separation no longer holds. Every public data release now exists in a world where reconstruction is possible. And the agency’s long-standing practice of transparency has become a vulnerability.
How the Reconstruction Worked
Here’s how it played out, based on public GitHub repos and forum posts:
- Scrape the spectrogram PNGs from the NTSB docket (public until May 23, 2026).
- Use OpenCV to extract pixel intensity data across time and frequency bands.
- Map intensity to amplitude, estimate the phase that the image does not record, reverse the Fourier transform, and reconstruct a raw audio waveform.
- Feed that audio, along with the transcript, into a fine-tuned AI model such as Codex or the speech synthesis model VALL-E.
- Adjust for speaker gender, accent, and emotional state using metadata from pilot records.
The result wasn’t a perfect clone. But it was enough to generate sentences like “We’re not gonna make it” in a voice that sounded like one of the pilots. And once that audio hit Reddit and X, it spread like a rumor with receipts.
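The signal-processing half of those steps (pixels to amplitude, then back to a waveform) can be sketched with the classic Griffin-Lim algorithm. This is a minimal sketch under stated assumptions, not the method any particular reconstruction used: it assumes a grayscale image whose brightness is linear in decibels over an 80 dB range, and the FFT size, hop, and helper names (`pixels_to_magnitude`, `griffin_lim`) are hypothetical. To stay self-contained it fabricates its own "image" from a synthetic tone instead of reading a real file.

```python
import numpy as np

N_FFT, HOP = 512, 128            # assumed analysis parameters
WINDOW = np.hanning(N_FFT)

def stft(x):
    """Complex spectrogram, shape (frequency bins, time frames)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_FFT] * WINDOW
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(spec):
    """Inverse STFT by windowed overlap-add."""
    frames = np.fft.irfft(spec.T, n=N_FFT, axis=1) * WINDOW
    length = N_FFT + HOP * (spec.shape[1] - 1)
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP : i * HOP + N_FFT] += frame
        norm[i * HOP : i * HOP + N_FFT] += WINDOW ** 2
    return out / np.maximum(norm, 1e-8)

def pixels_to_magnitude(pixels, db_range=80.0):
    """Undo the assumed 8-bit dB scale: 255 is loudest, 0 is db_range dB quieter."""
    db = (pixels.astype(np.float64) / 255.0 - 1.0) * db_range
    return 10.0 ** (db / 20.0)

def griffin_lim(magnitude, n_iter=50, seed=0):
    """Estimate phase by alternating between the time and frequency domains."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random start
    for _ in range(n_iter):
        audio = istft(magnitude * phase)
        phase = np.exp(1j * np.angle(stft(audio)))  # keep phase, reset magnitude
    return istft(magnitude * phase)

# Stand-in for a published image: an 8-bit dB picture of a 440 Hz tone.
sr = 16_000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
mag = np.abs(stft(tone))
db = 20.0 * np.log10(np.maximum(mag / mag.max(), 1e-4))      # clip at -80 dB
pixels = np.round((db / 80.0 + 1.0) * 255.0).astype(np.uint8)

recovered = griffin_lim(pixels_to_magnitude(pixels))
peak_hz = np.abs(np.fft.rfft(recovered)).argmax() * sr / len(recovered)
print(len(recovered), peak_hz)
```

The recovered waveform has the right pitch but only a relative loudness, because the image's color scale discards absolute level. Real docket images would also need axis calibration and color-map decoding, which this sketch leaves out.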
One reconstruction, shared in a now-removed GitHub repo, used the transcript to align phoneme sequences with the spectrogram’s time axis. The model was trained on hours of commercial pilot communications from public ATC recordings. It learned the cadence, formality, and stress patterns common in cockpit speech. Then, using the raw waveform from the spectrogram as a guide, it generated a voice that matched the expected emotional arc — calm at first, then strained, then urgent.
Another version relied on VALL-E’s zero-shot voice cloning, which normally conditions on a few seconds of reference speech. With no samples of the pilots’ voices available, it used demographic data (age, gender, region) to simulate a “typical” pilot voice for that profile. The result wasn’t an exact match, but casual listeners could easily mistake it for real CVR audio.
Why the NTSB Can’t Just Redact and Move On
Redacting the spectrograms now won’t fix the problem. They were public for months. Screenshots, downloads, and reuploads are everywhere. And even if the NTSB removed every file, the method has been demonstrated. The next crash investigation with a spectrogram in the docket will face the same risk.
Worse, the agency’s hands are tied. Federal law prohibits the release of cockpit voice recordings, but it says nothing about derived audio. There’s no precedent for prosecuting someone for reconstructing a recording that was never released. That’s a legal gray area, and it’s wide open.
The NTSB can update its data policies. It can stop publishing spectrograms. But it can’t control how other agencies or third parties handle similar data. The Federal Aviation Administration (FAA), for example, shares technical telemetry in public reports. The National Oceanic and Atmospheric Administration (NOAA) releases sonar data. The Department of Energy publishes seismic readings. All of these contain potential audio traces.
If spectrograms can be reverse-engineered, what about sonograms from underwater recordings? Or vibration data from structural sensors? Or even thermal camera footage that captures subtle movements tied to sound? The same AI tools that rebuilt pilot voices can be applied to any data stream that correlates with acoustic activity.
This Wasn’t a Hack. It Was a Design Flaw.
There’s no evidence of malicious intent in the initial reconstruction. No ransom demand. No doxxing. No harassment of families. This wasn’t a cyberattack — it was a proof of concept gone viral. And that’s what makes it so dangerous.
Because it wasn’t a breach. It was compliance. The data was published legally. The tools used were publicly available. The methods were shared in open forums. This wasn’t an exploit. It was a feature.
And it reveals a deeper truth: as AI erodes the line between data and media, every technical decision has ethical consequences. The NTSB thought it was being transparent. Instead, it was being mined.
What This Means For You
If you’re building AI models that handle voice, audio, or visual data, this should scare you. You can’t assume that non-audio formats are safe to share. A spectrogram, a waveform plot, a LiDAR point cloud — any data that’s mathematically reversible could become a backdoor to sensitive content. And once it’s out, you can’t un-know it.
For developers, the lesson is clear: data provenance and format policy need to be part of your threat model. Just because something isn’t audio doesn’t mean it can’t become audio. And just because a file is public doesn’t mean it should be. Build filters. Add access controls. Assume reconstruction is possible — because it is.
Transparency is still important. But in the age of AI, it has to be designed with reconstruction risk in mind. The NTSB didn’t anticipate this. You don’t have that excuse.
Consider a startup building a healthcare AI that analyzes patient breathing patterns using audio-derived sensor data. Publishing anonymized spectrograms for research could seem harmless. But if those visuals contain enough resolution to reconstruct speech, they might reveal private conversations — even if no audio was ever stored. That’s not a bug. It’s a systemic oversight.
Or imagine a city releasing traffic camera footage with audio disabled, thinking privacy is preserved. But if the video captures lip movements or vibrations in car windows, AI can still extract what was said. The data isn’t audio — but it becomes audio in the right model.
Even defense contractors aren’t immune. A report detailing radar return signals might include visual plots of echo patterns. If those correlate with engine noise or radio chatter, they could be reverse-engineered to extract classified communications. The data is “clean.” The outcome isn’t.
What Happens Next
The NTSB is expected to release revised data policies by mid-2026. Early signals suggest they’ll stop publishing spectrograms in public dockets. Internal working groups are exploring alternatives — like low-resolution versions that preserve investigative value but lack reconstruction fidelity. But there’s no guarantee these will hold.
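One way a low-resolution version could be produced is by averaging the spectrogram over blocks of neighboring time frames and frequency bins before publishing it. The sketch below is a hypothetical illustration, not the NTSB's method: the function name `coarsen` and the block factors are assumptions, and the random array merely stands in for real spectrogram data. It shows the reduction itself and makes no claim about how much coarsening is enough to defeat reconstruction.

```python
import numpy as np

def coarsen(magnitude, freq_factor=8, time_factor=8):
    """Average non-overlapping blocks to cut resolution along both axes."""
    # Trim edge rows and columns that do not fill a whole block.
    n_freq = magnitude.shape[0] - magnitude.shape[0] % freq_factor
    n_time = magnitude.shape[1] - magnitude.shape[1] % time_factor
    trimmed = magnitude[:n_freq, :n_time]
    blocks = trimmed.reshape(n_freq // freq_factor, freq_factor,
                             n_time // time_factor, time_factor)
    return blocks.mean(axis=(1, 3))

# Placeholder data: 257 frequency bins by 122 time frames.
rng = np.random.default_rng(0)
full_res = rng.random((257, 122))

coarse = coarsen(full_res)
print(full_res.shape, coarse.shape)
```

Each published cell now summarizes 64 original cells, so fine detail such as individual harmonics and short consonant bursts is blurred while broad events remain visible. Whether that trade-off preserves enough investigative value is a policy judgment the code cannot settle.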
Congress may step in. Lawmakers have already requested briefings from the NTSB and the Department of Transportation. The legal question is whether reconstructed audio qualifies as a “recorded communication” under existing statutes. If not, new legislation may be needed to close the gap. But drafting laws for AI-derived media is uncharted territory.
Meanwhile, the AI community faces its own reckoning. Model developers didn’t build tools to resurrect the dead. But they enabled it. Should speech synthesis models come with usage restrictions? Should platforms ban uploads of reconstructed CVR audio? These aren’t hypotheticals. They’re immediate policy questions.
One thing’s certain: this won’t be the last time open data is repurposed in ways its publishers never intended. The tools exist. The data exists. The only variable is time.
Sources: TechCrunch, original report

