Fable 5 achieved a 16.1% automation rate on the Center for AI Safety’s Remote Labor Index, setting a new benchmark for AI‑driven freelance work. That is nearly double the next‑best score in the same test, and it’s the first time any model has cracked the mid‑teens on the metric. It’s a striking data point, but it also reminds us that the field’s still in its infancy.
Key Takeaways
- Fable 5’s automation rate sits at 16.1%, beating Opus 4.8 (8.3%) and GPT‑5.5 (6.3%).
- By CAIS’s count, the automation frontier has more than quadrupled in under eight months.
- Even a worst‑case scenario would leave Fable 5 at 14.6%, still ahead of every other model tested.
- Human evaluation of deliverables remains a bottleneck that AI agents can’t yet handle.
- Security concerns and integration hurdles mean AI won’t replace freelancers overnight.
Historical Context
When the Remote Labor Index (RLI) was first published, the best‑performing model scored a modest 2.5%. Early experiments showed that even the most capable language models struggled to meet professional standards without substantial human guidance. The leader before this round was Opus 4.6 paired with the Claude Cowork scaffold, which managed a 4.17% automation rate. That achievement felt like a proof‑of‑concept, hinting that agents could someday handle more than just text‑heavy tasks.
Since then, the field has been on a steady climb. Each new iteration of the index introduced more demanding project types—adding 3D modeling and video creation to the mix—while researchers refined the evaluation pipeline. The timeline is tight: within eight months, the top score rose from 4.17% to 16.1%. That acceleration reflects both improvements in model architecture and a sharper focus on “agentic” capabilities, meaning the ability to drive external software rather than simply generate prose.
AI freelance performance record set by Anthropic’s Fable 5
After a brief pause, Anthropic’s lauded Fable 5 model is back in the spotlight, and it’s resetting the bar for automating work. The U.S. government re‑authorized the model on June 30; the Center for AI Safety (CAIS) had run its Remote Labor Index (RLI) tests before the model was pulled in mid‑June. CAIS compared Fable 5 against OpenAI’s GPT‑5.5 and Anthropic’s own Opus 4.8, and the results were unmistakable: Fable 5 outperformed both by a wide margin.
How the Remote Labor Index works
RLI measures “how often AI agents can complete real, economically valuable freelance projects … at a quality a paying client would actually accept,” according to CAIS. The benchmark covers tasks like computer‑assisted graphic design, data analysis, video creation, and 3D modeling. Researchers fed each model a human‑generated input file—much like you’d give a freelancer a brief—and then let the agent produce a deliverable. Human evaluators then judged each output against professional standards, flagging anything that fell short. The automation rate reflects the share of projects where the AI’s work was deemed as good as or better than a human’s.
To keep the test fair, evaluators used the same suite of desktop applications that a freelance professional would. They opened the files, inspected layers, ran render checks, and compared the final output to the original brief. The process mimics a client’s real workflow: a designer uploads a Photoshop PSD, a data analyst opens a spreadsheet, a video editor renders a timeline. When the AI’s result passes that scrutiny, it earns a “pass” for that project.
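The scoring rule described above boils down to a simple ratio. Here is a minimal Python sketch of it, assuming each project ends with a single accept/reject verdict from a human evaluator. The `Verdict` record, the `automation_rate` function, and the sample projects are hypothetical names and made‑up data for illustration, not CAIS’s actual pipeline or results.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """One human evaluator's judgment on one project (hypothetical record shape)."""
    project_id: str
    accepted: bool  # True if the deliverable met the professional standard


def automation_rate(verdicts: list[Verdict]) -> float:
    """Share of evaluated projects whose deliverable was accepted, as a percentage."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    passed = sum(1 for v in verdicts if v.accepted)
    return 100.0 * passed / len(verdicts)


# Made-up verdicts for illustration only; these are not RLI results.
sample = [
    Verdict("logo-redesign", True),
    Verdict("sales-dashboard", False),
    Verdict("product-video", False),
    Verdict("floor-plan", False),
]

print(f"{automation_rate(sample):.1f}%")  # 25.0%
```

The arithmetic is trivial; the expensive part is producing each `accepted` value, which requires a person to open the deliverable in the right application and judge it.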
Numbers that matter: automation rates across models
When CAIS ran the test, Fable 5 hit an automation rate of 16.1%, a record for the benchmark and nearly double the 8.3% that Opus 4.8 managed. GPT‑5.5 came in third at 6.3%, but even that was higher than any model CAIS had evaluated before. For context, the previous leader sat at 4.17% (Opus 4.6 with the Claude Cowork scaffold), and the field topped out at 2.5% when RLI was released. “The frontier has more than quadrupled in under eight months, a concrete signal of how quickly economically capable AI agents are advancing,” CAIS said.
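To put those percentages side by side, the short Python sketch below computes how many times larger Fable 5’s score is than each of the other figures quoted here. The rates come from the article; the dictionary labels and variable names are just illustrative choices.

```python
# Automation rates (percent) as quoted in this article.
rates = {
    "Opus 4.8": 8.3,
    "GPT-5.5": 6.3,
    "Opus 4.6 + Claude Cowork (previous leader)": 4.17,
    "top score at RLI release": 2.5,
}
fable_5 = 16.1

# Ratio of Fable 5's rate to each comparison point.
multiples = {name: fable_5 / rate for name, rate in rates.items()}

for name, multiple in multiples.items():
    print(f"Fable 5 vs {name}: {multiple:.2f}x")
# Fable 5 vs Opus 4.8: 1.94x
# Fable 5 vs GPT-5.5: 2.56x
# Fable 5 vs Opus 4.6 + Claude Cowork (previous leader): 3.86x
# Fable 5 vs top score at RLI release: 6.44x
```

The choice of baseline matters when reading any growth multiple: the jump is roughly 3.9x measured against the previous leader and roughly 6.4x measured against the top score at release.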
Fable 5’s 16.1% breakthrough
Even under the worst‑case assumption that Fable 5 failed every project with a missing result, its automation rate would still be 14.6%, higher than any other model in the test set. That robustness suggests the model isn’t just lucky on a few easy tasks; it’s consistently strong across a range of freelance work. The model’s ability to design a 3D mockup of an engagement ring, create a video ad, and map a floor plan showed it can handle both visual and spatial reasoning, which many earlier agents struggled with.
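A worst‑case bound like this one can be sketched in a few lines of Python. The sketch assumes the headline rate is computed over evaluated projects only, while the worst‑case rate counts every unevaluated project as a failure; the article does not spell out CAIS’s exact method, so treat that as an assumption. The function names and the project counts are hypothetical and chosen for round numbers, not taken from the benchmark.

```python
def observed_rate(passed: int, evaluated: int) -> float:
    """Automation rate over the projects that were actually evaluated (percent)."""
    if evaluated <= 0 or not 0 <= passed <= evaluated:
        raise ValueError("need 0 <= passed <= evaluated and evaluated > 0")
    return 100.0 * passed / evaluated


def worst_case_rate(passed: int, evaluated: int, total: int) -> float:
    """Lower bound: every project without a result is counted as a failure (percent)."""
    if total <= 0 or not 0 <= passed <= evaluated <= total:
        raise ValueError("need 0 <= passed <= evaluated <= total and total > 0")
    return 100.0 * passed / total


def implied_coverage(reported: float, worst_case: float) -> float:
    """Fraction of projects evaluated, implied by the two rates under the assumption above."""
    return worst_case / reported


# Made-up counts for illustration only: 20 passes, 100 evaluated, 125 in the full set.
print(observed_rate(20, 100))         # 20.0
print(worst_case_rate(20, 100, 125))  # 16.0

# Applying the same assumption to the article's figures (16.1% and 14.6%).
print(f"{implied_coverage(16.1, 14.6):.1%}")  # 90.7%
```

If that assumption holds, the two published figures would imply results for roughly nine in ten projects, which is why the worst‑case number sits so close to the headline one.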
Opus 4.8 and GPT‑5.5 lag behind
Opus 4.8’s 8.3% rate still represents a meaningful jump from its predecessor, but it’s clear that the gap between the two Anthropic models is large. GPT‑5.5’s 6.3% score, while respectable, fell short of the “double‑digit” mark that CAIS highlighted as a turning point. The disparity underscores that not all large language models are created equal; architecture, training data, and especially the agentic layer that lets a model interact with software make a huge difference.
Why the jump matters – but isn’t a takeover
Sixteen percent isn’t anywhere close to 100%, so we shouldn’t start writing e‑obits for freelance talent just yet. Even where AI shines, organizations still wrestle with security concerns, compliance hoops, and the need for human oversight on budget and timeline. Replacing a human freelancer would likely require a network of agents that double‑check each other’s work, and that adds complexity rather than simplifying it.
Human evaluation remains the bottleneck
CAIS even tried swapping the human evaluator for an “LLM judge” to see how far it could push the automation pipeline, but the LLM judge flopped. “Evaluating an RLI deliverable is itself a demanding, agentic task,” CAIS explained. “Doing it properly means opening the project’s files in the right professional applications, operating those applications competently, and forming a judgment the way a client would, the very computer‑use skills that today’s agents are still weakest at.” That quote drives home the point that the hardest part of many freelance gigs isn’t the creative output; it’s the tool‑familiarity and judgment that humans bring.
Practical implications for freelancers and builders
If you’re a freelancer, you’ll notice that the most immediate impact will be on narrow, repeatable tasks where AI can already meet client standards. Companies that have integrated AI agents into their pipelines might start shaving off a few hours of manual work, but they’ll still need humans to supervise, verify, and handle edge cases. For developers building AI‑enabled platforms, the takeaway is that investing in strong “computer‑use” modules—agents that can reliably launch Photoshop, run Excel macros, or edit video timelines—will be the next competitive moat.
What This Means For You
For developers, the data suggests that focusing on agentic capabilities, specifically the ability to manipulate desktop applications, will pay off faster than chasing ever‑larger language models. If you can embed a reliable “tool‑use” layer, you’ll likely see your agents climb well past the 2.5% that the RLI started with. That means building tighter integrations, handling file I/O gracefully, and adding safety checks that keep the agent from crashing the host app.
Freelancers, on the other hand, should start positioning themselves as the “human‑in‑the‑loop” for high‑stakes projects. Highlight your expertise in quality assurance, client communication, and security compliance—areas where AI still trips up. By offering a hybrid service—AI‑augmented drafts plus your final polish—you can stay relevant even as automation rates creep upward.
Concrete scenarios for developers and founders
Imagine a SaaS startup that lets users generate marketing assets on demand. By wiring a Fable‑style agent into the backend, the platform could produce a first‑pass video ad in under ten minutes. The startup would still need a human reviewer to confirm brand guidelines, but the bulk of the heavy lifting—rendering, timing, basic copy—would be automated, cutting production costs by a noticeable margin.
A data‑analytics consultancy could deploy an agent that opens client spreadsheets, runs predefined macros, and outputs a cleaned data set. The consultant’s role would shift to interpreting results and advising on strategy, while the repetitive cleaning steps become almost invisible to the client. This use case targets the kind of narrow, repeatable work where agents are most likely to meet client standards already; all three models in the CAIS test posted overall automation rates above 6%.
Finally, a freelance 3D artist might offer a “quick‑prototype” package where an AI drafts a low‑poly model based on client sketches. The artist then refines textures, lighting, and final geometry. The initial draft speeds up the project timeline, and the artist can charge a premium for the expert finish that the AI can’t provide.
Key Questions Remaining
Will future iterations of AI agents finally master the deep‑tool expertise that currently forces a human into the loop? Can safety mechanisms evolve enough to satisfy corporate compliance teams while still delivering speed gains? The RLI data tells us that progress is measurable, but it also highlights a clear ceiling: the part of the workflow that requires nuanced judgment and software fluency remains stubbornly human.
Answering those questions will shape the next wave of investment, research, and product strategy. For now, the 16.1% figure stands as both a milestone and a reminder that we’re still early in the journey toward fully autonomous freelance work.

