An international team of researchers has released an artificial intelligence system capable of autonomously conducting scientific research across multiple disciplines — generating papers from initial concept to publication-ready manuscript in approximately 30 minutes for about $4 each.
The system, called Denario, can formulate research ideas, review existing literature, develop methodologies, write and execute code, create visualizations, and draft complete academic papers. In a demonstration of its versatility, the team used Denario to generate papers spanning astrophysics, biology, chemistry, medicine, neuroscience, and other fields, with one AI-generated paper already accepted for publication at an academic conference.
“The goal of Denario is not to automate science, but to develop a research assistant that can accelerate scientific discovery,” the researchers wrote in a paper released Monday describing the system. The team is making the software publicly available as an open-source tool.
The release marks a notable step in the application of large language models to scientific work, one that could change how researchers approach early-stage investigations and literature reviews. However, the research also highlights substantial limitations and raises pressing questions about validation, authorship, and the changing nature of scientific labor.
From data to draft: how AI agents collaborate to conduct research
At its core, Denario operates not as a single AI brain but as a digital research department where specialized AI agents collaborate to push a project from conception to completion. The process can begin with the “Idea Module,” which employs a fascinating adversarial process where an “Idea Maker” agent proposes research projects that are then scrutinized by an “Idea Hater” agent, which critiques them for feasibility and scientific value. This iterative loop refines raw concepts into robust research directions.
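In rough terms, this maker-versus-hater exchange is a simple loop in which one agent proposes and another pushes back until the idea holds up. The sketch below is an illustrative Python rendering of that pattern, not Denario’s actual code; the `call_llm` helper and the prompts are placeholders for whatever model client a real implementation would use.

```python
# Illustrative sketch of an Idea Maker / Idea Hater refinement loop.
# `call_llm` is a placeholder for any chat-model client; this is NOT Denario's API.
from typing import Callable

def refine_idea(
    data_description: str,
    call_llm: Callable[[str], str],  # prompt in, model text out
    rounds: int = 3,
) -> str:
    """Alternate between a 'maker' that proposes and a 'hater' that critiques."""
    idea = call_llm(
        "You are the Idea Maker. Propose a research project for this dataset:\n"
        f"{data_description}"
    )
    for _ in range(rounds):
        critique = call_llm(
            "You are the Idea Hater. Critique this idea for feasibility and "
            f"scientific value, listing concrete weaknesses:\n{idea}"
        )
        idea = call_llm(
            "You are the Idea Maker. Revise the idea to address the critique.\n"
            f"Idea:\n{idea}\n\nCritique:\n{critique}"
        )
    return idea
```

The critic acts as a built-in filter: weak proposals are strengthened or discarded before any literature search is run or any code is written.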
Once a hypothesis is solidified, a “Literature Module” scours academic databases like Semantic Scholar to check the idea’s novelty, followed by a “Methodology Module” that lays out a detailed, step-by-step research plan. The heavy lifting is then done by the “Analysis Module,” a virtual workhorse that writes, debugs, and executes its own Python code to analyze data, generate plots, and summarize findings. The “Paper Module” then takes the resulting data and plots and drafts a complete scientific paper in LaTeX, the typesetting standard in many scientific fields. In a final, self-critical step, a “Review Module” can even act as an AI peer reviewer, providing a critical report on the generated paper’s strengths and weaknesses.
This modular design allows a human researcher to intervene at any stage, providing their own idea or methodology, or to simply use Denario as an end-to-end autonomous system. “The system has a modular architecture, allowing it to handle specific tasks, such as generating an idea, or carrying out end-to-end scientific analysis,” the paper explains.
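To make that modularity concrete, the following is a minimal, hypothetical sketch of such a pipeline in Python. The stage functions are stubs invented for this example and do not mirror Denario’s real interface; the point is simply that a human-supplied idea or methodology can replace the corresponding AI step, or the whole chain can run unattended.

```python
# Hypothetical modular pipeline with optional human overrides.
# Stage functions are illustrative stubs, not Denario's actual modules.
from typing import Optional

def idea_stage(data: str) -> str:
    return f"AI-generated idea for: {data}"

def methodology_stage(idea: str) -> str:
    return f"Step-by-step plan for: {idea}"

def analysis_stage(plan: str) -> str:
    return f"Results from code written and executed for: {plan}"

def paper_stage(results: str) -> str:
    return f"LaTeX draft summarizing: {results}"

def run_pipeline(
    data: str,
    human_idea: Optional[str] = None,   # researcher supplies the idea...
    human_plan: Optional[str] = None,   # ...or the methodology, if desired
) -> str:
    idea = human_idea or idea_stage(data)
    plan = human_plan or methodology_stage(idea)
    results = analysis_stage(plan)
    return paper_stage(results)

# Fully autonomous run, or with a human-provided idea:
print(run_pipeline("galaxy survey catalog"))
print(run_pipeline("galaxy survey catalog", human_idea="Measure dark-matter halo bias"))
```

Either way, the output of each stage feeds the next, which is what lets the system run end to end or hand control back to a person at any point.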
To validate its capabilities, the Denario team put the system to the test, generating a large collection of example papers across numerous disciplines. In a striking proof of concept, one paper fully generated by Denario was accepted for publication at the Agents4Science 2025 conference, a peer-reviewed venue where AI systems themselves are the primary authors. The paper, titled “QITT-Enhanced Multi-Scale Substructure Analysis with Learned Topological Embeddings for Cosmological Parameter Estimation from Dark Matter Halo Merger Trees,” successfully combined complex ideas from quantum physics, machine learning, and cosmology to analyze simulation data.
The ghost in the machine: AI’s ‘vacuous’ results and ethical alarms
While the successes are notable, the research paper is refreshingly candid about Denario’s significant limitations and failure modes. The authors stress that the system currently “behaves more like a good undergraduate or early graduate student rather than a full professor in terms of big picture, connecting results…etc.” This honesty provides a crucial reality check in a field often dominated by hype.
The paper dedicates entire sections to “Failure Modes” and “Ethical Implications,” a level of transparency that enterprise leaders should note. The authors report that in one instance, the system “hallucinated an entire paper without implementing the necessary numerical solver,” inventing results to fit a plausible narrative. In another test on a pure mathematics problem, the AI produced text that had the form of a mathematical proof but was, in the authors’ words, “mathematically vacuous.”
These failures underscore a critical point for any organization looking to deploy agentic AI: the systems can be brittle and are prone to confident-sounding errors that require expert human oversight. The Denario paper serves as a vital case study in the importance of keeping a human in the loop for validation and critical assessment.
The authors also confront the profound ethical questions raised by their creation. They warn that “AI agents could be used to quickly flood the scientific literature with claims driven by a particular political agenda or specific commercial or economic interests.” They also touch on the “Turing Trap,” a phenomenon where the goal becomes mimicking human intelligence rather than augmenting it, potentially leading to a “homogenization” of research that stifles true, paradigm-shifting innovation.
An open-source co-pilot for the world’s labs
Denario is not just a theoretical exercise locked away in an academic lab. The entire system is open-source under a GPL-3.0 license and is accessible to the broader community. The main project and its graphical user interface, DenarioApp, are available on GitHub, with installation managed via standard Python tools. For enterprise environments focused on reproducibility and scalability, the project also provides official Docker images. A public demo hosted on Hugging Face Spaces allows anyone to experiment with its capabilities.
For now, Denario remains what its creators call a powerful assistant, but not a replacement for the seasoned intuition of a human expert. This framing is deliberate. The Denario project is less about creating an automated scientist and more about building the ultimate co-pilot, one designed to handle the tedious and time-consuming aspects of modern research.
By handing off the grueling work of coding, debugging, and initial drafting to an AI agent, the system promises to free up human researchers for the one task it cannot automate: the deep, critical thinking required to ask the right questions in the first place.

