Last winter, during the construction of an affordable housing project on Martha’s Vineyard, Massachusetts, a 32-year-old worker named Jose Luis Collaguazo Crespo slipped off a ladder on the second floor and plunged to his death in the basement. He was one of more than 1,000 construction workers who die on the job each year in the US, making construction the most dangerous industry for fatal slips, trips, and falls.
“Everyone talks about [how] ‘safety is the number-one priority,’” entrepreneur and executive Philip Lorenzo said during a presentation at Construction Innovation Day 2025, a conference at the University of California, Berkeley, in April. “But then maybe internally, it’s not that high priority. People take shortcuts on job sites. And so there’s this whole tug-of-war between … safety and productivity.”
To combat the shortcuts and risk-taking, Lorenzo is working on a tool for the San Francisco–based company DroneDeploy, which sells software that creates daily digital models of work progress from videos and images, known in the trade as “reality capture.” The tool, called Safety AI, analyzes each day’s reality capture imagery and flags conditions that violate Occupational Safety and Health Administration (OSHA) rules, with what he claims is 95% accuracy. That means that for any safety risk the software flags, there is 95% certainty that the flag is accurate and relates to a specific OSHA regulation. Launched in October 2024, it’s now being deployed on hundreds of construction sites in the US, Lorenzo says, and versions specific to the building regulations in countries including Canada, the UK, South Korea, and Australia have also been deployed.
Safety AI is one of multiple AI construction safety tools that have emerged in recent years, from Silicon Valley to Hong Kong to Jerusalem. Many of these rely on teams of human “clickers,” often in low-wage countries, to manually draw bounding boxes around images of key objects like ladders, in order to label large volumes of data to train an algorithm. Lorenzo says Safety AI is the first one to use generative AI to flag safety violations, meaning it relies on an algorithm that can do more than simply recognize objects such as ladders or hard hats. The software can “reason” about what is going on in an image of a site and draw a conclusion about whether there is an OSHA violation. This is a more advanced form of analysis than the object detection that is the current industry standard, Lorenzo claims. But as the 95% success rate suggests, Safety AI is not a flawless and all-knowing intelligence. It requires an experienced safety inspector as an overseer.
A vision language model in the real world
Robots and AI tend to thrive in controlled, largely static environments, like factory floors or shipping terminals. But construction sites are, by definition, changing a little bit every day.
Lorenzo thinks he’s built a better way to monitor sites, using a type of generative AI called a vision language model, or VLM. A VLM is an LLM with a vision encoder, allowing it to “see” images of the world and analyze what is going on in the scene.
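To make that concrete, here is a minimal sketch of what asking a general-purpose VLM a safety question about a single site photo can look like, using a publicly available multimodal API. It is illustrative only, not DroneDeploy’s code; the model name, file path, and question are placeholder assumptions.

```python
# Minimal sketch: ask an off-the-shelf VLM a yes/no safety question about one photo.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# "site_photo.jpg" and the question text are placeholders.
import base64
from openai import OpenAI

client = OpenAI()

with open("site_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # any general-purpose vision language model would do here
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Is anyone in this photo standing on the top rung of a ladder? "
                     "Answer yes or no, then explain what you see."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```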
Using years of reality capture imagery gathered from customers, with their explicit permission, Lorenzo’s team has assembled what he calls a “golden data set” encompassing tens of thousands of images of OSHA violations. Having carefully stockpiled this specific data for years, he is not worried that even a billion-dollar tech giant will be able to “copy and crush” him.
To help train the model, Lorenzo has a smaller team of construction safety pros ask strategic questions of the AI. The trainers feed test scenes from the golden data set to the VLM and ask questions that guide the model through the process of breaking down the scene and analyzing it step by step, the way an experienced human would. If the VLM doesn’t generate the correct response—for example, it misses a violation or registers a false positive—the human trainers go back and tweak the prompts or inputs. Lorenzo says that rather than simply learning to recognize objects, the VLM is taught “how to think in a certain way,” which means it can draw subtle conclusions about what is happening in an image.
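A rough sketch of that evaluate-and-tweak loop is below, under the assumption that each golden-set scene carries a human ground-truth label and that ask_vlm stands in for whatever model call is actually used; both are hypothetical names for the example.

```python
# Sketch of the trainers' loop: run a candidate prompt over labeled scenes,
# count missed violations and false positives, then revise the prompt and repeat.
from dataclasses import dataclass

@dataclass
class Scene:
    image_path: str
    has_violation: bool  # ground truth assigned by safety pros


def ask_vlm(image_path: str, prompt: str) -> bool:
    """Placeholder for the model call; returns True if the VLM flags a violation."""
    raise NotImplementedError


def evaluate(prompt: str, golden_set: list[Scene]) -> dict:
    misses = false_positives = 0
    for scene in golden_set:
        flagged = ask_vlm(scene.image_path, prompt)
        if scene.has_violation and not flagged:
            misses += 1            # the model failed to catch a real violation
        elif flagged and not scene.has_violation:
            false_positives += 1   # the model flagged a scene that was fine
    return {"misses": misses, "false_positives": false_positives}

# Trainers inspect the failures, adjust the prompts or inputs, and rerun
# evaluate() until both counts are acceptably low.
```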

As an example, Lorenzo says VLMs are much better than older methods at analyzing ladder usage, which is responsible for 24% of the fall deaths in the construction industry.
“With traditional machine learning, it’s very difficult to answer the question of ‘Is a person using a ladder unsafely?’” says Lorenzo. “You can find the ladders. You can find the people. But to logically reason and say ‘Well, that person is fine’ or ‘Oh no, that person’s standing on the top step’—only the VLM can logically reason and then be like, ‘All right, it’s unsafe. And here’s the OSHA reference that says you can’t be on the top rung.’”
Answers to multiple questions (Does the person on the ladder have three points of contact? Are they using the ladder as stilts to move around?) are combined to determine whether the ladder in the picture is being used safely. “Our system has over a dozen layers of questioning just to get to that answer,” Lorenzo says. DroneDeploy has not publicly released its data for review, but he says he hopes to have his methodology independently audited by safety experts.
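As a rough illustration of how layered answers might roll up into a single verdict, here is a short sketch. The questions, the notion of a “safe” answer per question, and the citation shown are assumptions for the example, not DroneDeploy’s actual rule set, which the company has not published.

```python
# Sketch: combine several yes/no answers about one ladder scene into a verdict.
# The check list and citation are illustrative; a real system would use far more layers.

LADDER_CHECKS = [
    # (question put to the VLM, whether "yes" is the safe answer)
    ("Does the person on the ladder maintain three points of contact?", True),
    ("Is the person standing on the top rung or top step of the ladder?", False),
    ("Is the person using the ladder as stilts to move around?", False),
]

def assess_ladder(image_path: str, ask_vlm) -> dict:
    """Run every check; any unsafe answer makes the whole scene a flagged violation."""
    failed = []
    for question, yes_is_safe in LADDER_CHECKS:
        said_yes = ask_vlm(image_path, question)  # True if the VLM answered "yes"
        if said_yes != yes_is_safe:
            failed.append(question)
    return {
        "safe": not failed,
        "failed_checks": failed,
        # Example citation: OSHA's construction ladder standard.
        "reference": "OSHA 29 CFR 1926.1053" if failed else None,
    }
```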
The missing 5%
Using vision language models for construction AI shows promise, but there are “some pretty fundamental issues” to resolve, including hallucinations and the problem of edge cases, those anomalous hazards the VLM hasn’t been trained to recognize, says Chen Feng. He leads New York University’s AI4CE lab, which develops technologies for 3D mapping and scene understanding in construction robotics and other areas. “Ninety-five percent is encouraging—but how do we fix that remaining 5%?” he asks of Safety AI’s success rate. Feng points to a 2024 paper called “Eyes Wide Shut?”—written by Shengbang Tong, a PhD student at NYU, and coauthored by AI luminary Yann LeCun—that noted “systematic shortcomings” in VLMs. “For object detection, they can reach human-level performance pretty well,” Feng says. “However, for more complicated things—these capabilities are still to be improved.” He notes that VLMs have struggled to interpret 3D scene structure from 2D images, don’t have good situational awareness in reasoning about spatial relationships, and often lack “common sense” about visual scenes.
Lorenzo concedes that there are “some major flaws” with LLMs and that they struggle with spatial reasoning. So Safety AI also employs some older machine-learning methods to help create spatial models of construction sites. These include segmenting images into crucial components and photogrammetry, an established technique for building a 3D digital model from overlapping 2D images. Safety AI has also been trained heavily on 10 different problem areas, including ladder usage, to anticipate the most common violations.
Even so, Lorenzo admits there are edge cases that the VLM will fail to recognize. But he notes that for overworked safety managers, who are often responsible for as many as 15 sites at once, having an extra set of digital “eyes” is still an improvement.
Aaron Tan, a concrete project manager based in the San Francisco Bay Area, says that a tool like Safety AI could be helpful for these overextended safety managers, who will save a lot of time if they can get an emailed alert rather than having to make a two-hour drive to visit a site in person. And if the software can demonstrate that it is helping keep people safe, he thinks workers will eventually embrace it.
However, Tan notes that workers also fear that these types of tools will be “bossware” used to get them in trouble. “At my last company, we implemented cameras [as] a security system. And the guys didn’t like that,” he says. “They were like, ‘Oh, Big Brother. You guys are always watching me—I have no privacy.’”
Older doesn’t mean obsolete
Izhak Paz, CEO of a Jerusalem-based company called Safeguard AI, has considered incorporating VLMs, but he has stuck with the older machine-learning paradigm because he considers it more reliable. The “old computer vision” based on machine learning “is still better, because it’s hybrid between the machine itself and human intervention on dealing with deviation,” he says. To train the algorithm on a new category of danger, his team aggregates a large volume of labeled footage related to the specific hazard and then optimizes the algorithm by trimming false positives and false negatives. The process can take anywhere from weeks to over six months, Paz says. With training completed, Safeguard AI performs a risk assessment to identify potential hazards on the site. It can “see” the site in real time by accessing footage from any nearby internet-connected camera. Then it uses an AI agent to push instructions on what to do next to the site managers’ mobile devices. Paz declines to give a precise price tag, but he says his product is affordable only for builders at the “mid-market” level and above, specifically those managing multiple sites. The tool is in use at roughly 3,500 sites in Israel, the United States, and Brazil.
Buildots, a company based in Tel Aviv that MIT Technology Review profiled back in 2020, doesn’t do safety analysis but instead creates once- or twice-weekly visual progress reports of sites. Buildots also uses the older method of machine learning with labeled training data. “Our system needs to be 99%—we cannot have any hallucinations,” says CEO Roy Danon. He says that gaining labeled training data is actually much easier than it was when he and his cofounders began the project in 2018, since gathering video footage of sites means that each object, such as a socket, might be captured and then labeled in many different frames. But the tool is high-end—about 50 builders, most with revenue over $250 million, are using Buildots in Europe, the Middle East, Africa, Canada, and the US. It’s been used on over 300 projects so far.
Ryan Calo, a specialist in robotics and AI law at the University of Washington, likes the idea of AI for construction safety. Since experienced safety managers are already spread thin in construction, however, Calo worries that builders will be tempted to automate humans out of the safety process entirely. “I think AI and drones for spotting safety problems that would otherwise kill workers is super smart,” he says. “So long as it’s verified by a person.”
Andrew Rosenblum is a freelance tech journalist based in Oakland, CA.