Bright Needed Legal Threats to Stop AI Scrapers. Its Leaders Think the Future Will Be Collaborative

Late last year, Bright MLS General Counsel Brian Schneider was doing what millions of people do every day—asking AI large language models (LLMs) like ChatGPT, Claude and Google Gemini for information on real estate listings.

But Schneider’s motives were very different from the average consumer’s. As part of a regular review process, Schneider was checking to see how this frontier technology utilized or presented the valuable, copyrighted data Bright curates and syndicates for members.

“Something changed in the fall,” Schneider tells RISMedia.

While LLMs have long provided basic data on for-sale properties—price, addresses and links to where the user could see the listing—suddenly these models were offering multiple listing photos and other clearly copyrighted material. Schneider says he immediately took action and reached out to the AI giants, expecting a fight or at least some back-and-forth.

“Without any fanfare or settlement agreement or further push, they all fixed it, which was a really interesting and sort of eye-opening outcome. It was not really what we expected,” he says.

The friction highlights a new and fast-moving dynamic in real estate. As agents try to find ways to show up in AI inquiries or leverage those tools for their business, real estate entities like MLSs must be wary of LLMs scraping their valuable data—at least without guardrails or agreements.

As regulators and lawmakers scramble to keep up with the evolution of AI abilities, real estate professionals clearly need to remain actively engaged as these big tech companies push boundaries.

“We slapped their hand and they turned it off,” says Rajeev Sajja, Bright’s chief AI officer. “We want to build trust in the MLS data no matter how it’s used, and really have meaningful conversations with all entities, especially in the AI era.”

The data

Real estate’s current controversy over listings—private, pre-market or otherwise—is at its heart about data. LLMs, with the ever-growing capability to absorb and leverage that data, will certainly figure into both problems and solutions.

The relative ease with which Bright was able to dissuade AI companies from essentially stealing listings shows that the law creates some clear boundaries, even in the age of AI. That hopefully provides some assurance that these tech behemoths can’t just immediately get between the agents and brokerages who create listings and consumers looking for homes.

Schneider separates the copyright issue into three buckets that he cautions are “not mutually exclusive or collectively exhaustive.”

First, he says, LLMs using data for training is generally allowed by courts. Second, anything factual, like property characteristics, is also generally fair game. Third, anything creative (photos, listing descriptions, video and floor plans) is off-limits and cannot be scraped and reused by LLMs without permission.

“My suspicion is that the AI companies understand those three buckets as well, and recognized that they had crossed the line,” he says.

But there is still plenty of wiggle room. Can an AI scrape MLSs (or portals) and provide real-time updates on things like status, pricing or days on market, potentially undermining the push by a large faction in real estate to do away with “negative insights”?

Theoretically yes, but Schneider notes there is currently a “debate” whether those items count as creative elements.

“I think there’s arguments to be made that they are,” he says.

Either way, there is clearly a need to monitor how AI is utilizing data. Schneider says he still regularly checks what is appearing both on the LLMs and in the larger “gray market” of real estate data, largely using straightforward methods and prompts. Bright plans on expanding automation and leveraging AI tools to improve that process, according to Schneider, though that is still in the early stages.

The next steps

LLMs are clearly still making changes. RISMedia prompted three LLMs on May 22 (OpenAI’s ChatGPT, Google’s Gemini Flash and xAI’s Grok), asking for listing descriptions and photos of specific properties. Grok did provide listing descriptions, while ChatGPT and Gemini refused, each showing a general error message saying it did not have access to that content.

Days later, with the same prompt, ChatGPT provided listing descriptions of multiple properties with no pushback, while Gemini continued to display an error message. Grok continued to show listing descriptions.

On June 2, ChatGPT refused to show any of the copyrighted material, and explicitly stated that it was prevented from doing so due to copyright law. Gemini initially displayed the generic error message, but then provided all the listing photos of the properties (with MLS watermarks) and listing descriptions while simultaneously saying it “cannot directly access or output the copyrighted, real-world reference photograph from the web.” Grok continued to show listing descriptions.

While enforcing copyright is clearly something everyone from the photographer to the agent to the portals is invested in, what are some of the future opportunities as real estate and AI begin (hopefully) to work more collaboratively?

Sajja says that right now, agents need to have some “fundamentals” in place to get noticed by AI, focused on content signals with their own online footprint. But what can MLSs and the entities that actually control real estate data provide through partnerships?

Sajja adds that “the conversation needs to slowly shift” toward how to appropriately license data to AI. Bright specifically is focused on the “right licensing,” he adds, and teases that the MLS will have “something to communicate to all the people that get the feeds” in the next three months or so.

Future state

How do MLSs make sure agents are served by this process? Asked if an MLS could, for instance, license its data to a company so an AI model can better answer consumer questions like “Who was the top agent in my city this year,” Sajja says he “doesn’t see any reason why we can’t.”

Bright further wants to be “the intelligence layer” for AI, Sajja affirms, supporting both agents and consumers directly. Sajja points out that big LLMs often hallucinate, providing completely false answers or numbers pulled out of thin air, and that connecting directly to Bright’s data through an agent is the way to stay accurate and is “really the goal” for Bright.

Schneider adds that at a higher level, he is anticipating the industry moving toward “an opt-in model,” where, by default, MLSs, brokers and anyone else who controls or licenses real estate data do not share with everyone. Instead, these entities will carefully choose how and with whom they want to share data.

“We’re seeing a theme of brokers—and I truly mean plural, brokers big and small—really rethinking the value of the data that they’ve created through their listing and trying to evaluate the best use of that for their client and their company,” he says.

The fractured nature of the MLS industry means the process for this is somewhat messy. Likely, most LLMs or other big tech companies will want comprehensive data across regions (or the entire country). None of them want to negotiate hundreds of deals with each individual MLS that holds a piece of the puzzle, Schneider says.

But there are already companies solving this, among them HouseCanary, which Sajja notes “has this one conduit to Google” that allows it to go out to MLSs and negotiate individual deals that (theoretically) can benefit everyone. Sajja says that Bright is open to those deals “if it’s going to empower brokers and agents.”

Schneider says he believes this is likely going to happen again with big AI companies, and each MLS or brokerage will have options to license its data out through pre-negotiated arrangements, either through intermediaries or “industry consortium” projects like Bright’s own REdistribute.

Many real estate professionals still decry how MLSs and brokerages allowed outside entities to control the flow of information and property data, monetizing both consumers and agents to great effect (and profit). Real estate incumbents, including the MLS industry and big brokerages, did not foresee how powerfully, or how inevitably, the internet would change both consumer behavior and the value of data.

Sajja says that today, Bright believes the path forward is to get on board with how data is used and disseminated in the AI era, sooner rather than later.

“One of the things that at Bright, what we wanted to focus on was to spur innovation responsibly, not just be hoarding the data, but actually managing broker’s data is a very big asset,” he says. “I think this train is moving, and our goal is to actually just give it the right fuel responsibly.”