The rules governing the use of online data remain murky, with a court dismissing a case brought by X (formerly Twitter), in which X claimed that a company named Bright Data had stolen and utilized user info, in violation of X’s terms.
Bright Data gathers publicly accessible information from the web and uses it in its own offering. It also recently won a similar case brought by Meta over the scraping of Facebook and Instagram user info.
Bright Data maintains that it only scrapes information that’s publicly accessible without a login. But X claimed that the company not only sells user data without permission, but that it had also been “using elaborate technical measures to evade X Corp.’s anti-scraping technology.”
X claimed that Bright Data was breaching both X’s terms of service and copyright, but Federal Court Judge William Alsup dismissed X’s claim, which means that Bright Data is now free to continue using social media user data, within certain limits.
According to Judge Alsup, X’s claim is circumstantial, and is not, as X had indicated, in defense of user privacy. Judge Alsup noted that X is happy to sell user info for a price, and that it was seeking to stop Bright Data in this instance only because Bright Data was avoiding those fees.
Data scraping from social media profiles has been the subject of much legal debate, due to the technicalities around who owns such data, and how it can then be used.
Under current law, publicly accessible content is not subject to general copyright, especially when the claim is being made by the platform and not the individual. Platforms benefit from making a certain amount of their user posts available to all, but over time, most have locked down more and more of that info in order to stop scrapers from gathering up their user data, and then repackaging and/or reusing it in other forms.
That’s become even more pressing in the age of large language models (LLMs), which power AI systems. AI companies need to get their data from somewhere, and most social apps are now working to lock down and protect their data, in order to stop AI projects from sucking it up.
But as yet, there’s no legal precedent that stops the reuse of publicly accessible social platform info.
It did seem that such precedent was coming, after LinkedIn won a five-year legal battle against professional services company hiQ Labs back in 2022. hiQ Labs had been using LinkedIn member data to build its own employee information service, and after a legal challenge, LinkedIn was eventually allowed to block hiQ’s access. But as noted, Meta attempted similar legal enforcement against Bright Data, and was rejected by the courts in January this year. Meta then decided to abandon the case.
The technicality here seems to relate to what data is accessed, and how the scrapers operate. If it’s publicly available without a login, the law seems to side with the scrapers, as this info is not being protected by the platforms, and is not technically owned by them, as such.
But if it’s accessed via a logged-in user, that data is considered proprietary, and thus, protected by the law.
The end result will likely be that more content gets locked down, and hidden from non-users. Yet, at the same time, platforms like X, in particular, benefit greatly from having their posts displayed in Google Search results, which can only happen if they remain publicly visible.
It’s a quandary, but you can bet that every social app is now working out how to keep others away from its data stores, as more and more AI projects look for conversational data sources, and the law offers limited protection against such use.