Amid the wave of AI controversies and lawsuits, CNET has been publicly admonished ever since it started posting thinly veiled AI-generated content on its site in late 2022. That scandal has now culminated in the site being demoted from Trusted to Untrusted Sources on Wikipedia [h/t Futurism].
Considering that CNET has been in business since 1994 and maintained a top-tier reputation on Wikipedia until late 2020, this demotion is no small matter. It came only after extensive debate among Wikipedia's editors, and it has drawn the attention of many in the media, including some CNET staff members.
It’s important to remember that while Wikipedia is “The Free Encyclopedia that Anyone Can Edit,” it’s hardly the Wild West. Wikipedia’s community of editors and volunteers demands citations for any information added to its pages, which enforces some degree of accountability across the massive community responsible for maintaining Wikipedia. While Wikipedia should never be used as a primary source, those citation requirements make it at least an excellent place to start researching most topics.
CNET’s (apparent) fall on Wikipedia started before its AI-generated content was discovered. Back in October 2020, the acquisition of CNET by publisher Red Ventures began pushing CNET’s standing down on Wikipedia, as evidence seemed to indicate a drop in editorial standards and a rise in advertiser-favored content.
But following November 2022, when Red Ventures began posting AI-generated content to what used to be one of the most reputable tech sites, Wikipedia editors almost immediately started pushing to demote CNET from Wikipedia’s list of reliable sources entirely. CNET claims to have stopped posting AI-generated content, but Red Ventures’ ruthless pursuit of capital and its posting of misinformation on other sites it owns (like Healthline) have kept CNET off the current list of reliable sources.
One Wikipedia editor, Chess, was quoted in the Futurism piece as saying, “We shouldn’t repeatedly put the onus on editors to prove that Red Ventures ruined a site before we can start removing it; they can easily buy or start another. I think we should look at the common denominator here, which is Red Ventures, and target the problem (a spam network) at its source.”
This is a genuinely scalding take, but it might just be warranted. The issue here isn’t purely the concealed use of generative AI in published articles on one of the most well-known tech news sites ever; it’s also that those AI-generated articles tend to be poorly written and inaccurate.
Before the age of AI, Wikipedia editors already had to deal with unwanted auto-generated content from spambots and malicious actors. Seen that way, editors’ treatment of AI-generated content is remarkably consistent with their past policy: it is just spam, isn’t it?
In a related story from a few months ago, a self-described “SEO heist” came to light on Twitter. It may have gone unnoticed had the person responsible not openly boasted about the “achievement,” which involved taking a competitor’s site, running it all through AI, and generating an entire competing website with 1,800 articles targeting the same niche to “steal 3.6M total traffic from a competitor.”
The victim of this so-called SEO heist was Exceljet, a site run by Excel expert David Bruns to help others make better use of Excel. Besides having his hard work stolen in perhaps the sleaziest, laziest manner possible, Bruns also discovered that most of the copied content was inaccurate. Hubspot’s coverage of the story also notes that, fortunately, Google eventually caught on to this.
Unfortunately, the rise of generative AI is also starting to come at the expense of a usable Internet, one with content written by humans capable of testing and genuinely understanding what they write about. One can only hope stories like this discourage publishers from tossing aside quality control to the point where they’re auto-generating misleading content.
That goes double when lawsuits like The New York Times v. OpenAI and Microsoft remind us that these so-called generative AIs are pretty much required to steal other people’s work to function at all. At least when a regular thief steals an object, it still works. With generative AI, you can’t even guarantee the result will be accurate, especially if you lack the expertise to tell the difference.