The Open Web Under Siege: How AI Concerns Threaten Free Information Access
The rise of AI is sparking a contentious debate within the internet's technical standards bodies, with some publishers and Big Tech companies advocating for changes that could fundamentally alter the open nature of the web. While concerns over resource strain and business model disruption are valid, proposed modifications to **Internet Engineering Task Force (IETF)** standards risk restricting legitimate automated access, impacting everything from journalistic research to archival efforts.
Automated access, commonly known as crawling or scraping, is a cornerstone of the free and open internet. It powers essential tools used by **journalists**, **researchers**, and **watchdog organizations** to report news, identify security flaws, and investigate discrimination. Non-profits like the **Internet Archive** rely on crawling to preserve historical copies of websites, while comparison shopping tools empower consumers to find the best deals.
### The AI Dilemma: Revenue vs. Openness
However, this open access is increasingly under threat. Publishers and large technology companies, fearing lost advertising and licensing revenues, are pushing to restrict bots that crawl public web content for AI model training or operation. Some are even attempting to embed their business models directly into internet standards by influencing **IETF** technical specifications.
These economic anxieties are understandable. AI bots can strain website infrastructure, potentially degrading performance or taking sites offline. Upgrading systems incurs costs that some smaller sites may struggle to bear. Furthermore, AI overviews could disrupt traditional publishing business models if users bypass source websites in favor of AI-generated summaries.
### A Dangerous Precedent for Internet Standards
Despite these legitimate concerns, the proposed solution of altering **IETF** standards from neutral protocols that encourage openness to restrictive, monetized requirements is highly problematic. The worst of these proposals could give websites unprecedented power to automatically block legitimate and lawful scraping and crawling.
One such initiative is the **AI Preferences** working group, which is developing proposals to allow publishers to express "preference signals" against crawling web data for AI-related purposes. These signals, expressed via `robots.txt`, could potentially become legally binding in certain jurisdictions.
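To make the mechanism concrete, here is a minimal sketch of how a crawler consults `robots.txt` today, using Python's standard `urllib.robotparser`. It shows only the existing allow/disallow rules, not the working group's proposed preference-signal syntax, which is not reproduced here. The bot names `ExampleAIBot` and `ExampleArchiveBot`, the sample rules, and the `can_crawl` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is turned away, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "ExampleArchiveBot"):
        print(agent, can_crawl(agent, "https://example.org/articles/1"))
```

Nothing in this check is enforced by the protocol itself: the crawler reads the file and decides whether to comply. That voluntary character is what would change if such signals acquired legal force.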
### The Threat of Cryptographic Bot Identification
Another working group, **Web Bot Auth**, has a dual agenda. While it aims to protect sites from overly aggressive bots that strain resources, a positive goal, it is also pursuing a more dangerous path. This includes changes that would enable sites to cryptographically identify bots, allowing them to block virtually anyone they choose. This isn't limited to "bad" actors; it could extend to competitors, dissidents, or any entity unwilling to pay for automated access.
If crawling is restricted to a preapproved list of cryptographically authenticated bots, websites could demand licensing payments. This would effectively close off the open web to **researchers**, **archivists**, and **startups** without the financial means to pay for automated access.
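The gatekeeping effect of a preapproved list can be illustrated with a short sketch. This is not the Web Bot Auth design: it swaps in a shared-secret HMAC from Python's standard library as a stand-in for whatever signature scheme a real deployment would use, and the registry, bot identifiers, and function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical registry: only bots the site has approved (for example, after a
# licensing deal) have an entry. HMAC is a stand-in for a real signature scheme.
APPROVED_BOTS = {"licensed-bot": b"secret-issued-after-payment"}


def sign(secret: bytes, request_path: str) -> str:
    """Produce the signature a bot would attach to its request."""
    return hmac.new(secret, request_path.encode(), hashlib.sha256).hexdigest()


def admit(bot_id: str, request_path: str, signature: str) -> bool:
    """Admit a request only if it comes from a registered bot with a valid signature."""
    secret = APPROVED_BOTS.get(bot_id)
    if secret is None:
        # Unregistered crawlers are rejected no matter how they behave.
        return False
    expected = sign(secret, request_path)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    path = "/articles/1"
    print(admit("licensed-bot", path, sign(b"secret-issued-after-payment", path)))
    print(admit("archive-bot", path, sign(b"archive-own-key", path)))
```

In this sketch the unregistered archive crawler is turned away even though it signs its request correctly with its own key. Admission depends on being on the list, not on how the bot behaves.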
### Balancing Interests for a Sustainable Future
While websites have valid reasons to be concerned about AI's impact on traffic and revenue, these concerns must be weighed against the immense benefits of the open web. The proposed changes risk granting website operators veto power over a wide range of crucial uses. This includes the aforementioned investigations and archival work, accessibility tools for people with disabilities, and research efforts holding governments accountable.
Organizations like the **EFF** and their allies are actively resisting these threats to open access, fighting to protect the open web from efforts to manipulate internet standards and undermine the right to freely access information, including through automated tools.