Thinking About the Internet and Business Models - Why Advertising Ruined the World and the Alternative

Sunday, September 15th 2024

The Internet, built on free protocols (TCP/IP and HTTP specifically), enabled huge companies to be built and traditional businesses to be disrupted.

These two protocols drove distribution costs of digital products down to zero.

Which meant that the big winners were those who could acquire and retain as many eyeballs as possible.

These eyeballs would then transact with zero friction on core human behaviors: buy things, search for information, and connect with people.

At scale, a single vendor can provide the job-to-be-done (JTBD) for billions of people: specifically, Amazon, Google, and Facebook.

In addition to these three, there have also been businesses built on user-generated data that have thrived under a primary business model: advertising.

As of 2021, these three together drove 64% of digital ad spend.

This makes sense for a protocol which is designed for the creation, delivery, and consumption of content. The information highway is fueled and paid for by endless billboards.

Were there exceptions to this?

Yes, and I'll address them later; but we can't escape the fact that when so much is driven toward "free," advertising has been the best business model.

Its margins are huge: consumers and users produce the product; intermediaries serve precision ads; and the TAM expanded beyond global brands and the clients of traditional advertising agencies all the way down to self-service customers.

What is changing?

Eyeballs may not be looking at content as much once their intent becomes clear. They may not be searching on Google, or scrolling past lots of different possible products with ads placed next to them, or even going to social media for their dopamine hits.

People may be turning to their AI bots to make decisions for them on price, reviews, and the final purchase; they may go to AI bots for interactions and for information once sourced only through the wisdom of crowds; the weak-tie communities in forums and on social media may hold onto their share of eyeballs only until an AI bot can fulfill this tribal, informational, and social need better.

But how will these different AIs be able to do this?

By scraping data; and here is where things get interesting in terms of the protocol.

The original internet was an "open web" of documents that could link to one another; this fostered free access to information from anywhere in the world, ungated.

But if AI is doing the scraping and connecting the dots for us, and they become the primary gateway, what does this mean?

We, the consumers, pay the AI for performing this service; we're no longer eyeballs paying by being exposed to advertising.

In response, content providers like Reddit and Stack Overflow are charging AI companies to train on their valuable data. This is likely to continue across all media services, with the scraping API billed as private access.

This change in pricing put many third-party applications out of business (and makes a case that social platforms with user-generated content need to be decentralized, with governance that makes them sustainable for products built on top of them, rather than capturing all the value).

This is likely to become an additional revenue stream, with potentially expensive APIs that only well-funded AI businesses will be able to afford.

However, charging enterprise pricing might not be the right move.

The users of the protocol could very well be consumers' crawlers and AI agents, many of them personalized, even built on open-source foundation models.

Every household could potentially shift to a world where they interact through an agent that surfs the web on their behalf, with a small "budget" to spend for them.

The process of a "developer" signing up for the API access will not be the norm.

Instead, a "402 handshake" is likely going to be developer for these smaller API transactions via a micro-payment crypto wallet.

In the beginning, this will be point to point: the agent goes to N websites and pays each for access to its data: review sites, publications behind paywalls, travel sites. As things mature, agents may pay other, more specialized agents.

The total cost to the customer for scheduling their vacation could be $5. It wouldn't be worth it for the customer to pay for monthly subscriptions to all of those content providers.

This needs to be more automated.

At the same time, not every content provider is able to cut a deal with all the AI providers.

Small local directories, niche forums, perhaps even individual bloggers with large followings, and all the many news publications.

Cutting those deals isn't available to the long tail (think the Sacramento Bee versus the NY Times).

But it's not viable for them to keep providing it for free; the advertising and subscription revenue isn't going to support it.

How do we enable a payment layer on a protocol originally designed for free transfer of information?

What would benefit everyone by freeing them from the tyranny of advertising, an out-of-protocol centralization of funds available only to the largest entities with the broadest scope?

It should be part of the existing free protocol.

The typical "paywall" and "developer API subscriptions" are out of protocol. It's basically a billing service built into authentication.

What this should be is a specification for the 402 payment which any server, or a specialized gateway, can offer. It could even be a free application running at a global reverse proxy like Cloudflare, in front of the origin.

What does it do?

Whenever an agent scrapes the website or directly accesses its GET APIs, the server responds with a 402 and provides the payment terms. These include the terms of service, the cost of the data, a usage endpoint for billing, and a smart contract to fund it all.
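
Sketched as a TypeScript type, that 402 payload might look something like this; every field name here is an assumption, not an existing spec:

```typescript
// Hypothetical shape of the payment terms returned with a 402 response.
interface PaymentTerms {
  termsOfServiceUrl: string; // human- and machine-readable terms of service
  pricePerRequest: string;   // cost of the data, e.g. "0.0005"
  currency: string;          // settlement asset, e.g. "USDC"
  usageEndpoint: string;     // endpoint the payer polls for metered billing
  contractAddress: string;   // smart contract that holds the deposited funds
  chainId: number;           // which network the contract is deployed on
}
```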

The agent (or, more likely, the LLM scraper) takes that payload and, on its end, has a payment service which assigns a budget, funds the wallet, and begins to deposit into the smart contract. That service does what a human would do: extract the data, check the usage, check the balance, top off as needed.
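
A sketch of that agent-side service, assuming the `PaymentTerms` interface sketched above and a stubbed on-chain deposit: it tracks a budget, computes what's owed from metered usage, and tops off the contract while the budget lasts.

```typescript
// Hypothetical agent-side payment service: does what a human would do
// manually (check usage, check the balance, top off as needed).
class AgentPaymentService {
  private spent = 0;

  constructor(private budget: number) {} // e.g. the household's $5 cap

  // Returns false once the budget is exhausted, so the agent stops paying.
  async ensureFunded(terms: PaymentTerms, requestsUsed: number): Promise<boolean> {
    const owed = requestsUsed * Number(terms.pricePerRequest);
    if (this.spent + owed > this.budget) return false;

    await depositToContract(terms.contractAddress, owed); // stubbed below
    this.spent += owed;
    return true;
  }
}

// Stub for the on-chain deposit a real wallet service would make.
async function depositToContract(address: string, amount: number): Promise<void> {}
```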

The reason this is important is that the long tail of content providers can set it up once and just put a 402 "shield" around their content, for all the scrapers who would otherwise simply be rate limited.

As things advance, there might need to be some "registry" or "whitelist" of legitimate LLMs that should have access, or a way to legitimize scrapers. There are small businesses that want to scrape the web for things like leads; and what was once free, and co-opted by businesses that became lead generators, is now more fairly available. The costs shift toward the people who need the data (a larger pool), rather than the data being farmed for free by lead farms who then, in turn, charge the individual businesses.

What if the content providers have difficulty discerning, however, who is an agent, and are being spoofed by requests that look like a real browser? Yes, there are technologies that fingerprint behavior; but an alternative might be to just charge everyone.

Prices potentially go up for those who consume at high volume, because those are clearly scrapers for LLMs and the like.

Each user's browser, then, would also perform a handshake, similar to the SSL handshake, but from wallet to smart contract address; browsing a site might cost fractions of a cent for a casual reader (as it should). In theory, this prevents the false-positive problem and the whack-a-mole problem with rogue scrapers.
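
Entirely speculative, but that per-page-view handshake might look like a wallet signing a tiny payment authorization alongside each request; the header names and the `BrowserWallet` interface are invented for illustration.

```typescript
// Speculative wallet-to-contract handshake for a human browser: sign a
// fraction-of-a-cent authorization per visit, analogous to the SSL handshake.
interface BrowserWallet {
  signPayment(resource: string, price: string): Promise<string>;
}

async function browseWithWallet(url: string, wallet: BrowserWallet): Promise<Response> {
  const probe = await fetch(url, { method: "HEAD" });
  const price = probe.headers.get("X-Price-Per-View") ?? "0"; // hypothetical header

  const authorization = await wallet.signPayment(url, price);
  return fetch(url, { headers: { "X-Payment-Authorization": authorization } });
}
```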

Everyone is a scraper; everyone is an LLM; you just pay in protocol.

If you're a human, you should pay orders of magnitude less than anyone else; ideally it's almost nominal.

This, of course, might mean that in the short term content providers don't make as much as they would charging individual subscribers a monthly rate; but they may get a lot more customers paying pennies per month. After all, advertising is priced on a CPM basis; the ad networks are just a more efficient aggregator of those pennies.
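
Back-of-the-envelope, with illustrative numbers rather than sourced ones, the trade looks like this:

```typescript
// Illustrative only: compare ad revenue per reader to micropayments.
const cpmDollars = 20;                              // assumed $20 CPM
const adRevenuePerPageview = cpmDollars / 1000;     // $0.02 per pageview
const micropaymentPerPageview = 0.005;              // assumed half a cent per page
const pageviewsPerMonth = 30;

const adRevenuePerReader = adRevenuePerPageview * pageviewsPerMonth;   // $0.60/month
const micropayPerReader = micropaymentPerPageview * pageviewsPerMonth; // $0.15/month
// Less per reader than ads at these numbers, but with no ad-tech middlemen
// taking a cut, and a far larger pool of readers willing to pay pennies.
```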

Eventually, this pay-to-play protocol at the micro level could reduce certain types of behavior in comments and on social media, which are inhabited by bots.

Eventually, bots will not find it productive to pay to do this.

But the protocol can be very lightweight: an open protocol and standard. The wallets and addresses are already an open standard.

The only other aspect that needs to be supported is the micro-payment transactions themselves, and those can easily be in USDC.
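
For instance, settling in USDC is just an ERC-20 transfer; a minimal sketch with ethers.js, where the RPC URL, key, and contract addresses are all placeholders:

```typescript
import { ethers } from "ethers";

// Minimal ERC-20 ABI for a transfer; USDC uses 6 decimal places.
const erc20Abi = ["function transfer(address to, uint256 amount) returns (bool)"];

async function payUsdc(to: string, dollars: string): Promise<void> {
  const provider = new ethers.JsonRpcProvider("https://rpc.example.com"); // placeholder
  const wallet = new ethers.Wallet(process.env.AGENT_KEY!, provider);
  const usdc = new ethers.Contract("0xUSDC_ADDRESS", erc20Abi, wallet);   // placeholder

  // e.g. payUsdc(siteContract, "0.002") sends two tenths of a cent.
  const tx = await usdc.transfer(to, ethers.parseUnits(dollars, 6));
  await tx.wait(); // confirm before retrying the gated request
}
```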

What is the way to validate this assumption?

The first step is to talk to content providers who fit this profile.

We have already seen companies like Reddit and Stack Overflow cut deals; and they do have developer APIs.

We can also come up with a list of top second-tier UGC-based companies and ask their thoughts on how their content becomes a revenue source.

Some may say they keep it closed because their advantage will be AI.

The only ones truly able to do this would be the big players, Facebook and Amazon: they could close down their sites to any kind of scraping and monopolize their valuable data.

But these second-tier players need to advocate for more open services and systems, and welcome them into their ecosystem with more of an in-protocol payment service. They may think that coming up with developer payment plans will work, given the types of businesses they see today, but the friction will eventually get too difficult; they will leave money on the table by doing so.

Once some of the providers considered valuable are set up (Wikipedia would be a great use case), the next step is to go to the friendlier LLM scraping companies, and perhaps to the open-source models that can train (not just serve inference from pre-built foundation models), and see if there's an MVP protocol for payments on their end.

Then just start.

The growth engine would be on the supply side: the scrapers will likely just want to scrape for free till they get caught.

It would be great if the scrapers advocated this approach for paywalled businesses and became our growth arm for the protocol.

On the publication side, perhaps the best way would be to give talks and write blog posts for associations, like the local newspaper association, the national newspaper association, or the next blogger convention, and to pitch an open protocol standard that they can implement in less than a day.

The product side might be a straightforward server-side reference architecture: how the server should respond with the payload; how to gate; API design (for example, broadcasting the "come scrape me" API in the responses).
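
As a sketch of that reference architecture, here's the gate as Express middleware; the header name, payload fields, and verification helper are all assumptions carried over from the earlier sketches.

```typescript
import express from "express";

const app = express();

// Gate: any request without a verifiable payment receipt gets a 402
// carrying the payment terms and the advertised "come scrape me" API.
app.use(async (req, res, next) => {
  const receipt = req.header("X-Payment-Receipt");
  if (receipt && (await verifyReceiptOnChain(receipt))) return next();

  res.status(402).json({
    termsOfServiceUrl: "https://example.com/tos",
    pricePerRequest: "0.0005",
    currency: "USDC",
    usageEndpoint: "https://example.com/usage",
    contractAddress: "0x...",  // the site's payment contract
    scrapeApi: "/api/content", // broadcast where scrapers should go
  });
});

// The gated content behind the shield.
app.get("/api/content", (_req, res) => res.json({ articles: [] }));

// Stub: a real implementation checks the receipt against the contract.
async function verifyReceiptOnChain(receipt: string): Promise<boolean> {
  return false;
}

app.listen(3000);
```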

And deploy the smart contract to accept payment and issue events that accept or deny access on the front end (perhaps leveraging the Unlock Protocol, at least in design if not in practice).
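
On the event side, the server could simply subscribe to the contract and maintain an allow list; sketched with ethers.js, where the event name and addresses are invented:

```typescript
import { ethers } from "ethers";

// Hypothetical event the payment contract emits when a payer funds access.
const contractAbi = ["event PaymentReceived(address payer, uint256 amount)"];

const provider = new ethers.JsonRpcProvider("https://rpc.example.com"); // placeholder
const payments = new ethers.Contract("0xPAYMENT_CONTRACT", contractAbi, provider);

// Addresses that have paid; the front end consults this to accept or deny.
const paidUp = new Set<string>();

payments.on("PaymentReceived", (payer: string, amount: bigint) => {
  if (amount > 0n) paidUp.add(payer.toLowerCase());
});
```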