What’s with the hostility towards personal publishing?

18 Jun 2024 — 4 min read

It's about money, of course.

Google may have a problem with us.

A few weeks ago, Google had a big leak. While details are still lacking, it seems like there’s a bias against personal sites when it comes to search results.

From Mike King:

“Google may be torching small sites on purpose.
Google has a specific flag that indicates a site is a ‘small personal site.’”

LLMs have issues with us too.

OpenAI (Microsoft), Google, and now Apple all admit to scraping from the “open web” to train their models.

Under the traditional definition of “training,” where a model simply learns from a data set, small publishers probably wouldn’t have an issue. However, in practice, this training looks an awful lot like a little learning with a healthy dose of copy-and-paste. In other words, a fancy database.

Knowing this could be an issue, these companies have started making deals with the biggest content owners, paying off stock image sites and media conglomerates for the use of their content in training models.

Wait. What?

If it’s just LLM training as traditionally defined, why would they need to pay anything? Because we know it’s not true. We’ve seen stock image watermarks appear on “AI” content in the past. There is some form of (I would argue illegal) content scraping going on here.

Simply put, they’ve created billion-dollar models based on content that was used in a legally questionable way.

And if they’re paying, why are they only paying the biggest players? Why are there no accommodations for the millions of blogs, newsletters, podcasts, and other forms of content from independent media creators they’ve used to train their models?

I just heard an argument on a podcast from a tech bro that it’s because there’s no way to make a good-enough-for-the-public LLM without scraping the entire web.

How is that our problem?

I’ve always had a copyright notice on my books and blog posts. I do this because I don’t want my content used inappropriately in a place where my readers might be taken advantage of. I lose this battle 99% of the time, when sites (mostly Russian) scan my books the day after release and offer them as malware-laden PDFs, but I make the effort to protect them as best I can.

If these models only learned from my books and posts, I don’t think I’d have an issue with them. It’s clear they’re doing more, though. I think they’re all hoping to be too big to fail by the time class-action lawsuits around this come to court.

Once again, personal publishers take the hit.

Gmail and Apple have more issues with us.

Gmail has been controlling your inbox for a long time by separating out anything it deems a distraction from its ads. This includes email newsletters – especially those from smaller creators who don’t have deliverability specialists on staff.

Last week, Apple announced it was getting into that game too by sorting your email for you. It is taking things a step further by replacing preheaders (often carefully considered introductions to newsletters) with AI-generated ones.

Trust is everything in online publishing. Google and Apple want to apply a layer of trust they define to supersede any trust you define.

I want to be optimistic about this, but Google did this kind of thing years ago when it tried to kill off RSS – possibly the greatest tool for online consumption created since the hyperlink. It worked, and now only nerds and some podcast listeners use RSS.

I suspect this change for email will stick too. Smaller, personal publishers will be hit hardest, as usual.

We are doing this all backwards.

As generative AI fills our feeds with regurgitated mush, our innate trust in individuals over brands will determine the winners of both attention and revenue. Everyone in media should be racing to become a trusted individual right now.

The biggest tech companies may be aware of this threat. They do hire a lot of psychologists.

In Google’s case, attention to individuals means less revenue (see what happened to RSS vs. Google News and email vs. Gmail ads). So, it makes sense that their decisions are counter to our interests. But I see it in mid-size companies too.

Have you ever noticed how every new newsletter or blogging platform asks you for your publication’s name by default when setting up an account? That publication name is front-and-center for every interaction with your audience.

I’m old enough to remember when your name was assumed to be your name.

Why is everyone playing the game that benefits the giant media companies over themselves? It’s perplexing that we default to the less scarce, less valuable publication name over the creator’s name.

Imagine if Stephen King decided to title his website, newsletter, podcast, and YouTube channel, “IT.” Then, imagine he tried to sell the rest of his books and movies through that brand, rather than his name.

That’s what we’re doing as creators online. That’s what content platforms are engineered for by default. It’s short-sighted.

The personal website and the personal newsletter have always been more interesting to me. They’re also more trustworthy over time. I see personal publishing as only increasing in value to the consumer the further we get into the AI era.

You should probably establish that connection with an audience now, under your name. Soon, it may be impossible to tell if the person behind your favorite content is a person at all. At that point, the trust that is cheap and easy to establish now will become expensive and difficult to earn, and it may only be possible through in-person publishing (speaking and performing may take off again).

We all know personal publishing is swimming against the current. But it always has been. Our strength is in our flexibility and authenticity. I don’t see that changing. This post remains under my name.