I’ve been meaning to disconnect from Jetpack for a while now. This seems like a good time to do it, and to finally clear out the older Tumblr and WordPress.com blogs I don’t use anymore.

Tumblr and WordPress to Sell Users’ Data to Train AI Tools (404 Media)

It’s the kind of thing that you expect from Google or Facebook, or from any number of start-ups, but there’s been this sense that Automattic should know better — and with Tumblr being login-walled and ad-saturated, and the push to upsell in their WordPress plugins, and now this…it’s looking like they don’t.

I don’t think they’ve hit the “trust thermocline” yet, but selling user data is a pretty clear line.

As for AI access to the Firehose: My understanding of the Firehose has been that it’s basically an aggregation of what you’d see in a bunch of blogs’ public RSS feeds. Which, OK, fine. Analyze your heart out. Display my posts in your RSS reader. Just make sure private posts and comments don’t leak.

But LLM training isn’t the same as analytics, or showing a properly attributed post in a reader. And quietly changing the terms to allow more kinds of re-use on something most people using the service don’t know about? Not cool.

And not making it clear what is and isn’t included for which purposes? That breaks down trust.

Before this, I wasn’t worried about the Firehose. But now I’m not sure I can trust Akismet, never mind Jetpack, and I’m looking for a new spam filter.

Originally posted across several threads through my GoToSocial test site.

Update: Automattic did clarify that self-hosted blogs with Jetpack are not included in the training data. Only company-hosted blogs on Tumblr and WordPress.com. But I still uninstalled Jetpack from this site, just to be sure. Like I said, I’d been meaning to for a while.

Since I started converting parts of my website to use 11ty as a static site generator, I’ve been able to automatically generate tag and category pages that are *just there* as plain HTML files. And since they’re plain HTML, the old local site search engine I have on there still finds all the Eleventy-generated pages. And again, since it’s all static, it doesn’t go down when the database does (which has been happening on an annoyingly frequent basis lately).
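For anyone curious, the tag pages come out of an Eleventy collection plus pagination. Here’s a rough sketch of the collection half of it; the collection name and the excluded tags are placeholders for the example, not necessarily my exact config.

```js
// .eleventy.js: sketch of building a de-duplicated list of every tag in use,
// so a paginated template can write one static /tags/<tag>/index.html per tag.
// "tagList" and the excluded tags are placeholder names for this example.
module.exports = function (eleventyConfig) {
  eleventyConfig.addCollection("tagList", (collectionApi) => {
    const tags = new Set();
    for (const item of collectionApi.getAll()) {
      (item.data.tags || [])
        .filter((tag) => !["all", "posts"].includes(tag))
        .forEach((tag) => tags.add(tag));
    }
    return [...tags].sort();
  });
};
```

A template then paginates over collections.tagList, one tag per output page, and Eleventy writes each one out as plain HTML alongside everything else.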

And this would be perfect if I was using a single Eleventy instance to build the entire site, but I’m not. I’ve got separate instances building the Les Misérables blog, the reviews, the tech tips, the creative writing collection, and so on, plus I have this WordPress blog and a bunch of hand-coded HTML from the old days.

Which leads to a few problems:

  1. Tags are per-section, not universal.
  2. The site search, which indexes HTML files on the server, sees everything except the WordPress posts, and the WordPress search *only* sees the WordPress posts.

Some ideas I’ve had to combine the tag pages:

  • Rebuild everything in a single Eleventy instance with a deeper hierarchy. Upside: Still static pages for everything except WordPress. Downside: Time-consuming, still leaves the main blog separate.
  • Write a post-build script that combines all the tag pages from each subsite. Upside: Same. Downside: Need to either run on the server or make sure my local copies of the *other* subsites are current.
  • Write a server-side page that combines the backend HTML pages into a dynamic frontend for only the tag being viewed. Upside: simple. Downside: tag pages now depend on PHP.
  • Write some client-side JavaScript for the tag pages that will check whether other subsites have tag pages, and add those to the end of the list in a “See also…” section. Upside: simple, and the “local” tag pages are still usable as long as I make sure the script doesn’t block anything. I could even have it check the other static subsites first and then check the blog, so if the blog times out I still display everything else. Downside: requires JavaScript and additional network requests. But as long as I stick to vanilla JS, I can make it pretty small (rough sketch below).
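Here’s roughly what that client-side check could look like. The subsite paths and the tags/<tag>/ URL pattern are made up for the sketch; the real list would match however each generator actually lays out its tag pages.

```js
// Hypothetical "See also…" check for the tag being viewed.
// Subsite paths and the tags/<tag>/ URL pattern are placeholders.
const SUBSITES = ["/lesmis/", "/reviews/", "/techtips/", "/writing/"];

async function addSeeAlso(tag) {
  const list = document.createElement("ul");
  for (const base of SUBSITES) {
    const url = `${base}tags/${encodeURIComponent(tag)}/`;
    try {
      // HEAD is enough: we only care whether the static tag page exists.
      const res = await fetch(url, { method: "HEAD" });
      if (res.ok) {
        const li = document.createElement("li");
        const link = document.createElement("a");
        link.href = url;
        link.textContent = `${tag} on ${base}`;
        li.appendChild(link);
        list.appendChild(li);
      }
    } catch {
      // Subsite unreachable (e.g. the blog's database is down): skip it
      // so the local tag page still shows everything else.
    }
  }
  if (list.children.length) {
    const section = document.createElement("section");
    section.innerHTML = "<h2>See also…</h2>";
    section.appendChild(list);
    document.querySelector("main")?.appendChild(section);
  }
}
```

Since it only ever appends a section, the local tag page stays fully usable if the script fails or never runs.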

And for unifying the search:

  • Write a post-site-indexing script that adds the WordPress posts to the index. Could be done with direct DB access.
  • Write a pre-site-indexing script that generates a bunch of files for it to index. Seems like overkill.
  • Update the search code to send the same search terms to WordPress and combine the results (sketched after this list).
  • Use a new search engine that indexes the served pages instead of the files on the server.
  • Point the search box at a remote search engine like Googl…yeah, never mind.
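For the “send the same terms to WordPress” option, here’s a sketch of the merge. It assumes the blog’s REST API is reachable under /journal/ (where the blog lives), and getLocalResults is a stand-in for whatever the existing site search already returns.

```js
// Hypothetical merge of local static-search results with WordPress results.
// getLocalResults() stands in for the existing search's output (assumed to be
// [{ title, url }, ...]); /wp-json/wp/v2/search is WordPress core's REST
// search route, and the /journal/ prefix is assumed from this blog's URL.
async function combinedSearch(terms) {
  const local = await getLocalResults(terms);

  let wordpress = [];
  try {
    const res = await fetch(
      `/journal/wp-json/wp/v2/search?search=${encodeURIComponent(terms)}&per_page=20`
    );
    if (res.ok) {
      wordpress = (await res.json()).map((hit) => ({
        title: hit.title,
        url: hit.url,
      }));
    }
  } catch {
    // If the blog (or its database) is down, fall back to static results only.
  }

  return [...local, ...wordpress];
}
```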

I haven’t settled on anything. I’m just kind of writing down ideas in public. If you have any suggestions, please let me know!

WP Tavern summarizes the conversation around WordPress losing CMS market share for the first time in ages, and what various people have cited as likely causes.

Personally, I’m finding its increasing complexity to be a major frustration.

  • Writing on WordPress has gotten somewhat more complicated.
  • Maintaining a WordPress site has gotten more complicated.
  • Developing for WordPress has gotten more complicated.
  • The resulting page code (including CSS and JavaScript) has gotten a lot more complicated. As I’ve noted before, there’s no good reason to require 450K of data to display a 500-word post. Or a single link with a one-sentence comment.

The move towards Gutenberg blocks and full-site editing complicates things on several levels, and feels like an attempt at lock-in as well.

Ironically, I’ve been moving toward Eleventy, which has also been very frustrating…but only in building the layout I want.

On one hand…

  • I have to develop a lot of the components I want from scratch. More than I would have thought. Though I suspect there are enough pre-built layouts out there for most people’s use cases.
  • The documentation is sorely lacking. (Eventually I’ll get around to helping with that.)
  • Dynamic features like comments need to be handled by another program.

But on the other…

  • I can fine-tune things a lot more easily than I can fine-tune a WordPress theme.
  • Once I’m done building the layout, adding a new post is almost as easy as it is on WordPress.
  • My actual post content is portable.
  • There’s essentially no attack surface, so if I have a site that’s “done” I can just build it one last time and leave it as-is — and not worry about spam, maintenance or security (beyond general webserver security).
  • I don’t have to send extra JavaScript libraries along with every page, so it can use a tenth of the bandwidth and load faster on slow connections.

With Eleventy, setting up the layout and features has been super complicated…but once it’s set up, it’s smooth, easy to deal with, and does the job well. It’s kind of like running Linux back in the 1990s.

But with WordPress, there’s complexity in every layer.

Sometimes it’s worth it.

Sometimes it’s not.

Wow. Automattic bought Tumblr from Verizon for less than $3 million. Considering Yahoo bought it for $1.1 billion back in the day…

Yahoo really squandered it. And Verizon, I think, just wanted to get rid of it.

At least it’s going to an actual social media company, not to another conglomerate. And one that’s more responsible than the big two! I was half expecting Verizon to try to monetize it into the ground and close it once everyone but the die-hard users had given up on it. But they found a blogging company for Tumblr, just like they found a photography company for Flickr. That’s encouraging. And Matt Mullenweg (who turns out to be a long-term Tumblr user as well!) understands that Tumblr and WordPress are different types of experiences, so they’re unlikely to try to merge them into a single service.

Though apparently they’d like to move the back-end to WordPress, while keeping the front-end experience of the Tumblr site and apps. I can sort of see the appeal: they’ve got over a decade of experience making WordPress scale, and they have to migrate Tumblr off of Verizon’s servers anyway. If they can run Tumblr on top of the WordPress infrastructure, it’s just a matter of adding capacity.

But it kind of runs the risk of creating a frankenblog. I guess it depends on how seamless the conversion is. If Tumblr looks and works the same from the user-facing perspective, it shouldn’t drive anyone away. If they try to turn it into a subset of WordPress.com…I’d expect another exodus.

Speaking of which, I doubt they’ll get anyone returning who left directly due to the adult content ban. Especially since they don’t plan on reversing it. But they might get back at least some people who left because they saw the ban as a sign of a dying platform. And they might be able to bring in new users, who knows? Having corporate overlords who actually understand and appreciate the space could be a big help.

Though frankly, even if all they do is keep it running in maintenance mode for those who are still there, that’s still better than it would have been staying at Verizon!

As for me, I haven’t been active on Tumblr for a while. I took a final archive after cleaning up a bunch of old stuff, imported some posts here, and I’ve checked in to read maybe…once a month? I’m still in wait-and-see mode. We’ll see how the data migration goes, what they end up doing with the terms of service, whether they change the way ads and promoted posts appear.

But I am more confident that Tumblr will still exist next year than I was a few months ago!

Over at Key Smash!, I’ve been helping beta-test the Pterotype plugin to hook up a self-hosted WordPress to the Fediverse. It gives WordPress an ActivityPub presence, so new posts and comments can be seen in Mastodon, Pleroma, and other ActivityPub-powered networks, and replies from those networks can come back as comments.

But Key Smash! is a simple test case. It’s at the top level of its site, there’s no caching, it’s only got a handful of posts, and it hasn’t been bombarded by spammers for years.

So I’ve installed it on here. Older posts won’t federate, but new ones (starting here) should, and replies should show up as comments. With luck they’ll land in the moderation queue instead of the spam queue.

You may be able to follow the site by searching for this post’s URL in Mastodon/etc. Maybe. I need to report a bug in the handling of sites that aren’t at the top level: to find the site, I have to search for @blog@www.hyperborea.org/journal – but that only works the first time. After that, the search stops working, and I can find it at @blog@www.hyperborea.orgjournal instead – which, in turn, only works after I’ve searched for the first form.
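If I understand the discovery flow right (and this is a guess on my part), handle searches resolve through a WebFinger lookup at the domain root, and the acct: form has no place for a path, which would explain why a blog living under /journal confuses things. Something like:

```js
// Guessing at the lookup a Mastodon search performs for the handle.
// WebFinger lives at the domain root, so the /journal path never appears in
// the acct: resource; the "self" link points at wherever the actor actually is.
const resource = "acct:blog@www.hyperborea.org";
fetch(
  `https://www.hyperborea.org/.well-known/webfinger?resource=${encodeURIComponent(resource)}`
)
  .then((res) => res.json())
  .then((jrd) => {
    const self = jrd.links.find((link) => link.rel === "self");
    console.log(self && self.href); // the blog's ActivityPub actor document
  });
```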

Well, that’s part of why I set it up here: to help beta test.

Update: Submitted the username/discovery issue to GitHub.

Update: You can now follow the blog directly at @blog@www.hyperborea.org

Update (Dec): I turned it off temporarily due to spam problems. Spam comments were visible through ActivityPub, and couldn’t be deleted because of a foreign-key constraint on the Pterotype tables.

Update (2019): Pterotype appears to have been abandoned. 🙁