I’ve been meaning to disconnect from Jetpack for a while now. This seems like a good time to do it, and to finally clear out the older Tumblr and WordPress.com blogs I don’t use anymore.

Tumblr and WordPress to Sell Users’ Data to Train AI Tools (404 Media)

It’s the kind of thing that you expect from Google or Facebook, or from any number of start-ups, but there’s been this sense that Automattic should know better — and with Tumblr being login-walled and ad-saturated, and the push to upsell in their WordPress plugins, and now this…it’s looking like they don’t.

I don’t think they’ve hit the “trust thermocline” yet, but selling user data is a pretty clear line.

As for AI access to the Firehose: My understanding of the Firehose has been that it’s basically an aggregation of what you’d see in a bunch of blogs’ public RSS feeds. Which, OK, fine. Analyze your heart out. Display my posts in your RSS reader. Just make sure private posts and comments don’t leak.

But LLM training isn’t the same as analytics, or showing a properly attributed post in a reader. And quietly changing the terms to allow more kinds of re-use on something most people using the service don’t know about? Not cool.

And not making it clear what is and isn’t included for which purposes? That breaks down trust.

Before this, I wasn’t worried about the Firehose. But now I’m not sure I can trust Akismet, never mind Jetpack, and I’m looking for a new spam filter.

Originally posted across several threads through my GoToSocial test site.

Update: Automattic did clarify that self-hosted blogs with Jetpack are not included in the training data. Only company-hosted blogs on Tumblr and WordPress.com. But I still uninstalled Jetpack from this site, just to be sure. Like I said, I’d been meaning to for a while.

Since I started converting parts of my website to use 11ty as a static site generator, I’ve been able to automatically generate tag and category pages that are *just there* as plain HTML files. And since they’re plain HTML, the old local site search engine I have on there still finds all the Eleventy-generated pages. And again, since it’s all static, it doesn’t go down when the database does (which has been happening on an annoyingly frequent basis lately).
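For anyone curious, the tag pages come out of an Eleventy collection plus pagination. Here’s a rough sketch of the collection half of it; the collection name and the excluded tags are placeholders for the example, not necessarily my exact config.

```js
// .eleventy.js: sketch of building a de-duplicated list of every tag in use,
// so a paginated template can write one static /tags/<tag>/index.html per tag.
// "tagList" and the excluded tags are placeholder names for this example.
module.exports = function (eleventyConfig) {
  eleventyConfig.addCollection("tagList", (collectionApi) => {
    const tags = new Set();
    for (const item of collectionApi.getAll()) {
      (item.data.tags || [])
        .filter((tag) => !["all", "posts"].includes(tag))
        .forEach((tag) => tags.add(tag));
    }
    return [...tags].sort();
  });
};
```

A template then paginates over collections.tagList, one tag per output page, and Eleventy writes each one out as plain HTML alongside everything else.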

And this would be perfect if I was using a single Eleventy instance to build the entire site, but I’m not. I’ve got separate instances building the Les Misérables blog, the reviews, the tech tips, the creative writing collection, and so on, plus I have this WordPress blog and a bunch of hand-coded HTML from the old days.

Which leads to a few problems:

  1. Tags are per-section, not universal.
  2. The site search, which indexes HTML files on the server, sees everything except the WordPress posts, and the WordPress search *only* sees the WordPress posts.

Some ideas I’ve had to combine the tag pages:

  • Rebuild everything in a single Eleventy instance with a deeper hierarchy. Upside: Still static pages for everything except WordPress. Downside: Time-consuming, still leaves the main blog separate.
  • Write a post-build script that combines all the tag pages from each subsite. Upside: Same. Downside: Need to either run on the server or make sure my local copies of the *other* subsites are current.
  • Write a server-side page that combines the backend HTML pages into a dynamic frontend for only the tag being viewed. Upside: simple. Downside: tag pages now depend on PHP.
  • Write some client-side JavaScript for the tag pages that will check whether other subsites have tag pages, and add those to the end of the list in a “See also…” section. Upside: simple, and the “local” tag pages are still usable as long as I make sure the script doesn’t block anything. I could even have it check the other static subsites first and then check the blog, so if the blog times out I still display everything else. Downside: requires JavaScript and additional network requests. But as long as I stick to vanilla JS, I can make it pretty small (rough sketch below).
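Here’s roughly what that client-side check could look like. The subsite paths and the tags/<tag>/ URL pattern are made up for the sketch; the real list would match however each generator actually lays out its tag pages.

```js
// Hypothetical "See also…" check for the tag being viewed.
// Subsite paths and the tags/<tag>/ URL pattern are placeholders.
const SUBSITES = ["/lesmis/", "/reviews/", "/techtips/", "/writing/"];

async function addSeeAlso(tag) {
  const list = document.createElement("ul");
  for (const base of SUBSITES) {
    const url = `${base}tags/${encodeURIComponent(tag)}/`;
    try {
      // HEAD is enough: we only care whether the static tag page exists.
      const res = await fetch(url, { method: "HEAD" });
      if (res.ok) {
        const li = document.createElement("li");
        const link = document.createElement("a");
        link.href = url;
        link.textContent = `${tag} on ${base}`;
        li.appendChild(link);
        list.appendChild(li);
      }
    } catch {
      // Subsite unreachable (e.g. the blog's database is down): skip it
      // so the local tag page still shows everything else.
    }
  }
  if (list.children.length) {
    const section = document.createElement("section");
    section.innerHTML = "<h2>See also…</h2>";
    section.appendChild(list);
    document.querySelector("main")?.appendChild(section);
  }
}
```

Since it only ever appends a section, the local tag page stays fully usable if the script fails or never runs.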

And for unifying the search:

  • Write a post-site-indexing script that adds the WordPress posts to the index. Could be done with direct DB access.
  • Write a pre-site-indexing script that generates a bunch of files for it to index. Seems like overkill.
  • Update the search code to send the same search terms to WordPress and combine the results (sketched after this list).
  • Use a new search engine that indexes the served pages instead of the files on the server.
  • Point the search box at a remote search engine like Googl…yeah, never mind.
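For the “send the same terms to WordPress” option, here’s a sketch of the merge. It assumes the blog’s REST API is reachable under /journal/ (where the blog lives), and getLocalResults is a stand-in for whatever the existing site search already returns.

```js
// Hypothetical merge of local static-search results with WordPress results.
// getLocalResults() stands in for the existing search's output (assumed to be
// [{ title, url }, ...]); /wp-json/wp/v2/search is WordPress core's REST
// search route, and the /journal/ prefix is assumed from this blog's URL.
async function combinedSearch(terms) {
  const local = await getLocalResults(terms);

  let wordpress = [];
  try {
    const res = await fetch(
      `/journal/wp-json/wp/v2/search?search=${encodeURIComponent(terms)}&per_page=20`
    );
    if (res.ok) {
      wordpress = (await res.json()).map((hit) => ({
        title: hit.title,
        url: hit.url,
      }));
    }
  } catch {
    // If the blog (or its database) is down, fall back to static results only.
  }

  return [...local, ...wordpress];
}
```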

I haven’t settled on anything. I’m just kind of writing down ideas in public. If you have any suggestions, please let me know!

WP Tavern summarizes the conversation around WordPress losing CMS market share for the first time in ages, and what various people have cited as likely causes.

Personally, I’m finding its increasing complexity to be a major frustration.

  • Writing on WordPress has gotten somewhat more complicated.
  • Maintaining a WordPress site has gotten more complicated.
  • Developing for WordPress has gotten more complicated.
  • The resulting page code (including CSS and JavaScript) has gotten a lot more complicated. As I’ve noted before, there’s no good reason to require 450K of data to display a 500-word post. Or a single link with a one-sentence comment.

The move towards Gutenberg blocks and full-site editing complicates things on several levels, and feels like an attempt at lock-in as well.

Ironically, I’ve been moving toward Eleventy, which has also been very frustrating…but only in building the layout I want.

On one hand…

  • I have to develop a lot of the components I want from scratch. More than I would have thought. Though I suspect there are enough pre-built layouts out there for most people’s use cases.
  • The documentation is sorely lacking. (Eventually I’ll get around to helping with that.)
  • Dynamic features like comments need to be handled by another program.

But on the other…

  • I can fine-tune things a lot more easily than I can fine-tune a WordPress theme.
  • Once I’m done building the layout, adding a new post is almost as easy as it is on WordPress.
  • My actual post content is portable.
  • There’s essentially no attack surface, so if I have a site that’s “done” I can just build it one last time and leave it as-is — and not worry about spam, maintenance or security (beyond general webserver security).
  • I don’t have to send extra JavaScript libraries along with every page, so it can use a tenth of the bandwidth and load faster on slow connections.

With Eleventy, setting up the layout and features has been super complicated…but once it’s set up, it’s smooth, easy to deal with, and does the job well. It’s kind of like running Linux back in the 1990s.

But with WordPress, there’s complexity in every layer.

Sometimes it’s worth it.

Sometimes it’s not.

Wow. Automattic bought Tumblr from Verizon for less than $3 million. Considering Yahoo bought it for $1.1 billion back in the day…

Yahoo really squandered it. And Verizon, I think, just wanted to get rid of it.

At least it’s going to an actual social media company, not to another conglomerate. And one that’s more responsible than the big two! I was half expecting Verizon to try to monetize it into the ground and close it once everyone but the die-hard users had given up on it. But they found a blogging company for Tumblr, just like they found a photography company for Flickr. That’s encouraging. And Matt Mullenweg (who turns out to be a long-term Tumblr user as well!) understands that Tumblr and WordPress are different types of experiences, so they’re unlikely to try to merge them into a single service.

Though apparently they’d like to move the back-end to WordPress, while keeping the front-end experience of the Tumblr site and apps. I can sort of see the appeal: they’ve got over a decade of experience making WordPress scale, and they have to migrate Tumblr off of Verizon’s servers anyway. If they can run Tumblr on top of the WordPress infrastructure, it’s just a matter of adding capacity.

But it kind of runs the risk of creating a frankenblog. I guess it depends on how seamless the conversion is. If Tumblr looks and works the same from the user-facing perspective, it shouldn’t drive anyone away. If they try to turn it into a subset of WordPress.com…I’d expect another exodus.

Speaking of which, I doubt they’ll get anyone returning who left directly due to the adult content ban. Especially since they don’t plan on reversing it. But they might get back at least some people who left because they saw the ban as a sign of a dying platform. And they might be able to bring in new users, who knows? Having corporate overlords who actually understand and appreciate the space could be a big help.

Though frankly, even if all they do is keep it running in maintenance mode for those who are still there, that’s still better than it would have been staying at Verizon!

As for me, I haven’t been active on Tumblr for a while. I took a final archive after cleaning up a bunch of old stuff, imported some posts here, and I’ve checked in to read maybe…once a month? I’m still in wait-and-see mode. We’ll see how the data migration goes, what they end up doing with the terms of service, whether they change the way ads and promoted posts appear.

But I am more confident that Tumblr will still exist next year than I was a few months ago!

Over at Key Smash!, I’ve been helping beta-test the Pterotype plugin to hook up a self-hosted WordPress to the Fediverse. It gives WordPress an ActivityPub presence, so new posts and comments can be seen in Mastodon, Pleroma, and other ActivityPub-powered networks, and replies from those networks can come back as comments.

But Key Smash! is a simple test case. It’s at the top level of its site, there’s no caching, it’s only got a handful of posts, and it hasn’t been bombarded by spammers for years.

So I’ve installed it on here. Older posts won’t federate, but new ones (starting here) should, and replies should show up as comments. With luck they’ll land in the moderation queue instead of the spam queue.

You may be able to follow the site by searching for this post’s URL in Mastodon/etc. Maybe. I need to report a bug in the handling of sites that aren’t at the top level: to find the site, I have to search for @blog@www.hyperborea.org/journal – but that only works the first time. After that, the search stops working, and I can find it at @blog@www.hyperborea.orgjournal instead – which, in turn, only works after I’ve searched for the first form.
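If I understand the discovery flow right (and this is a guess on my part), handle searches resolve through a WebFinger lookup at the domain root, and the acct: form has no place for a path, which would explain why a blog living under /journal confuses things. Something like:

```js
// Guessing at the lookup a Mastodon search performs for the handle.
// WebFinger lives at the domain root, so the /journal path never appears in
// the acct: resource; the "self" link points at wherever the actor actually is.
const resource = "acct:blog@www.hyperborea.org";
fetch(
  `https://www.hyperborea.org/.well-known/webfinger?resource=${encodeURIComponent(resource)}`
)
  .then((res) => res.json())
  .then((jrd) => {
    const self = jrd.links.find((link) => link.rel === "self");
    console.log(self && self.href); // the blog's ActivityPub actor document
  });
```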

Well, that’s part of why I set it up here: to help beta test.

Update: Submitted the username/discovery issue to GitHub.

Update: You can now follow the blog directly at @blog@www.hyperborea.org

Update (Dec): I turned it off temporarily due to spam problems. Spam comments were visible through ActivityPub, and couldn’t be deleted because of a foreign-key constraint on the Pterotype tables.

Update (2019): Pterotype appears to have been abandoned. 🙁