New #blog: Autodetecting and Announcing #Mastodon Scrapers and Crawlers

There've been quite a few #fedisearch issues recently, but the common thread is that there's usually a gap in reporting - they're often live for weeks before people are made aware.

It's not just people's pet projects either: there are other #scrapers active, quietly consuming posts.

So, I built a bot to detect and out them so that fedi admins can block as necessary.

https://www.bentasker.co.uk/posts/blog/security/autodetecting-and-outing-mastodon-scrapers-with-scrapersnitchbot.html

#infosec #security
Unknown parent

crepererum
@rastilin @antijingoist then why not have an opt-out solution, or ask people when creating an account (aka "must opt")?
Unknown parent

rastilin
@antijingoist @crepererum

The problem with opt-in indexing is that almost no one will opt in; you'll have to cajole people one at a time to get even 1% indexing. An index that stores 1% of content is not worth using, so any search project that's opt-in only is basically dead out of the gate.
Unknown parent

Ben Tasker
@crepererum @antijingoist @rastilin I agree, client-side indexing doesn't work as a "general" search solution.

If the aim is to search your own toots (the only ones you can reasonably consent to being included) it'd work (to a point), but probably wouldn't bring much value
Unknown parent

crepererum
@antijingoist @rastilin Just as a side note: client-side indexing is not scalable. You cannot expect a phone app to download the actual tweets of all mastodon instances (or even the opt-in ones). The mastodon instances themselves could index their tweets and offer federated search. I don't see why this wouldn't be possible.
Unknown parent

crepererum
@kensanata @rastilin @antijingoist The "ask on sign-up" solution isn't a trick. Both opt-in and opt-out can be, because they assume that we know what the user wants without asking them (solid evidence, like a study, could avoid that). How many people will opt in, and whether that's enough to make it a valuable feature (also considering who opts in, especially private accounts vs. official accounts and public figures), is IMHO not clear yet. My assumption is that there are enough opt-ins.
in reply to Ben Tasker

This is a great essay (and tool). I wanted to highlight one passage:

"All that's really necessary [to make an opt-in search index] is to create an ActivityPub instance and have users follow a specific account in order to opt-in. It even simplifies the system's architecture, because indexable content will get sent straight to the index instance, with no spiders required."

While you say this has been talked about before, *I* hadn't seen it so clearly expressed, so thanks!
#Fediverse #search
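
For anyone who wants to see the shape of that idea, here is a rough sketch of the opt-in logic the quoted passage describes. It is not code from the essay: the class name `OptInIndex`, its methods, and the in-memory storage are all made up for illustration. It only uses the standard ActivityStreams activity types `Follow`, `Undo` and `Create`, and it assumes activities have already been delivered to the index's inbox and verified (HTTP signatures, and however the posts actually get routed to the index, are left out).

```python
from dataclasses import dataclass, field


@dataclass
class OptInIndex:
    """Hypothetical in-memory index: only actors who followed the index account are stored."""
    opted_in: set = field(default_factory=set)
    posts: list = field(default_factory=list)

    def handle_activity(self, activity: dict) -> bool:
        """Process one inbox activity; return True if it changed the index state."""
        kind = activity.get("type")
        actor = activity.get("actor")
        obj = activity.get("object")

        if kind == "Follow":
            # Following the index account is treated as the opt-in.
            self.opted_in.add(actor)
            return True

        if kind == "Undo" and isinstance(obj, dict) and obj.get("type") == "Follow":
            # Unfollowing withdraws consent, so drop the actor's stored posts too.
            self.opted_in.discard(actor)
            self.posts = [p for p in self.posts if p["actor"] != actor]
            return True

        if kind == "Create" and actor in self.opted_in and isinstance(obj, dict):
            # Only content from opted-in actors is ever indexed.
            self.posts.append({"actor": actor, "content": obj.get("content", "")})
            return True

        # Anything else (including posts from non-followers) is ignored.
        return False

    def search(self, term: str) -> list:
        """Case-insensitive substring search over indexed posts."""
        term = term.lower()
        return [p for p in self.posts if term in p["content"].lower()]
```

The point of the design is that consent is the default gate: a `Create` from someone who never followed the index account is simply dropped, and an `Undo` of the follow removes what was stored.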
