New #blog: Autodetecting and Announcing #Mastodon Scrapers and Crawlers
There've been quite a few #fedisearch issues recently, but the common thread is that there's usually a gap in reporting: they're often live for weeks before people become aware of them.
It's not just people's pet projects, either; there are other #scrapers active, quietly consuming posts.
So, I built a bot to detect and out them, so that fedi admins can block them as necessary.
https://www.bentasker.co.uk/posts/blog/security/autodetecting-and-outing-mastodon-scrapers-with-scrapersnitchbot.html
#infosec #security
Creating A Log-Analysis System To Autodetect and Announce Mastodon Scr…
I decided to build a scraper bot detection system to run against my Mastodon instance. It uses behavioural scoring to find scrapers and then toots details to help other instance admins protect their u…
www.bentasker.co.uk
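The preview only says that the bot uses behavioural scoring; it doesn't list the signals. As a rough illustration of the idea, here is a hypothetical Python sketch that scores clients from simplified access-log records. The `Request` record format, the path prefixes and the weights are all assumptions for illustration, not the ones the bot actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical, simplified access-log record (real log formats differ)."""
    client_ip: str
    user_agent: str
    path: str


# Assumed signals, for illustration only: polling of the public/hashtag
# timeline API endpoints, and whether the client ever fetches static assets.
TIMELINE_PREFIXES = ("/api/v1/timelines/public", "/api/v1/timelines/tag/")
STATIC_PREFIXES = ("/packs/", "/system/")  # assumed asset/media paths


def score_clients(requests, threshold=5.0):
    """Score each (IP, User-Agent) pair and return those at or above threshold."""
    stats = defaultdict(lambda: {"total": 0, "timeline": 0, "static": 0})
    for r in requests:
        s = stats[(r.client_ip, r.user_agent)]
        s["total"] += 1
        if r.path.startswith(TIMELINE_PREFIXES):
            s["timeline"] += 1
        if r.path.startswith(STATIC_PREFIXES):
            s["static"] += 1

    flagged = {}
    for client, s in stats.items():
        # Up to 3 points for the share of requests that hit timeline endpoints.
        score = 3.0 * s["timeline"] / s["total"]
        # 2 points if the client never loads assets (browsers usually do).
        if s["static"] == 0:
            score += 2.0
        # Volume: 1 point per 100 requests, capped at 3.
        score += min(s["total"] / 100, 3.0)
        if score >= threshold:
            flagged[client] = round(score, 2)
    return flagged
```

Legitimate mobile apps also call the timeline API without loading web assets, which is why a scoring approach combines several signals rather than flagging on any single one.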
rastilin
The problem with opt-in indexing is that almost no one will opt in: you'll have to cajole people one at a time to get even 1% indexing. An index that stores 1% of content is not worth using, so any search project that's opt-in only is basically dead out of the gate.
Ben Tasker
If the aim is to search your own toots (the only ones you can reasonably consent to being included), it'd work (to a point), but probably wouldn't bring much value.
Jay
in reply to Ben Tasker
"All that's really necessary [to make an opt-in search index] is to create an ActivityPub instance and have users follow a specific account in order to opt-in. It even simplifies the system's architecture, because indexable content will get sent straight to the index instance, with no spiders required."
While you say this has been talked about before, *I* hadn't seen it so clearly expressed, so thanks!
#Fediverse #search
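The opt-in approach Jay quotes can be sketched as an index actor that only indexes posts from accounts that follow it. One detail worth making explicit: ActivityPub delivers a post to its author's followers, so in this sketch the index account follows each opted-in user back in order to receive their posts. Everything here (the `OptInIndex` class, the in-memory word index, and the omission of HTTP signatures, `Accept` handling and actual delivery) is a hypothetical illustration, not an existing project's code.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")


class OptInIndex:
    """Hypothetical index actor: only indexes posts from accounts that follow it."""

    def __init__(self, actor_id):
        self.actor_id = actor_id          # the account users follow to opt in
        self.opted_in = set()             # actor IDs that currently follow us
        self.notes = {}                   # note ID -> (author, text)
        self.postings = defaultdict(set)  # lower-cased word -> note IDs

    def receive(self, activity):
        """Handle one activity delivered to our inbox.

        Returns the activities we would send in response (not sent here).
        """
        kind = activity.get("type")
        actor = activity.get("actor")
        obj = activity.get("object")

        if kind == "Follow" and obj == self.actor_id:
            self.opted_in.add(actor)
            # Posts go to the author's followers, so follow back to receive them.
            return [{"type": "Follow", "actor": self.actor_id, "object": actor}]

        if (kind == "Undo" and isinstance(obj, dict)
                and obj.get("type") == "Follow" and obj.get("object") == self.actor_id):
            self.opted_in.discard(actor)
            self._forget(actor)
            return [{"type": "Undo", "actor": self.actor_id,
                     "object": {"type": "Follow", "actor": self.actor_id, "object": actor}}]

        if (kind == "Create" and actor in self.opted_in
                and isinstance(obj, dict) and obj.get("type") == "Note"):
            # Real posts carry HTML content; this sketch treats it as plain text.
            self._add(obj["id"], actor, obj.get("content", ""))
        return []

    def _add(self, note_id, author, text):
        self.notes[note_id] = (author, text)
        for word in WORD.findall(text.lower()):
            self.postings[word].add(note_id)

    def _forget(self, author):
        # Opting out removes everything previously indexed for that account.
        for note_id in [n for n, (a, _) in self.notes.items() if a == author]:
            _, text = self.notes.pop(note_id)
            for word in WORD.findall(text.lower()):
                self.postings[word].discard(note_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))
```

Because unfollowing the index account is the opt-out, the `Undo` handler also drops everything already indexed for that account, and posts from accounts that never followed it are ignored even if they arrive.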