Mastodon scraping incident
You might have seen the reports that the fediverse was scraped for a research project (see https://sunbeam.city/@puffinus_puffinus/103473239171088670), and that the scraped data included social.coop.
I've seen quite a range of opinions on the topic: some consider the content public, while others feel their privacy has been violated.
There are a few measures that could be put into place to prevent/limit this kind of thing in the future (see https://todon.nl/@jeroenpraat/103476105410045435 and the thread around it), specifically:
change robots.txt (in some way, not sure precisely, needs research)
explicitly prevent it in the terms of service (e.g. https://scholar.social/terms)
disable "Allow unauthenticated access to public timeline" in settings (once we are on v3 or higher)
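To make the robots.txt option a bit more concrete, here is a minimal sketch of how a well-behaved crawler would interpret such a file, using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, and the `ResearchBot` user agent are assumptions for illustration only, not what social.coop actually serves. Note that robots.txt is advisory: it only affects crawlers that choose to check it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not the real social.coop file):
# ask every crawler to stay away from the API endpoints.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ResearchBot" is a made-up user agent name.
    for url in (
        "https://social.coop/api/v1/timelines/public",
        "https://social.coop/about",
    ):
        print(url, is_allowed(ROBOTS_TXT, "ResearchBot", url))
```

A scraper that never consults robots.txt is unaffected by it, so this would likely need to be combined with the other two measures.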
Thoughts?
Poll: Enable authorized fetches, disable public access via API (Once we are on v3+)
Created Wed 15 Jan 2020 5:33PM
Closed Mon 20 Jan 2020 4:03AM
This discussion is great and shows we as a Coop don't quite have consensus on the issue.
Let's see if we can agree on a few things.
Results
Option | Voters
---|---
No | 1
Yes | 7
Undecided | 90
8 of 98 people have participated (8%)
mike_hales
Wed 15 Jan 2020 6:31PM
I don't fully understand the statement (e.g. 'authorized fetches') and there's no <abstain> option, so I must vote NO.
Nathan Schneider
Wed 15 Jan 2020 7:18PM
Given the strong concerns raised here, I would be okay with this.
Nick Sellen
Thu 16 Jan 2020 10:54AM
I think we need a more informed discussion about what it is first.
mike_hales
Thu 16 Jan 2020 1:18PM
I don't fully understand the statement (e.g. 'authorized fetches') and there's no <abstain> option, so I must vote NO. Later: Nick provided some detail and I'm happy to vote YES now. Thank you.
Wooster Wed 22 Jan 2020 8:01AM
How would any of this actually prevent scraping?
Noah Wed 15 Jan 2020 6:03PM
Privacy of toots is a complicated question. Obviously "public" means "not private", but it does not adequately distinguish between "public as in what I say in my yard, or at a restaurant" and "public as in what an elected official says at a meeting." I do believe it should be more difficult to scrape the public timeline; it's not something required for regular, individual-level interaction, and it is almost never done with the intention of directly benefiting the people whose data is being scraped. And although the cost is minimal, let's not forget we're paying for the server resources consumed by the scraping!
I'm in favor of all three of the options given by @Nick Sellen, and honestly interested in going further. For example, I hope we can someday have a discussion on the possibility of migrating from vanilla Mastodon to a compatible fork offering a local-only post privacy option (I know of Hometown and glitch-soc; there may be others).
Django Wed 15 Jan 2020 6:10PM
Oh yes regarding "change robots.txt (in some way, not sure precisely, needs research)"
I believe there is an admin setting which turns this on by default for all users, but opting out of search engine indexing is a user setting.
Django · Wed 15 Jan 2020 5:42PM
I would like to create a second poll regarding a change to the Terms of Service, but I fear it might be confusing to have multiple polls at once.
Here are the choices I have in mind. I'm not sure if it should be ranked choice or a single choice out of the 4 options:
Explicitly prevent scraping [1]
Researchers must explicitly ask for access [2], [3]
Researchers are not obliged to ask for access [3]
Status Quo
[1] This would also require some software to detect and ban IPs attempting to scrape.
[2] This would also require software, and a temporary exception would be made.
[3] Users who have checked 'Opt-out of search engine indexing' would be automatically excluded from research.
Thoughts?