Loomio
Sun 16 Feb 2014 7:26PM

Statistics and privacy

Manuel Bichler (Public, seen by 74)

The opt-in statistics.json feature provides real-time sums of the number of users and posts on a D* pod. This might be a privacy issue, especially on small pods.

Is this considered a problem? Should we change the statistics implementation?

Jason Robinson Sun 16 Feb 2014 7:27PM

There is also a discussion in this post.

Manuel Bichler Sun 16 Feb 2014 7:33PM

I'm really concerned about the privacy implications of this feature, which is why I turned it off again to protect my pod's users.

One can easily track exactly when a new user or a new post appears, even though this data should be hidden, IMHO.

For example, say Alice promised me she would register on pod X, but the pod's user count didn't increase within the last week: then I know that Alice didn't register. Or if the user count of pod X increased by only 1, I know exactly when Alice registered. The same is true for posts.
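To make the Alice example concrete, here is a minimal sketch of what an observer could do by polling statistics.json repeatedly and diffing `total_users` between polls. It assumes the samples have already been fetched as `(unix_timestamp, total_users)` pairs; the function name `registration_events` is hypothetical and not part of Diaspora.

```python
from typing import List, Tuple


def registration_events(samples: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (poll_timestamp, new_users) wherever total_users grew.

    Hypothetical observer-side helper: `samples` are (unix_timestamp,
    total_users) readings taken from repeated polls of statistics.json.
    """
    events = []
    # Compare each reading with the one before it.
    for (_, prev_total), (ts, total) in zip(samples, samples[1:]):
        if total > prev_total:
            # The increase happened some time since the previous poll.
            events.append((ts, total - prev_total))
    return events
```

With real-time sums, the timing resolution is simply the observer's polling interval; a delta of 1 on a small pod pins a single registration to that window.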

This is a well-known problem with statistical databases (cf. Stallings).

I share the opinion that there should be a way to make good estimates of the number of people/accounts using Diaspora, and obviously the statistics.json method is a good way to provide those figures. But especially for very small pods (like mine), I consider it a privacy concern if they provide real-time sums of posts. Small pods are clearly less important than big pods like geraspora.de or diasp.org for generating network-wide estimates. But since Diaspora is a decentralized network, there will always be small pods, and, who knows, maybe in the future small pods (<100 accounts) will hold the majority of accounts (although the Pareto principle is more realistic, IMHO: https://en.wikipedia.org/wiki/Pareto_principle).

I'm really keen on providing stats for my pod, but, sorry, not in real time (not even once my pod grows). I cannot in good conscience provide a service to my users as long as it discloses those real-time sums.

Poll Created Sun 16 Feb 2014 7:38PM

Provide weekly snapshots instead of real-time data (closed Tue 18 Feb 2014 2:34PM)

Outcome
by Manuel Bichler Tue 25 Apr 2017 5:52AM

Only 1 in favor out of 12 votes. Proposal declined.

The statistics.json info (the fields total_users, active_users_halfyear, active_users_monthly and local_posts) should not contain the current sums, but those as of the most recent Tuesday midnight GMT instead. This ensures that the numbers are not real-time but weekly snapshots.
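The proposal hinges on one calculation: the most recent Tuesday 00:00 GMT (treated here as UTC) at or before the current time. A minimal sketch of that boundary computation follows; the helper name is hypothetical, and how a pod would actually store and serve the snapshot taken at that instant is left open.

```python
from datetime import datetime, timedelta, timezone

TUESDAY = 1  # datetime.weekday(): Monday == 0, Tuesday == 1


def last_tuesday_midnight_utc(now: datetime) -> datetime:
    """Most recent Tuesday 00:00 UTC at or before `now` (hypothetical helper)."""
    if now.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    now_utc = now.astimezone(timezone.utc)
    # Days elapsed since the last Tuesday (0 if today is Tuesday).
    days_back = (now_utc.weekday() - TUESDAY) % 7
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=days_back)
```

For the poll's own timestamp (Sun 16 Feb 2014 7:38PM, taken here as UTC), this yields Tue 11 Feb 2014 00:00 UTC, so every request during that week would see the same numbers.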

Results

Option      % of points   Votes   Voters
Agree          8.3%          1    MB
Abstain       33.3%          4    N S L DU
Disagree      50.0%          6    JH JR F PG S DB
Block          8.3%          1    MP
Undecided      0%          257    BK ST FS MS TS AA S CB HF BO DM GC JH RF M EG G AX PC PP

12 of 269 people have participated (4%)

Itai
Abstain
Sun 16 Feb 2014 8:43PM

D* statistics are a good idea, and IMHO privacy isn't currently compromised (or only barely), since the stats are cumulative. But if it encourages more podmins to opt in, we can make it weekly; it won't hurt functionality much. Podmins should vote.

lnxwalt
Abstain
Sun 16 Feb 2014 9:09PM

This asks Jason to do a lot of extra work for a very minimal privacy benefit.

Flaburgan
Disagree
Sun 16 Feb 2014 9:12PM

I don't see what real problem could arise from data pulled every day. Those stats are completely anonymous; knowing the number of people who registered on a given day is not a privacy leak. Even if it's 1, it gives no real information about who registered.

Jonne Haß
Disagree
Sun 16 Feb 2014 11:56PM

This is an abstain, but the proposal is overly specific for this case. Possible privacy leaks depend not on the interval alone, but on the interval relative to pod size. If we do cached statistics, the interval should be configurable.
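A configurable interval could look roughly like the sketch below: statistics are recomputed at most once per interval and cached in between. The class name, the `compute` callback (standing in for whatever queries the pod's database) and `interval_seconds` are all hypothetical, not Diaspora code.

```python
import time
from typing import Callable, Dict, Optional


class CachedStatistics:
    """Serve statistics that change at most once per configurable interval.

    Hypothetical sketch: `compute` stands in for the pod's real-time
    queries (total_users, local_posts, ...).
    """

    def __init__(self, compute: Callable[[], Dict[str, int]], interval_seconds: int) -> None:
        if interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")
        self.compute = compute
        self.interval = interval_seconds
        self._boundary: Optional[int] = None
        self._cached: Dict[str, int] = {}

    def get(self, now: Optional[float] = None) -> Dict[str, int]:
        now_ts = int(time.time() if now is None else now)
        # Start of the current interval, counted from the Unix epoch.
        boundary = now_ts - now_ts % self.interval
        if boundary != self._boundary:
            # First request in a new interval: take a fresh snapshot.
            self._cached = dict(self.compute())
            self._boundary = boundary
        return dict(self._cached)
```

Two caveats: the snapshot reflects the counts at the first request after each boundary rather than at the boundary itself, and a 604800-second interval counted from the Unix epoch puts boundaries on Thursdays at 00:00 UTC (1 January 1970 was a Thursday), not on Tuesdays as the proposal specified.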

Jason Robinson
Disagree
Tue 18 Feb 2014 7:33AM

Just in time to vote! :D

StarBlessed Sun 16 Feb 2014 7:56PM

This seems arbitrarily paranoid. While I understand both sides of the story, I will not vote either way on the subject. I don't agree with either side. Honestly, I fought against any statistics in the first place. This, however, is just trying to stuff the genie back into the bottle.

Manuel Bichler Sun 16 Feb 2014 8:35PM

@starblessed Based on what I read in your other posts, you consider a pod providing statistics about the number of accounts and posts to be a step toward centralizing the network and a step against privacy as such.

I disagree on the decentralization point, but I understand your privacy point. You can't have full statistics and full privacy at the same time; the two are mutually exclusive, much like in Heisenberg's uncertainty principle. We have to strike a balance between statistics and privacy, and for proposing to drop all statistics because of privacy concerns, one might call you paranoid. ;)
