Loomio
Sun 16 Feb 2014 7:26PM

Statistics and privacy

Manuel Bichler Public Seen by 74

The opt-in statistics.json feature provides real-time sums of the number of users and posts on the D* pod. This might be a privacy issue, especially on small pods.

Is this considered a problem? Should we change the statistics implementation?

StarBlessed Sun 16 Feb 2014 8:41PM

And you would be right. I am paranoid. I won't connect my pod to FB or Twitter for that very reason. I'm getting close to pulling it away from Tumblr.
If I had my way, there would be no public data about any kind of D* statistics. But that's just me.

Manuel Bichler Sun 16 Feb 2014 8:52PM

@starblessed Well, the nice thing about a decentralized network is that every pod has its own philosophy: some will provide statistics and some won't, and it's your very personal decision which one you prefer to open an account on.

I think we should make it easy for podmins without any programming experience to choose their own statistics philosophy, and maybe even provide more options than just opting in or not.

By the way, if you really want to be on a pod that does not provide any statistics whatsoever, the podmin must make sure not to even say "well, about a thousand" when the media call and ask how many accounts the pod serves.

StarBlessed Sun 16 Feb 2014 8:55PM

I opted into the stats. Just for now. I want to see how it could possibly affect the value of the data.

lnxwalt Sun 16 Feb 2014 9:08PM

In the linked discussion, we learn that there are two separate issues: each pod's statistics.json file and the central stats collector's polling. If Diaspora has no concept of regularly scheduled tasks, this change could require a fairly extensive rewrite.

I'm going to abstain because I do not think there is enough of a privacy benefit to justify the extra work this asks Jason to do (rewriting the stats collection process).

Manuel Bichler Sun 16 Feb 2014 9:25PM

@flaburgan you are referring to Jason's statistics hub that pulls every day, but the data itself is pullable in real-time. Just like Jason did, I could write a bot that pulls the pods' data every second instead of every day. This topic is not about any pulling bot, it's about Diaspora's statistics.json feature.
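
To illustrate why the polling interval matters, here is a minimal sketch of what such a bot could infer just by comparing consecutive polls. The field names (`users`, `posts`) and the sample numbers are assumptions made for illustration; the thread only says that statistics.json exposes sums of users and posts. No network access is involved, the polls are hard-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical fields, not the real statistics.json schema.
    timestamp: int  # second at which the poll happened
    users: int      # total user count reported by the pod
    posts: int      # total post count reported by the pod


def infer_events(snapshots):
    """Return (timestamp, new_users, new_posts) for each poll where a count changed."""
    events = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        new_users = cur.users - prev.users
        new_posts = cur.posts - prev.posts
        if new_users or new_posts:
            events.append((cur.timestamp, new_users, new_posts))
    return events


# Made-up polls of a small pod, one per second.
polls = [
    Snapshot(0, 12, 340),
    Snapshot(1, 12, 340),
    Snapshot(2, 13, 340),
    Snapshot(3, 13, 341),
]
events = infer_events(polls)
print(events)
```

With one poll per second, each registration and each post shows up as its own timestamped event. With one poll per day, the same differences would only give daily totals.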

Stats about non-anonymous data are never "completely anonymous", ask @starblessed about that. ;)

@lnxwalt If the community decides that something has to be done, I could do the programming stuff. No need to burden Jason.

Flaburgan Sun 16 Feb 2014 9:28PM

Well, in that case, I think statistics.json could be updated once a day. That looks precise enough for statistics, and coarse enough that nobody can tell exactly when someone registered. (But seriously, given the massive amount of data online, what's the problem with knowing when "someone" registers? Believe me, I'm really committed to online privacy, but here I don't get it...)
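
A once-a-day update does not necessarily need a scheduled task. As a rough sketch, assuming the counts can be computed on demand, the published values could be cached and only recomputed when the first request of a new day arrives. The class name `DailySnapshot`, the `compute` callback and the injectable `clock` are all hypothetical and not part of Diaspora.

```python
import time


class DailySnapshot:
    """Serve counts that are recomputed at most once per period."""

    def __init__(self, compute, period_seconds=24 * 60 * 60, clock=time.time):
        self._compute = compute        # callable returning a dict of current counts
        self._period = period_seconds  # length of one publishing window
        self._clock = clock            # injectable so the behaviour can be tested
        self._bucket = None            # index of the window the cache belongs to
        self._cached = None

    def get(self):
        bucket = int(self._clock() // self._period)
        if bucket != self._bucket:
            # First request in a new window: take a fresh copy of the counts.
            self._cached = dict(self._compute())
            self._bucket = bucket
        return self._cached


# Usage with a fake clock and made-up counts.
now = [0.0]
live_counts = {"users": 10, "posts": 100}
snapshot = DailySnapshot(lambda: live_counts, clock=lambda: now[0])

first = snapshot.get()["users"]       # computed at t = 0
live_counts["users"] = 11             # someone registers
now[0] = 3600.0
same_day = snapshot.get()["users"]    # still the value from t = 0
now[0] = 86400.0
next_day = snapshot.get()["users"]    # refreshed in the next window
```

However often a bot polls, it sees the same numbers until the next window starts.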

Manuel Bichler Sun 16 Feb 2014 9:53PM

I know, this seems somewhat paranoid and the D* project probably has many other issues that are much more important than this one.

I just want to raise awareness that real-time sums may leak data on relatively small pods, and that this can only be avoided by opting out of statistics altogether. Even on larger pods, a real-time trend of the number of posts feels somewhat spooky.

We might also only publish the numbers in 100s instead of the exact numbers.
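
As a minimal sketch of that idea, the counts could be rounded before they are published. Rounding down (rather than to the nearest hundred) is an arbitrary choice here, and the function name `coarsen` and the sample numbers are made up for illustration.

```python
def coarsen(count, step=100):
    """Round a non-negative count down to the nearest multiple of step."""
    if step <= 0:
        raise ValueError("step must be positive")
    return (count // step) * step


# Made-up exact counts and the values that would be published instead.
exact = {"users": 1234, "posts": 56789}
published = {name: coarsen(value) for name, value in exact.items()}
print(published)
```

Two caveats: a pod with fewer than 100 users would publish 0, and a real-time observer can still see the moment a count crosses a multiple of 100.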

Adrenalin Sun 16 Feb 2014 10:29PM

My first thought -- dick swinging

I do not understand at all why these statistics are needed, or who needs them. Are they used to solve concrete problems?

For privacy reasons I would prefer not to analyze anything. Everything else is an invitation for data mining.

Manuel Bichler Sun 16 Feb 2014 10:36PM

@adrenalin the statistics feature is already implemented, see https://www.loomio.org/d/FBjn89X2/central-hub?proposal=1y7tgbVP but it is an opt-in feature, so a fresh installation of D* does not publish any statistics whatsoever.

Flaburgan Sun 16 Feb 2014 10:46PM

@adrenalin the statistics are really needed, because we need credibility: to have more people coming to diaspora, more journalists and projects talking about us, and more developers helping us build nice software. Most people who don't follow the project just think that "diaspora is dead". We have to fight this idea, and we need to show numbers for that.
