Diaspora Community - Federation Discussion

Sun 13 Oct 2013 2:27PM

Public post federation

Jason Robinson

The lack of public post federation in Diaspora is IMHO a make-or-break issue. The whole network is a little broken, as small pods are cut off from most of the posts on the network due to the way federation currently works.

Here is my proposal for solving this issue; please see the wiki post here.

It is not a comprehensive solution that can be implemented right away. It is a high-level suggestion to move the discussion of such a feature forward.

Flaburgan Mon 18 Nov 2013 4:09AM

I don't know enough to discuss the technical side, but I do know one thing: I'm strongly opposed to anything that would involve Google services (even if it's public data). We saw how they turned off Google Reader, or suddenly decided to make the Google Maps API a paid service. I don't want to depend on a company for a feature as critical as this one.

Maciek Łoziński Tue 19 Nov 2013 8:21PM

There are many P2P networks and routing protocols out there. In my opinion we should go down this path.
What if every pod was a relay for its own users' followed tags?

Elm Wed 20 Nov 2013 3:53PM

I'd prefer the distributed path, even if some seeds/pods would lose a bit of non-federated info (though I'm not sure I understand how it would work out). @macieklozinski: could you explain how it would work for tag federation?

Maciek Łoziński Wed 20 Nov 2013 9:43PM

I'll try to do some deeper research on possible p2p solutions.

Maciek Łoziński Wed 20 Nov 2013 10:53PM

I'm not sure if it fits well with Diaspora's protocols, but I could suggest something like this:

When user A shares with user B on another pod, user B's pod becomes a "neighbor" of user A's pod.
User B's pod "subscribes" to user A's pod for all tags that the users of user B's pod follow.
Each pod keeps a list of its neighbors and the tags they subscribe to.
When a user on a certain pod makes a public post, it's sent to all neighbors subscribed to tags present on this post.
If a pod receives a public post from another pod and does not have this post in its database, it passes it to all neighbors subscribing to tags present in this post, and saves the post to the database.
If a received post is already present in the database, nothing happens.
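
The steps above can be sketched roughly as follows. This is only an in-memory toy in Python, not Diaspora code: the `Pod` and `Network` classes, the `share`/`publish`/`receive` method names and the idea of copying the followed tags once at share time are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    followed_tags: set = field(default_factory=set)    # tags this pod's users follow
    subscriptions: dict = field(default_factory=dict)  # neighbor name -> tags it subscribes to
    database: dict = field(default_factory=dict)       # post id -> tags of the post


class Network:
    """Hypothetical stand-in for the pods and the connections between them."""

    def __init__(self):
        self.pods = {}

    def add_pod(self, name, followed_tags=()):
        self.pods[name] = Pod(name, set(followed_tags))

    def share(self, pod_a, pod_b):
        # A user on pod_a shares with a user on pod_b: pod_b becomes a
        # neighbor of pod_a and subscribes to the tags its own users follow.
        self.pods[pod_a].subscriptions[pod_b] = set(self.pods[pod_b].followed_tags)

    def publish(self, origin, post_id, tags):
        # A local public post is handled like a post the pod has not seen yet.
        self.receive(origin, post_id, set(tags))

    def receive(self, pod_name, post_id, tags):
        pod = self.pods[pod_name]
        if post_id in pod.database:
            return  # already present in the database: nothing happens
        pod.database[post_id] = tags
        for neighbor, wanted in pod.subscriptions.items():
            if wanted & tags:  # neighbor subscribes to at least one tag on the post
                self.receive(neighbor, post_id, tags)


net = Network()
net.add_pod("A")
net.add_pod("B", {"linux"})
net.add_pod("C", {"linux", "cats"})
net.share("A", "B")
net.share("B", "C")
net.share("C", "B")  # creates a loop, which the database check breaks
net.publish("A", "p1", {"linux", "foss"})
net.publish("A", "p2", {"cats"})
```

In this toy run the post tagged "linux" ends up on all three pods, while the post tagged "cats" stays on pod A, because pod B does not subscribe to that tag and so never passes it on to pod C. Whether that is acceptable is one of the open questions with this approach.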

goob Thu 28 Nov 2013 4:04PM

I think there's a big difference between having a central hub which contains information pertaining to the D* network but which is separate from the network itself (such as the project website, poduptime, etc), and a central hub which is an integral part of the D* network and receives/sends/stores data from that network, such as post data, which is what is being proposed here. With a central hub as an integral part of the network, the network would no longer be fully distributed.

If a central hub of any sort is actually needed in order for post/tag federation to work properly, I suggest it be restricted to holding metadata, such as a list of pods or relay servers and their IP addresses. This could be the same central hub that helps people choose a pod to register at, as poduptime does at the moment.

It would then only be referred to when a new pod or relay server was brought online. The new pod would then call hub.diasporafoundation.org (for example), which would give it some pods/relay servers to contact from which it could pull post data. The actual transmitting of post data would be done by the pods/relay servers themselves, with no involvement from the central hub.
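
A rough sketch of that bootstrap step, in Python and entirely in memory: the `Hub` and `Pod` classes, the `sample_peers` and `bootstrap` names and the `directory` dictionary (which stands in for actually contacting a peer over the network) are assumptions for illustration only. The point it tries to show is that the hub hands out addresses and nothing else.

```python
import random


class Hub:
    """Holds metadata only: addresses of known pods, never post data."""

    def __init__(self):
        self.known_pods = []

    def register(self, address):
        if address not in self.known_pods:
            self.known_pods.append(address)

    def sample_peers(self, asking, count=3):
        # Hand out a few known pods, excluding the pod that is asking.
        candidates = [a for a in self.known_pods if a != asking]
        return random.sample(candidates, min(count, len(candidates)))


class Pod:
    def __init__(self, address):
        self.address = address
        self.posts = {}   # post id -> body (public posts only)
        self.peers = []

    def bootstrap(self, hub, directory, count=3):
        # Contact the hub once, then pull public posts directly from the peers.
        self.peers = hub.sample_peers(self.address, count)
        hub.register(self.address)
        for peer_address in self.peers:
            self.posts.update(directory[peer_address].posts)


hub = Hub()
old_a, old_b = Pod("pod-a.example"), Pod("pod-b.example")
old_a.posts["p1"] = "hello from a"
old_b.posts["p2"] = "hello from b"
directory = {pod.address: pod for pod in (old_a, old_b)}
for pod in (old_a, old_b):
    hub.register(pod.address)

new_pod = Pod("pod-new.example")
new_pod.bootstrap(hub, directory)
```

After `bootstrap`, the new pod holds both posts, and the hub has only gained one more address.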

This is similar to one of the proposals I made in this discussion on adding pull to Diaspora's push model (the proposal concerning tags).

I'm not sure relay servers separate from the pods themselves would be needed; I think there is a way of making pods federate public data more effectively without using a separate network of relays, if they are connected correctly together.

Note that in the following, when I talk of connections/sharing between pods, I'm not talking about the normal connections between pods which exist, but a kind of meta-network to push public data around more effectively, of the kind Jason talks about in his proposal.

I would suggest using a kind of 'cell structure', in which each pod is connected directly with several other pods in the network, and through that structure builds up a list of public post and tagged post data to pass on to other pods. This avoids the scalability problem faced if 'every pod knows every pod'. If the relay connections between pods are made correctly, public data will be federated to every pod quickly, via indirect routes (Pod A shares it with the several pods to which it has direct connections; those pods share it with the pods with which they have direct connections; and so on). If there is redundancy built into this network, it won't matter if several pods in this network are down; the data will get fed around to the whole network eventually in any case.

It might be that each pod needs to be connected only to two other pods in the network for this to work, like the classic Communist cell structure – as in the graphic below (not perfectly illustrated, but it gives you an idea):

Cell structure network
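
To get a feel for how such a structure behaves, here is a small Python sketch under simplifying assumptions: pods are just names, the "two connections each" case is modelled as a ring, and the `ring` and `flood` helpers are made up for this example rather than taken from any existing code.

```python
from collections import deque


def ring(pods):
    """Connect each pod to the two pods next to it in the list (a ring)."""
    n = len(pods)
    return {p: {pods[(i - 1) % n], pods[(i + 1) % n]} for i, p in enumerate(pods)}


def flood(links, origin, down=frozenset()):
    """Return {pod: hops} for every pod a post from origin reaches, skipping pods that are down."""
    reached = {origin: 0}
    queue = deque([origin])
    while queue:
        pod = queue.popleft()
        for neighbor in links[pod]:
            if neighbor in down or neighbor in reached:
                continue  # unreachable, or it already has the post
            reached[neighbor] = reached[pod] + 1
            queue.append(neighbor)
    return reached


links = ring(["A", "B", "C", "D", "E", "F"])
all_up = flood(links, "A")
one_down = flood(links, "A", down={"B"})
two_down = flood(links, "A", down={"B", "E"})
```

With all six pods up, a post from A reaches everyone within three hops. With B down it still reaches the other four pods by going round the other way. With both B and E down, only A and F have the post, and C and D are cut off. So in this model two connections per pod survive a single outage but not every double outage, which suggests that "several" connections per pod is the safer reading.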

I'm sure there is a way of coding this into the D* software itself, so that each time a new pod is brought into the network, the network recalibrates its connections to make this new pod part of the sharing network, without reference to any external source such as relay servers or a central hub. Likewise each time a pod drops out. However, I would have no idea how to do this! I hope someone out there will, and that my partly developed concept will spark ideas for practical solutions in their mind.

If a central hub is needed to help new pods get connected, I think we should have a mirror or two on other servers just in case the project site is down when a pod is brought online.

Jason Robinson Thu 28 Nov 2013 6:11PM

@goob, I will read the rest of your long comment later, but you should maybe read my proposal too ;)

"a central hub which is an integral part of the D* network and receives/sends/stores data from that network, such as post data, which is what is being proposed here."

I have proposed no such thing. This is the reason I stopped the whole vote on the central hub: not many people even understood my proposal.

goob Thu 28 Nov 2013 6:47PM

My mistake. I did read your various proposals and wiki articles, but there's been so much to read and digest that I got confused. I read that suggestion somewhere on one of the several threads on this and related topics, then while writing I got my wires crossed and thought it was you who had proposed it.

Just ignore the last seven words in that extract. The point stands, no matter who proposed it, or even if no one has proposed it yet!

Jason Robinson Thu 28 Nov 2013 8:06PM

OK read the whole post now. I think we are thinking on similar lines. However, as a software developer I always think of one of the golden rules of software design - making sure each component has one purpose and that only. Incorporating everything and the rest too is possible - hey we could make diaspora also serve files and incorporate an IRC server + maybe do some test automation services on top. But it's a bad idea. Diaspora server as it is now exists to provide the UI for the server. The federation stuff is actually being pushed out of the main component just so that diaspora will be more flexible. Why would we want to bundle up more non-UI related features then?

IMHO, the system to federate posts around should be decentralized, but it should also be its own mini-network of volunteers. This is exactly what my relay servers proposal is about. :) A bunch of relays taking care of the public post handling in a decentralized way - and pods will not even have to decide which relay to use, giving total redundancy even if all relays except one are down.

I still feel many people misunderstood this, which is why, when I finish the statistics hub, I'll start working on a POC relay and see if I can provide the hooks on the D* side (the more difficult part for me, it being Ruby).
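
As a rough illustration of the redundancy idea, here is a Python toy. It is not the POC: the `Relay` and `Pod` classes, the `up` flag and the choice to push every post to every known relay are assumptions made just for this sketch.

```python
class Relay:
    def __init__(self, name):
        self.name = name
        self.up = True          # toy flag standing in for "reachable over the network"
        self.subscribers = []   # pods that want public posts from this relay

    def forward(self, post_id, body, sender):
        for pod in self.subscribers:
            if pod is not sender:
                pod.receive(post_id, body)


class Pod:
    def __init__(self, name, relays):
        self.name = name
        self.relays = relays
        self.posts = {}
        self.duplicates_dropped = 0
        for relay in relays:
            relay.subscribers.append(self)

    def publish(self, post_id, body):
        self.posts[post_id] = body
        # No relay is chosen: the post goes to every relay that answers.
        for relay in self.relays:
            if relay.up:
                relay.forward(post_id, body, sender=self)

    def receive(self, post_id, body):
        if post_id in self.posts:
            self.duplicates_dropped += 1  # same post arrived via another relay
            return
        self.posts[post_id] = body


relays = [Relay("r1"), Relay("r2"), Relay("r3")]
pod_a, pod_b = Pod("a", relays), Pod("b", relays)

pod_a.publish("p1", "all relays up")
relays[0].up = False
relays[1].up = False
pod_a.publish("p2", "only r3 up")
```

Both posts arrive at pod b. The first one arrives three times and two copies are dropped, so the price of this kind of redundancy in the sketch is duplicate traffic that the receiving pod has to filter out.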

Also, as you said, we could federate the metadata for relays around totally without a central hub. Sure it's possible, but imho it's a bad idea. It adds nothing to decentralization and does not benefit anyone in any way except adding complexity. A simple list on the project site would do fine, since pods would only need to pull it in every so often to refresh their list.

Decentralization is a good thing and awesome - but it's not a magic word to use with everything and assume that it makes things better.

Jason Robinson Thu 28 Nov 2013 8:11PM

Btw, my original proposal suggested storing the "wants tags" list on the central hub. This is not really necessary if such data is stored on the relays instead. It would just mean more posting of said lists around, since all relays need to know asap or the pod will miss posts. Storing the list on the central hub would make for less bouncing of lists around - if the central hub is down it doesn't matter, since relays have the latest list and will then refresh once the central hub is back up.
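
A small Python sketch of that behaviour, with made-up names (`Hub`, `Relay`, `fetch_wanted_tags`, `refresh`, `recipients`) and everything kept in memory: the relay keeps its last copy of the list and carries on routing with it whenever the hub cannot be reached.

```python
class HubDown(Exception):
    """Raised by the toy hub when it is unreachable."""


class Hub:
    def __init__(self):
        self.up = True
        self.wanted_tags = {}   # pod name -> set of tags the pod wants

    def fetch_wanted_tags(self):
        if not self.up:
            raise HubDown("hub unreachable")
        # Return a copy so relays never share state with the hub.
        return {pod: set(tags) for pod, tags in self.wanted_tags.items()}


class Relay:
    def __init__(self, hub):
        self.hub = hub
        self.cached = {}        # last list successfully fetched from the hub

    def refresh(self):
        try:
            self.cached = self.hub.fetch_wanted_tags()
            return True
        except HubDown:
            return False        # keep routing with the last known list

    def recipients(self, post_tags):
        tags = set(post_tags)
        return sorted(pod for pod, wanted in self.cached.items() if wanted & tags)


hub = Hub()
hub.wanted_tags = {"pod-a": {"linux"}, "pod-b": {"cats"}}
relay = Relay(hub)
relay.refresh()

hub.up = False
hub.wanted_tags["pod-b"].add("linux")   # change the relay cannot see yet
refreshed_while_down = relay.refresh()
during_outage = relay.recipients(["linux"])

hub.up = True
relay.refresh()
after_recovery = relay.recipients(["linux"])
```

During the outage, posts tagged "linux" still go to pod-a. pod-b only starts receiving them after the hub is back and the relay has refreshed, which is the "pod will miss posts" window in miniature.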

At no point in the proposal was I proposing that traffic stops when the central hub is down :)
