Add pull to Diaspora's push model in federation

goob

I had some ideas a while ago about improving communication between pods in instances where it currently falls down, but didn't know enough about how federation works to be able to flesh them out. Now, helped by Fla's blog post about federation to understand more about how it works, I've refined those ideas.

Just for clarity, this is only a speculative concept. I understand the technical issues only poorly, and so my suggestions as I've presented them may not be workable. However, I hope that, even if this proves to be the case, my suggestions will spark ideas in those of you who understand the technical side of Diaspora which might help to improve Diaspora's federation protocols.

At the moment Diaspora relies solely, or almost solely, on pushing data from one pod to another. This means that if a pod does not receive data when it is pushed, there is no way for that pod to retrieve these data at a later time. I suggest that if we're going to keep Diaspora working on a push model, we supplement this by enabling pods to pull data under certain circumstances.

New pods

Pods only receive data from pods with which they have an established connection. Currently, these connections are only made when users connect with users on other pods, and this takes time. I suggest putting in place an automatic means of establishing connections with other pods as soon as a pod goes online, so that when users start using the pod, these connections are already in place.

I suggest putting in place a sort of 'handshake' system.

The process would work something like this:

  1. Podmin sets up Pod Z, and puts it online. Pod Z knows about Pod A.
  2. Pod Z contacts Pod A, and says 'Hi, which pods do you know about?'
  3. Pod A gives Pod Z a list of pods it knows about.
  4. Pod Z adds each of these pods to its knowledge base.
  5. Pod Z contacts each of these pods and asks the same question in step 2.
  6. This process is repeated until Pod Z is not finding out about any more new pods.

This way the new pod would very quickly build connections with the whole network.
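The handshake steps above amount to a breadth-first crawl of the network. Here is a minimal sketch in Python, purely as an illustration: the function name `discover_pods`, the `ask_known_pods` callback and the pod names are all made up for this example and are not part of Diaspora's code or protocol. In a real pod the callback would be a network request to the other pod.

```python
from collections import deque


def discover_pods(seed, ask_known_pods):
    """Crawl outwards from one known pod until no new pods turn up.

    seed           -- the first pod the new pod knows about (Pod A)
    ask_known_pods -- hypothetical callable: pod -> list of pods it knows about
    """
    known = {seed}            # step 1: Pod Z knows about Pod A
    queue = deque([seed])
    while queue:              # step 6: stop when nothing new is found
        pod = queue.popleft()
        for other in ask_known_pods(pod):   # steps 2 and 3: ask, get a list
            if other not in known:
                known.add(other)            # step 4: add to the knowledge base
                queue.append(other)         # step 5: ask that pod as well
    return known


# Hypothetical network used only to exercise the sketch.
network = {
    "pod-a": ["pod-b", "pod-c"],
    "pod-b": ["pod-a", "pod-d"],
    "pod-c": [],
    "pod-d": ["pod-b"],
}
found = discover_pods("pod-a", lambda pod: network.get(pod, []))
```

Each pod is asked only once, so the number of requests grows with the number of pods rather than with the number of links between them.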

Of course, there needs to be some means of establishing the first pod to contact (Pod A). This could be prompted by going to the pod of whichever account new accounts are set to auto-follow on that pod (currently the Diaspora HQ account, which is located on joindiaspora.com). Alternatively a list of a few key pods could be kept on diasporafoundation.org (not as a web page visible to visitors, but somewhere from which pods can FTP the data), or the pod could get the information from a site such as podupti.me, which is frequently updated.

One possible way of doing this would be to automatically create 'bot' accounts on each pod which communicate with each other via the above protocol. I'm calling them 'pod-spiders'. If Pod Z knows about Pod A, the pod-spider account on Pod Z adds the pod-spider account on Pod A to its aspects in order to contact it, and so on. I'm sure the inter-pod communication could be done without setting up bot accounts, and that might be a better way to do it. As much as anything, the 'pod-spider' concept is a visual aid.

Tags

As tags are not federated, you could also have each pod-spider account follow all the tags that users on its pod follow or search for. (This could involve only tags that have been searched more than 5 times or are followed by more than 5 people, to eradicate spelling mistakes.) When Pod Z goes online, its pod-spider account can also ask each pod it contacts 'which tags do you know about?' and can then follow those tags itself. In this way, it might be possible to populate tag searches from the time the pod goes online.
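The threshold idea in the parenthesis above could look something like the following sketch. The function name, the two dictionaries of counts and the example tags are assumptions made for illustration; how a pod would actually count searches and followers is not specified here.

```python
def tags_worth_following(search_counts, follower_counts, threshold=5):
    """Return tags searched or followed more than `threshold` times.

    Both arguments are hypothetical mappings of tag -> count.
    """
    searched = {tag for tag, n in search_counts.items() if n > threshold}
    followed = {tag for tag, n in follower_counts.items() if n > threshold}
    return searched | followed


# A misspelt tag searched once is filtered out.
tags = tags_worth_following(
    {"linux": 12, "linxu": 1},
    {"photography": 6, "linux": 2},
)
```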

Alternatively, when a user searches for a tag which is not currently in that pod's database, the pod can pull the data on that tag from all the pods it is connected to. That way, the first time a tag search is done on that pod, it is done by a pull, which would take longer but at least would get the data. After that, data relating to that tag can be pushed to the pod in the usual way.
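A rough sketch of this 'pull on first search, push afterwards' idea follows. The class `TagIndex`, its methods and the peer callbacks are hypothetical names chosen for the example; real pods would make network requests and store posts in a database rather than in a dictionary.

```python
class TagIndex:
    """Per-pod store of posts by tag, filled by a pull on first search."""

    def __init__(self, peers):
        # peers: hypothetical mapping of pod name -> callable(tag) -> list of posts
        self.peers = peers
        self.posts_by_tag = {}

    def search(self, tag):
        if tag not in self.posts_by_tag:
            # First search for this tag: pull from every connected pod (slow).
            pulled = []
            for fetch in self.peers.values():
                for post in fetch(tag):
                    if post not in pulled:   # skip posts seen via another pod
                        pulled.append(post)
            self.posts_by_tag[tag] = pulled
        return self.posts_by_tag[tag]

    def receive_push(self, tag, post):
        # After the first pull, new posts arrive by push in the usual way.
        if tag in self.posts_by_tag and post not in self.posts_by_tag[tag]:
            self.posts_by_tag[tag].append(post)
            return True
        return False


index = TagIndex({
    "pod-a": lambda tag: ["post-1", "post-2"] if tag == "linux" else [],
    "pod-b": lambda tag: ["post-2", "post-3"] if tag == "linux" else [],
})
first = list(index.search("linux"))      # pulled from both pods
index.receive_push("linux", "post-4")    # later pushed to this pod
second = list(index.search("linux"))     # served locally, no pull
```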

Non-communication

There are also some circumstances in which an established pod doesn't receive data that are pushed – for example, if a pod goes offline for a while or is temporarily over capacity. In these circumstances, it would be helpful if the pod can pull data when it goes back online.

At the moment, when Pod A can't push data to another Pod B, it puts the data back into its send queue and retries a number of times at intervals. When the last of these retries has taken place, Pod A stops trying, whether or not it has been successful. If not successful by the last of these attempts, there is no possibility of the data getting from Pod A to Pod B.

For my suggestion to work, at the end of this process of retries, if the data still cannot be pushed, Pod A should write all data destined for Pod B to a log rather than placing them back in its queue. Pod B is placed on a list of 'pods incommunicado, do not attempt to communicate'. From then on, Pod A stops trying to push new data to Pod B and adds them to this log instead, which would save network resources. (Pod A could perhaps continue to attempt communication with Pod B, say once a day, and if successful could then push the logged data.)

When Pod B is back online, it immediately communicates with all pods known to it and says: 'I'm back. What have I missed?' When Pod A receives this communication, it refers to its log for Pod B, retrieves the data and sends them to Pod B, and once it receives confirmation that this transfer has been successful, deletes the log and removes Pod B from the 'do not communicate' list.
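To make the log and the 'I'm back. What have I missed?' exchange concrete, here is a small sketch from Pod A's point of view. Everything in it is an assumption for illustration: the class `Outbox`, the `send` callback and the retry count are invented, and the waiting intervals between retries are left out to keep it short.

```python
class Outbox:
    """Pod A's sending side: retry, then log, then replay on request."""

    def __init__(self, send, max_retries=3):
        self.send = send              # hypothetical callable(pod, item) -> bool
        self.max_retries = max_retries
        self.logs = {}                # pod -> undelivered items; a key here
                                      # means the pod is 'incommunicado'

    def push(self, pod, item):
        if pod in self.logs:
            # Pod is on the 'do not communicate' list: log, don't send.
            self.logs[pod].append(item)
            return False
        for _ in range(self.max_retries):
            if self.send(pod, item):
                return True
        # All retries failed: start a log for this pod.
        self.logs[pod] = [item]
        return False

    def handle_back_online(self, pod):
        """Called when `pod` asks what it has missed; returns items delivered."""
        pending = self.logs.pop(pod, [])
        for i, item in enumerate(pending):
            if not self.send(pod, item):
                # Transfer failed part-way: keep the rest logged.
                self.logs[pod] = pending[i:]
                return i
        return len(pending)


online = set()
received = []

def send(pod, item):
    if pod in online:
        received.append((pod, item))
        return True
    return False

outbox = Outbox(send)
outbox.push("pod-b", "post-1")    # pod-b is down: retried, then logged
outbox.push("pod-b", "post-2")    # logged straight away, no network traffic
online.add("pod-b")
delivered = outbox.handle_back_online("pod-b")
```

The log is only deleted once every item has been sent successfully, which matches the confirmation step described above.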

This should (a) allow pods to receive data pushed when they were unavailable, and (b) save network resources currently wasted by pods trying to communicate many times with pods which are unavailable.

There may be other circumstances in which it would be good for a pod to be able to do a pull request – perhaps if it hadn't heard from a pod for a set period of time. However, this would involve pods keeping logs of data destined for other pods even when it hasn't detected a communication problem, so may be a waste of resources.

Jason Robinson Tue 19 Nov 2013 9:35PM

It's all volunteers, not a single person does anything related to diaspora* for money AFAIK. That is unlikely to change in any immediate future :)

Maciek Łoziński Tue 19 Nov 2013 9:55PM

Yes, but what about the dependency problem? I think owners of pods want to be as independent from other volunteers as possible. There is always a problem with volunteers - they often become unavailable/unmotivated/busy.

Jason Robinson Wed 20 Nov 2013 8:16AM

Well, I think then our biggest risk is that the developers get bored and leave - oh wait that is our biggest risk ;)

That is why the components should be open source and anyone can host them. One relay goes down? Another takes its place.

Maciek Łoziński Wed 20 Nov 2013 11:30AM

Yes, but if a developer gets bored - it doesn't affect the network, it only slows down development. If an admin gets bored - you have a handicapped network.

Jason Robinson Wed 20 Nov 2013 12:03PM

No, because:

One relay goes down? Another takes its place.

goob Fri 22 Nov 2013 4:20PM

@jasonrobinson, thanks for your comments. This is a proposal to help in three specific circumstances:

  • to populate a stream on a new pod when it is first set up;
  • to retrieve posts made while a pod was down/offline, or in the case of a data loss;
  • an attempt to improve the federation of tags.

I didn't envisage this method of bot accounts connecting pods being the norm for inter-pod communication in Diaspora during the normal course of events, but something which could kick in when the normal, push method of federation hasn't worked in a particular circumstance.

So, while it is a case of 'every pod knows every pod', hopefully scalability issues wouldn't be so much of a problem because it's a method which would only kick in for a brief period of time in occasional circumstances, such as when a new pod is set up or when a pod which has been down comes back on-stream.

But it comes with the caveats that I don't really understand the technical issues, so my hope is more that some elements of my proposal might spark some ideas of what might work in the minds of people who do understand the technicals, which seems to be happening. If anything I've said leads eventually to some improvement in performance through the work of other people, I'll be happy!

Ryuno-Ki Mon 17 Feb 2014 8:39PM

@jasonrobinson Can you explain the issue with scalability further? What kind of obstacles do you expect?

Jason Robinson Mon 17 Feb 2014 9:29PM

@ryunoki I guess you are referring to this?

It’s not realistic to have public posts for example federated in this way, unless we allow diaspora* as a network to stay small. I don’t know about eDonkey and FreeNet but afaik P2P is not what “everybody can follow anyone” is about. Diaspora works very well if you know who to follow. But if you just want to follow posts tagged with something - it simply will not scale, the work required to pass those messages around needs to be outsourced from the diaspora server code.

The comment is not directly about the conversation goob started, but about federating public posts around in general. I really think there are more clever ways of doing things than to make all the pods work equally and pass huge amounts of messages around. In all networks and large software projects there are different components to handle things. We should have different nodes to handle different things - and no, that doesn't mean giving up decentralization as long as no node is hard coded and there can be several nodes for some purpose - like relaying public posts around.

Ryuno-Ki Mon 17 Feb 2014 9:52PM

So scaling in terms of "there would be too much data being passed around"?

Look, for a developer "scaling up" is a common term - but not for non-coders.

Or is it meant rather like this: https://www.loomio.org/d/9vpoe0UR/public-post-federation#comment-61592 ?
(Okay, Landau notation isn't that common either, but I can understand it as a mathematician …)

Jason Robinson Wed 19 Feb 2014 7:43AM

@ryunoki yes in general, imho just replicating data around is not sufficient and becomes problematic when the network grows.
