Loomio

Public post federation

Jason Robinson

The lack of public post federation in Diaspora is IMHO a make-or-break issue. The whole network is a little broken, as small pods are cut off from most of the posts on the network due to the way federation currently works.

Here is my proposal for solving this issue, please see wiki post here.

It is not a comprehensive solution that can just be implemented now. It is a high level suggestion for going forward with talking about such a feature.

Elm Sun 13 Oct 2013

Interesting. I was thinking of something like that, but without the relay servers.

Could you explain the benefit of having a relay server, and of having several of them (with one used at random for each post)?

Jason Robinson Sun 13 Oct 2013

@loelousertranslato having many would mean that if a relay server is down, the posts still federate. After all, these would be user-hosted services, not commercial-grade ones with a 99.99% uptime guarantee.

If a pod cannot contact a relay it would just use another.
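
A minimal sketch of the failover described above, assuming a pod holds a list of relay hosts and has some transport function to reach them. The name `deliver_via_relay` and the `send` hook are hypothetical and not part of any diaspora* code.

```python
import random
from typing import Callable, Iterable, Optional


def deliver_via_relay(
    post: dict,
    relays: Iterable[str],
    send: Callable[[str, dict], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try relays in random order; return the first relay that accepted the post.

    `send(relay, post)` is a hypothetical transport hook. It returns True on
    success and returns False or raises OSError when the relay is unreachable.
    """
    candidates = list(relays)
    (rng or random).shuffle(candidates)  # random order spreads load over relays
    for relay in candidates:
        try:
            if send(relay, post):
                return relay
        except OSError:
            pass  # this relay is down; fall through to the next one
    return None  # every relay failed; the caller may queue the post and retry
```

With three relays of which one is down, the call returns one of the two working relays, so the post still federates.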

Jason Robinson Sun 13 Oct 2013

Also, I think pods could easily check the origin of a post just by asking the originating pod for the hash and comparing it to the one coming from the relay, to stop relays generating posts. Will add this to the proposal.
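
The hash comparison could look roughly like this. SHA-256 and the function names are assumptions made for illustration; the proposal does not name a hash function or say how the originating pod would be asked.

```python
import hashlib
import hmac


def post_digest(payload: bytes) -> str:
    """Hex SHA-256 digest of the raw post payload (the hash choice is an assumption)."""
    return hashlib.sha256(payload).hexdigest()


def relayed_post_is_authentic(payload_from_relay: bytes, digest_from_origin: str) -> bool:
    """Compare the payload received via a relay with the digest reported by the origin pod."""
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(post_digest(payload_from_relay), digest_from_origin)
```

A relay that invents or alters a post cannot make its payload hash to the value the originating pod reports, so the receiving pod would drop it.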

Elm Sun 13 Oct 2013

At first I was thinking that the central hub could be the relay…

Elm Sun 13 Oct 2013

If I understand correctly, the central hub server has to be on, but not all the time: if it shut down for a few hours, it would just delay new tags and pods being added to the list… but the federation and sending of posts would still be OK because of the relay servers.

Jason Robinson Sun 13 Oct 2013

Yeah that is one part of the idea of separating the functionality of a central hub and relays.

Elm Sun 13 Oct 2013

Couldn’t the pod list with tags, which the central hub would host, live in the pods themselves? Each pod would keep track of the latest version of the list and push it to all connected pods (connected because users have added a user/seed from them to an aspect) if their list is older and if the admin has authorized this kind of federation…

  • This way all pods participating in this should, with high probability (?), rapidly be interconnected.

  • This way the list of pods and followed tags would be replicated in each pod, so that if some pods are shut down the system would still work fine…

Jason Robinson Sun 13 Oct 2013

Well IMHO we need to have some kind of central hub at some point. We already have a central hub called diasporafoundation.org for the project. We also have pod lists like podupti.me. Having an official central hub that would receive data from the pods themselves would have lots of benefits. Syncing everything everywhere is problematic. For example some podmin sets up a new pod - where would that pod send the subscription list to? All the pods? How, where would it get even one pod address?

The central hub could also be used to finally gather statistics from the network (opt in of course). For example pods could report their amount of local users and post counts and these statistics could be reported on the hub to see where the network is going.

Elm Sun 13 Oct 2013

  • I guess that a new pod would have to connect to any of the big pods around to be part of the public post federation. But yes, it first needs to know where to find other pods (podupti.me, web search engine…).

  • Having the list in the pods, with the latest version spreading from neighbor to neighbor, does not exclude having a central hub, which could be implemented first.

So the order for any pod would be: 1. use the hub if it is on; 2. if not, use its own list; 3. get the list from the hub if the hub's list is newer; 4. send the list to any pods and the hub if its own list is newer.

Well, I write “newer list”, but I am not sure that a “newer list” is more complete. I am not sure how to track which list is the most accurate and up to date. I am not familiar with syncing and all…
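
One way around the "newer is not necessarily more complete" doubt is to merge lists entry by entry instead of replacing a whole list. The sketch below assumes each pod entry carries its own last-update time; that data layout and the name `merge_pod_lists` are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# (last update as Unix time, tags the pod follows): an assumed layout
Entry = Tuple[int, FrozenSet[str]]
PodList = Dict[str, Entry]


def merge_pod_lists(mine: PodList, theirs: PodList) -> PodList:
    """Union of both lists; for a pod present in both, keep the more recent entry."""
    merged = dict(mine)  # copy so the caller's list is not modified
    for host, entry in theirs.items():
        if host not in merged or entry[0] > merged[host][0]:
            merged[host] = entry
    return merged
```

Merged this way, a pod never loses an entry only because the other side's list happened to be saved later.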

Jonne Haß Sun 13 Oct 2013

The central hub part feels just wrong, I need to think more about that part.

For the relay part, we can save the hash checking blabla, by not touching the Diaspora protocol message at all. It is signed by the author, thus there's no way for a relay to spam messages to the network, for the same reason it isn't possible right now. So we could just send messages like {"hashtags" :["#a", "#b", "c"], "diaspora_message": "xml in base64 blob ready to post to /receive/public"} to the relay server.
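
A sketch of how such an envelope might be built and opened. The two JSON keys come from the example above; the function names are hypothetical, and the signed XML is treated as opaque bytes because its structure is not described in this thread.

```python
import base64
import json
from typing import List, Tuple


def build_relay_envelope(signed_xml: bytes, hashtags: List[str]) -> str:
    """Wrap an already-signed Diaspora message for a relay without altering it."""
    return json.dumps({
        "hashtags": hashtags,
        "diaspora_message": base64.b64encode(signed_xml).decode("ascii"),
    })


def open_relay_envelope(envelope: str) -> Tuple[List[str], bytes]:
    """Return the hashtags and the untouched message bytes, ready to forward."""
    data = json.loads(envelope)
    return data["hashtags"], base64.b64decode(data["diaspora_message"])
```

Because the message bytes come back out unchanged, the author's signature can still be verified by the receiving pod, which is what makes the separate hash check unnecessary.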

Jason Robinson Mon 14 Oct 2013

@jonnehass yeah I have no real grasp of the D* protocol so I didn't mention too much about the specifics in the proposal, just the idea of the relay.

To me the central hub makes sense for a reliable source of network data. I mean we don't decentralize the project page and the wiki either - the central hub isn't any different from those resources.

Elm Wed 23 Oct 2013

Still, I also believe tag federation across pods would be a good feature for Diaspora…

Brad Koehn Thu 14 Nov 2013

I have a different way to solve this problem. I'll try to get a page on the wiki this weekend.

Brad Koehn Fri 15 Nov 2013

Let me know if I should start a new Loomio proposal. I'm new at this.

goob Fri 15 Nov 2013

What advantages does this method have over the scheme I proposed in this thread (it talks about tag federation about half-way down)?

I'm not a coder at all, so this is just a concept rather than anything more detailed, but I hope it might help to improve D*'s federation, especially for new and one-user pods.

Brad Koehn Fri 15 Nov 2013

@goob My concern with that approach is that it's very inefficient (it looks like O(n²) for you computer science types, where n is the number of pods; the amount of network traffic grows rapidly as the number of pods increases), and it implies that all pods are equally trustworthy sources of information about the podsphere.
Also it seems to me that your proposed solution would require a lot of new code to be developed, alongside new messaging semantics. This is based on a very quick analysis so I could be way off base here.

I'm trying for an incremental model that scales well with minimal new coding required. Also in my proposal tagged posts are federated in a very efficient model that scales linearly (O(n + m) where n is the number of posts and m is the number of pods).

goob Fri 15 Nov 2013

Thanks for your reply. That sounds plausible (damnit!).

Brad Koehn Fri 15 Nov 2013

@Goob no damnit at all! I wouldn't have thought of using the pod to help locate other users or index other posts were it not for your proposal. There's probably a better idea out there than the one I proposed too; I just hope I can contribute something.

Jason Robinson Sat 16 Nov 2013

Great @bradkoehn ! And @goob all ideas are welcome - they all add to our ability to build the best solution.

There are many similarities between my idea and Brad's, especially the part that relates to a central hub to store pod information. I am really, really sure we need this, for many reasons. I'll put something in the wiki and separate this into its own place, since it is kinda separate though required by the public post federation/aggregation. Once we have a spec we'll just need to vote, since I know some community members don't like this idea even if it is totally opt-in :P

As for the idea from Brad, I'm quite sure it would do the job and would be happy if either idea was implemented. Initially I was questioning the idea of pull instead of push, but I guess PubSubHubbub takes care of that problem (even though we then rely on those external services; the default one diaspora uses is from Google).

I do think though that my relay server idea is lighter because there is no need to save posts. It also handles redundancy - pods are not tied to any particular aggregator and thus even if all but one of them are down the post will be delivered to all listeners.

Security I guess in both would be the same. Except I see some worry that an aggregator could be populated with non-authentic posts, and even if no pods accept the posts, some other source might do. Since the aggregator would have an open interface, it wouldn't take long for someone to build an app to show posts in the diaspora network going through the aggregator. In this situation it would be trivial to inject posts into the aggregator, unless the aggregator checks all of them. In the relay idea this is not a risk since the aggregator doesn't store posts.

Any other opinions on these ideas?

Flaburgan Mon 18 Nov 2013

I don't know enough to comment on the technical side, but I know one thing: I'm strongly opposed to anything that would involve Google services (even if it's public data). We saw how they turned off Google Reader, or suddenly decided to make the Google Maps API a paid service. I don't want to depend on a company for a feature as critical as this one.

Maciek Łoziński Tue 19 Nov 2013

There are many P2P networks and routing protocols out there. In my opinion we should go down this path.
What if every pod was a relay for its own users' followed tags?

Elm Wed 20 Nov 2013

I’d prefer the distributed path, even if some seeds/pods would lose a bit of info that isn't federated. (Not sure I understand how it would work out, though.) @macieklozinski: could you specify how it would work for tag federation?

Maciek Łoziński Wed 20 Nov 2013

I'll try to do some deeper research on possible p2p solutions.

Maciek Łoziński Wed 20 Nov 2013

I'm not sure if it fits well with Diaspora's protocols, but I could suggest something like this:

  • When user A shares with user B on another pod, user B's pod becomes a "neighbor" of user A's pod.
  • User B's pod "subscribes" to user A's pod for all tags that user B's pod's users follow.
  • Each pod keeps a list of its neighbors and the tags they subscribe to.
  • When a user on a certain pod makes a public post, it is sent to all neighbors subscribed to tags present in this post.
  • If a pod receives a public post from another pod and does not have this post in its database, it passes it to all neighbors subscribing to tags present in this post, and saves the post to its database.
  • If a received post is already present in the database, nothing happens.
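
The steps above can be sketched as a small in-memory simulation. The `Pod` class and its method names are hypothetical, posts are reduced to a guid plus a set of tags, and direct method calls stand in for network delivery.

```python
from typing import Dict, Set


class Pod:
    def __init__(self, name: str) -> None:
        self.name = name
        self.seen: Set[str] = set()                 # guids of posts in the "database"
        self.neighbors: Dict["Pod", Set[str]] = {}  # neighbor -> tags it subscribes to
        self.received = 0                           # incoming deliveries, for inspection

    def subscribe(self, neighbor: "Pod", tags: Set[str]) -> None:
        """Record that `neighbor` wants posts carrying any of `tags`."""
        self.neighbors.setdefault(neighbor, set()).update(tags)

    def publish(self, guid: str, tags: Set[str]) -> None:
        """A local user makes a public post."""
        self.seen.add(guid)
        self._forward(guid, tags)

    def receive(self, guid: str, tags: Set[str]) -> None:
        self.received += 1
        if guid in self.seen:
            return            # already present in the database: nothing happens
        self.seen.add(guid)   # save first, so forwarding loops stop at this pod
        self._forward(guid, tags)

    def _forward(self, guid: str, tags: Set[str]) -> None:
        for neighbor, wanted in self.neighbors.items():
            if wanted & tags:  # neighbor subscribes to at least one tag of the post
                neighbor.receive(guid, tags)
```

The duplicate check is what keeps a post from circulating forever when the neighbor links form a cycle.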

goob Thu 28 Nov 2013

I think there's a big difference between having a central hub which contains information pertaining to the D* network but which is separate from the network itself (such as the project website, poduptime, etc), and a central hub which is an integral part of the D* network and receives/sends/stores data from that network, such as post data, which is what is being proposed here. With a central hub as an integral part of the network, the network would no longer be fully distributed.

If a central hub of any sort is actually needed in order for post/tag federation to work properly, I suggest it be restricted to holding meta-data, such as a list and IP addresses of pods or relay servers. This could be the same central hub which helped people to choose a pod to register at, as poduptime does at the moment.

It would then only be referred to when a new pod or relay server was brought online. The new pod would then call hub.diasporafoundation.org (for example), which would give it some pods/relay servers to contact from which it could pull post data. The actual transmitting of post data would be done by the pods/relay servers themselves, with no involvement from the central hub.

This is similar to one of the proposals I made in this discussion on adding pull to Diaspora's push model (the proposal concerning tags).

I'm not sure relay servers separate from the pods themselves would be needed; I think there is a way of making pods federate public data more effectively without using a separate network of relays, if they are connected correctly together.

Note that in the following, when I talk of connections/sharing between pods, I'm not talking about the normal connections between pods which exist, but a kind of meta-network to push public data around more effectively, of the kind Jason talks about in his proposal.

I would suggest using a kind of 'cell structure', in which each pod is connected directly with several other pods in the network, and through that structure build up a list of public posts and tagged posts data to pass on to other pods. This avoids the problem of scalability faced if 'every pod knows every pod'. If the relay connections between pods are made correctly, public data will be federated to every pod quickly, via indirect routes (Pod A shares it with the several pods to which it has direct connections; those pods share it with the pods with which they have direct connections; and so on). If there is redundancy built in to this network, it won't matter if several pods in this network are down; the data will get fed around to the whole network eventually in any case.

It might be that each pod needs to be connected only to two other pods in the network for this to work, like the classic Communist cell structure – as in the graphic below (not perfectly illustrated, but it gives you an idea):

Cell structure network
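
To get a feel for how such a structure behaves, here is a minimal simulation, assuming the pods form a ring in which each pod is linked to its `k` nearest pods on either side (`k = 1` gives the two-connection case). The ring layout and both function names are illustrative assumptions, not the layout from the graphic.

```python
from collections import deque
from typing import Dict, FrozenSet, Set


def ring_network(n: int, k: int = 1) -> Dict[int, Set[int]]:
    """Pods 0..n-1 in a ring, each linked to its k nearest pods on either side."""
    links: Dict[int, Set[int]] = {pod: set() for pod in range(n)}
    for pod in range(n):
        for step in range(1, k + 1):
            other = (pod + step) % n
            if other != pod:
                links[pod].add(other)
                links[other].add(pod)
    return links


def reachable(links: Dict[int, Set[int]], origin: int,
              down: FrozenSet[int] = frozenset()) -> Set[int]:
    """Pods that eventually receive a post from `origin`; pods in `down` pass nothing on."""
    if origin in down:
        return set()
    seen = {origin}
    queue = deque([origin])
    while queue:
        pod = queue.popleft()
        for nxt in links[pod]:
            if nxt not in seen and nxt not in down:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

In a ring of ten pods with two links each, one pod being down still leaves every other pod reachable, but two pods down can cut off the pods between them; with `k = 2` the same two failures no longer split the network. That is the redundancy trade-off the paragraph above hints at.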

I'm sure there is a way of coding into the D* software itself so that it builds a network of connections such that each time a new pod is brought into the network, the network recalibrates its connections so that this new pod is made a part of the sharing network, without reference to any external source such as relay servers or a central hub. Likewise each time a pod drops out. However, I would have no idea how to do this! I hope someone out there will do, and that my partly developed concept will spark ideas for practical solutions in their mind.

If a central hub is needed to help new pods get connected, I think we should have a mirror or two on other servers just in case the project site is down when a pod is brought online.

Jason Robinson Thu 28 Nov 2013

@goob, will read the rest of your long comment later, but you should maybe read my proposal too ;)

a central hub which is an integral part of the D* network and receives/sends/stores data from that network, such as post data, which is what is being proposed here.

I have proposed no such thing. This is the reason I stopped the whole vote for the central hub because not many people even understood my proposal.

goob Thu 28 Nov 2013

My mistake. I did read your various proposals and wiki articles, but there’s been so much to read and digest that I got confused. I read that suggestion somewhere on one of the several threads on this/related topics, then while writing I got my wires crossed and thought it was you who had proposed it.

Just ignore the last seven words in that extract. The point stands, no matter who proposed it, or even if no one has proposed it yet!

Jason Robinson Thu 28 Nov 2013

OK read the whole post now. I think we are thinking on similar lines. However, as a software developer I always think of one of the golden rules of software design - making sure each component has one purpose and that only. Incorporating everything and the rest too is possible - hey we could make diaspora also serve files and incorporate an IRC server + maybe do some test automation services on top. But it's a bad idea. Diaspora server as it is now exists to provide the UI for the server. The federation stuff is actually being pushed out of the main component just so that diaspora will be more flexible. Why would we want to bundle up more non-UI related features then?

IMHO, the system to federate posts around should be decentralized, but it should also be its own mini-network of volunteers. This is exactly what my relay servers proposal is about. :) A bunch of relays taking care of the public post handling in a decentralized way - and pods will not even have to decide which relay to use, giving total redundancy even if all except one relay are down.

I still feel many people misunderstood this which is why when I finish the statistics hub, I'll start working on a POC relay and see if I can provide the hooks on D* side (the more difficult part for myself, being ruby).

Also, as you said, we could federate the metadata for relays around totally without a central hub. Sure it's possible, but imho it's a bad idea. It adds nothing to decentralization and does not benefit anyone in any way except adding complexity. A simple list on the project site would do fine, since pods would only need to pull it in every so often to refresh their list.

Decentralization is a good thing and awesome - but it's not a magic word to use with everything and assume that it makes things better.

Jason Robinson Thu 28 Nov 2013

Btw, my original proposal said to store the "wants tags" list on the central hub. This is not really necessary if such data is stored on the relays instead. It would just mean more posting of said lists around, since all relays need to know asap or the pod will miss posts. Storing the list on the central hub would make for less bouncing of lists around - if the central hub is down it doesn't matter, since relays have the latest list and will then refresh once the central hub is back up.

At no point in the proposal was I proposing that traffic stops when the central hub is down :)

goob Sat 21 Dec 2013

Does anyone know what it is specifically in the code or structure of the Diaspora network which is causing public post federation to work unreliably?

If it is because Diaspora relies on push notifications to transmit data between pods, could this be solved by allowing a pod to send pull requests to other pods in the network for any data missed when it comes online after some downtime or after being overloaded and unable to receive communications from other pods, or after it is brought online to the network for the first time? I propose a potential solution to this under 'Non-communication' (the third point) in the discussion about adding pull to the push model for federation. While there are concerns about scalability for the other points (getting new pods fully connected and federating tags) in my post, hopefully enabling a pod to send a pull request to other pods when it comes online so it can pick up data (including public posts) it missed while it was offline would help federation of public posts at least in some circumstances where it currently fails.
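
A rough sketch of that catch-up step, assuming a pod remembers when it was last online and that other pods can be asked for public posts newer than a given time. No such endpoint is described in this thread, so `fetch_since` is a hypothetical hook and the post fields are illustrative.

```python
from typing import Callable, Dict, Iterable, List, Set

Post = Dict[str, object]  # assumed to carry at least "guid" and "created_at"


def catch_up(
    known_pods: Iterable[str],
    last_online: int,
    fetch_since: Callable[[str, int], List[Post]],
    already_have: Set[str],
) -> List[Post]:
    """Pull posts missed while offline, de-duplicated by guid, oldest first."""
    missed: Dict[str, Post] = {}
    for pod in known_pods:
        try:
            posts = fetch_since(pod, last_online)  # hypothetical pull request
        except OSError:
            continue  # that pod is unreachable right now; try the others
        for post in posts:
            guid = str(post["guid"])
            if guid not in already_have and guid not in missed:
                missed[guid] = post
    return sorted(missed.values(), key=lambda p: p["created_at"])
```

This only helps a pod that already knows some other pods; it does not solve discovery for a brand-new pod.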

If we could identify the various factors causing federation of public posts not to work properly in different circumstances, it would, I'm sure, be a big help in solving the problem.

Jason Robinson Sat 21 Dec 2013

@goob it's not that public post federation does not work properly, it's that it's not implemented at all. Currently posts just end up on various pods - there is no technical design to say that any public post should be available to any subscriber on any pod.

Personally I want to start prototyping the relay concept - I think once I have a working demo it might be liked ;)

goob Sat 21 Dec 2013

OK, thanks. That definitely sounds like a design flaw!

goob Mon 17 Feb 2014

By the way, @bradkoehn, I think it would be worth making your proposal here in Loomio, as proposals on the wiki tend to get overlooked.

Ryuno-Ki Mon 17 Feb 2014

Thanks, @seantilleycommunit, for granting me writing permission here :)

@jasonrobinson: Say, I'm a bad programmer guy.

Relay receives a post from a pod

Relay already has a cached list of pods and what hashtags they want so relay will deliver post to pods that are interested in one or more of the hashtags in this post. Relay is not for public message keeping - it will delete any posts as soon as they have been pushed out.

Could I misuse your proposal in any way by running a modified version of the code, which does not delete any posts?

It would come in handy if you could list the information you want to "store" in the directory/hub, to better judge this proposal.

Jason Robinson Tue 18 Feb 2014

@ryunoki well, since all the posts are public that would go through the relay, what does it matter if someone would? :)

You could already do it, just start saving posts from large popular pods by following a few hundred popular tags.

Actually, the relay way you would only get a subset of posts - the more relays, the fewer posts that go through each relay. Say 5 relays, you would only get approx 1/5 of public posts from opt-in pods, and even then only those with one or more hashtags.

By proposal, the hub would not store anything else than data related to which pods should receive which tags. So something like a dictionary with pod host and N tags that it wants.
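
As a sketch, that dictionary and the relay's use of it could look like the following. The hosts, the function names, and the `deliver` hook are made up for illustration.

```python
from typing import Callable, Dict, List, Set


def recipients_for(wanted: Dict[str, Set[str]], post_tags: Set[str], origin: str) -> List[str]:
    """Pods that want at least one of the post's tags, excluding the pod it came from."""
    return sorted(
        host for host, tags in wanted.items()
        if host != origin and tags & post_tags
    )


def relay_post(
    wanted: Dict[str, Set[str]],
    post: dict,
    origin: str,
    deliver: Callable[[str, dict], None],
) -> int:
    """Push the post to every interested pod; nothing is kept once this returns."""
    targets = recipients_for(wanted, set(post["hashtags"]), origin)
    for host in targets:
        deliver(host, post)  # hypothetical transport hook
    return len(targets)
```

The only state here is the pod-to-tags mapping; the post itself exists in the relay only for the duration of the call.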

Ryuno-Ki Tue 18 Feb 2014

I'm trying to consider worst cases to improve the proposal, Jason. That's all :)

Maciek Łoziński Mon 17 Mar 2014

my federation protocol proposal:
https://github.com/loziniak/diaspora_federation

Rasmus Fuhse Mon 17 Mar 2014

In my opinion the big question is:

Is it better to make following of hashtags part of the protocol (like Maciek's proposal)?

Or is it better to make a search endpoint part of the federation, which can be used to search for postings with hashtags (and maybe users and other stuff)?

Both ways will work, but which way is more reliable and more performant? Is there any third option?

Maciek Łoziński Mon 17 Mar 2014

Can you tell more about the search-based approach? When the search would be performed and how often? By whom and on which servers? What exactly would be searched for?

Rasmus Fuhse Mon 17 Mar 2014

There is more than one possible search-approach. Jason for example would like some central search-server(s) like friendica or redmatrix have. But it would also be possible to have a search-endpoint on each pod that might be called by "neighbor" pods periodically or only if a user requests a search. But those details are not the question at this early stage, I think. The big question is still what do we want: pushing the news or pulling the info?

Maciek Łoziński Mon 17 Mar 2014

Maybe, rather than wondering and debating, it would be better to try one way, and if it’s not OK, then try another. There are quite a few ideas for developers to choose from. Maybe we should let them decide what is easier/faster to implement?

Jason Robinson Mon 17 Mar 2014

Interesting proposal @macieklozinski - not a bad concept imho. Would love to hear from the more federation-stuff experienced devs.

Although imho I still think federating public posts should be outsourced outside pod software itself. Podmins are already complaining about heavy sidekiq processes - keeping public post federation in the core code would be a big burden to all pods.

Would need to do a simulation to calculate really :P

But I agree that we should just do something :) Any sane implementation would be cool.

Mark Williams Sat 5 Jul 2014

I first tried Diaspora a few years ago by joining a pod with lots of users on it, and loved that right after creating my account I had posts appear in my feed that matched the tags I was interested in. I finally returned to D* a couple weeks ago to help with development, and was disappointed after setting up my own pod how lonely it feels without public posts from the rest of the network being pushed to me! So I'm glad to see so many here who agree that federation of public posts is a very important feature for D*.

Doing this right is not trivial, but I think a DHT-based (Distributed Hash Table) solution might be the right fit. I'm not an expert on the various flavors of DHT out there, but after doing some research it looks like Pastry might be a good choice. In particular, there is already a publish/subscribe application called Scribe designed for it, and an open source implementation called FreePastry. In a nutshell, the Pastry+Scribe combination provides O(log(n)) average routing hops between nodes, high tolerance of nodes entering/leaving the network, automatic load balancing of topic subscription management and notification multicasts across the network, and the ability to structure the routes between nodes in a way that minimizes overall latency/bandwidth (or other relevant metric.) The idea would be that every D* pod would run a node in the DHT network, which would allow the overhead associated with managing subscriptions and disseminating public posts to subscribers to be automatically shared among all the network's nodes.
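
To make the idea concrete: in Scribe each topic gets an ID, and the node whose ID is numerically closest to it acts as that topic's rendezvous point. The sketch below only illustrates that mapping and the hop bound. It is not FreePastry's API, and deriving IDs from truncated SHA-256 hashes of tag and pod names is an assumption made for the example.

```python
import hashlib
from typing import Iterable

ID_BITS = 128            # Pastry uses 128-bit node IDs
ID_SPACE = 1 << ID_BITS


def to_id(name: str) -> int:
    """Map a name to a 128-bit ID (truncated SHA-256; an illustrative choice)."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:16], "big")


def ring_distance(a: int, b: int) -> int:
    """Distance between two IDs on the circular ID space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def rendezvous_pod(tag: str, pods: Iterable[str]) -> str:
    """The pod whose ID is closest to the tag's ID manages that tag's subscriptions."""
    topic = to_id(tag)
    return min(pods, key=lambda pod: ring_distance(to_id(pod), topic))


def hop_bound(n_nodes: int, b: int = 4) -> int:
    """Smallest h with (2**b)**h >= n_nodes, i.e. ceil(log base 2**b of n_nodes)."""
    hops, reach = 0, 1
    while reach < n_nodes:
        reach <<= b
        hops += 1
    return hops
```

Since every tag hashes to a different point in the ID space, different tags end up managed by different pods, which is where the automatic load balancing comes from. With the commonly used `b = 4`, the bound for a network of 1,000 pods is 3 hops.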

I am going to run some simulations using FreePastry+Scribe to verify this approach for a "hashtag subscription" feature for D*, but before digging in too deeply I have a few questions:

1) FreePastry is written in Java 5 and its architecture takes advantage of Java threads and asynchronous IO. It might not be a trivial exercise to port this to Ruby+Rails, and I think in any case it would be best to keep any new DHT component cleanly decoupled from the main D* application. What is the development team's stance towards adding a JVM instance (OpenJDK 6/7) as a new tier to the pod design? I think it would complicate pod setup and configuration a little, but probably not too much.

2) FreePastry's implementation uses its own TCP connections for messaging, and UDP for keep-alives. Its architecture is very modular, and so it's probably possible to proxy all its communication through D*'s existing https-based communication scheme if absolutely necessary. But in the interests of performance and clean design, the much better approach is probably to let the Pastry tier handle its own P2P network communications, and let it communicate through a web services API locally with the Ruby+Rails tier and/or directly with the local database for everything else. In terms of security, https isn't needed, since we're dealing with public posts; all that needs to be done is make sure that the payloads carried by Pastry are cryptographically signed. The downside to the separate P2P communication is that it would add to the firewall setup requirements for a pod (although FreePastry already has the ability to use UPnP to open its own ports to the internet, where supported.) How does the development team feel about the idea of requiring additional ports to be opened between pods and the internet?

3) Are there any parts of the database that are designed to be usable as an interface, i.e. not meant to be controlled and accessed exclusively by the Ruby+Rails and Sidekiq tiers? (For example, is it "legal" to write posts directly to the database without going through the Ruby+Rails app?)

With a DHT-based P2P network to leverage, other useful functions could eventually be added in a scalable way to D*, for example (a) load-balancing of requests for relatively large content like images, so that for example an image in a post from a tiny pod that gets wide distribution in the D* network doesn't result in that pod being swamped with requests for the image from the entire network, (b) network-wide features like user search/discovery, (c) helping other D* functions to scale as the network grows, such as propagation of public posts from originators to followers.

Thanks for your time, I appreciate any feedback or advice you may have!

Melroy van den Berg Wed 6 Aug 2014

@markwilliams2 I love the idea, and I'm also researching the problem and possible solutions. See point #2 on my list: https://wiki.diasporafoundation.org/User:Danger89

I hope that we can get in contact with each other to discuss this further and finally try to implement a working prototype.

Melroy van den Berg Thu 7 Aug 2014

Let's put it like this: I think that to make Diaspora a good decentralized social network, the relational database should be removed and replaced by an Apache Cassandra database (for example), or at least a database which is horizontally scalable with high availability & reliability.

This is also known as a 'NoSQL database environment'. This means, in fact, that the current project as it is would have to be rewritten almost entirely (!) to compete against existing social networks like Facebook, Google+, Twitter, etc.

So... Good luck :)

Jason Robinson Thu 7 Aug 2014

@melroyvandenberg I think you will find little support for a complete rewrite - unless you do it yourself ;) You can always fork and replace the DB.

diaspora* started with MongoDB which didn't work for some reason. Do you mind explaining in more detail why you think a NoSQL database would be better than a relational database, for diaspora*?

Melroy van den Berg Thu 7 Aug 2014

@jasonrobinson I'm trying to dive deeper into the distributed hash table (DHT), which makes it possible to search users within the network (regardless of the pod). The same will also work for public messages, hashtags, etc.

A non-relational database using hashing (key-value) will make this possible. That is the current problem of Diaspora, the decentralized network isn't really connected, a pod floats in the Internet currently.

Melroy van den Berg Thu 7 Aug 2014

Maybe this site gives you a better explanation of the implementation details of the idea of DHT (funny sentence):
http://www.rackspace.com/blog/cassandra-by-example/

Jason Robinson Thu 7 Aug 2014

That is the current problem of Diaspora, the decentralized network isn’t really connected, a pod floats in the Internet currently.

I think that is the whole point that pods "float" in the internet. I'm quite sure the current model isn't something that would be powerful and scalable enough to take on a network like Facebook - but the diaspora* server isn't really something that is supposed to do that IMHO. It's just server software - and there is no requirement to connect to the wider network of diaspora pods.

To make it really big the server should be just nodes that people can run that automatically enhance the network. Now things are different. Each pod is very independent with absolutely no constraints placed on how to run it or even on what configuration.

The diaspora* network really only federates on the protocol level. What uses the protocol doesn't matter. There is already Friendica (made in PHP) that talks the diaspora* protocol. There is also a Python version (Pyaspora) that also talks diaspora*.

goob Thu 7 Aug 2014

diaspora* started with MongoDB which didn’t work for some reason.

Sarah Mei (a previous developer for Diaspora) wrote this article about why MongoDB didn't work.

makes it possible to search users within the network (regardless of the pod)

It is already possible to search for users on other pods. Melroy, the problems you're encountering (including some of those on your to-do list) may be because you've set up a new pod for yourself very recently. One of the software's problems is the case when a new pod connects to the network for the first time - at first it doesn't have established connections with other pods, so things such as search and following #tags return no results. This is a real problem, and it is something that could usefully be tackled.

It may be that changing the database isn't the answer to your problem: simply running your pod for a while, making connections with other pods, will bring the results you're looking for.

Melroy van den Berg Thu 7 Aug 2014

@goob fair enough, however... I have 2 registered users who don't do anything... nothing happens with the system; it will not 'connect with other pods', meaning it will not share information among other pods, including myself.

That is the problem, the base of a social system should be sharing. That is where DHTs kick in; however, this requires a whole different way of thinking.

Jason Robinson Thu 7 Aug 2014

@melroyvandenberg what is your d* handle - or add me at jaywink@iliketoast.net

Yeah as @goob said, this is a huge problem. We need some addition to the protocol to support network wide searching (imho hackish and a large burden to the network) - OR a central hub that would be queried (opt-in publishing handle there).

Unfortunately the opposition to any "central helpers" is kinda strong here - maybe that sentiment will change :)

JR

Jason Robinson Thu 7 Aug 2014

Public post federation (pushing) around the network is also one thing - a few proposals have been made to tackle that here and here at least - but no implementation yet.

G

goob Thu 7 Aug 2014

That is the problem, the base of a social system should be sharing.

The basis of Diaspora is sharing. Every pod in the network shares data (where appropriate) with every other pod. The problem you are experiencing - and it is a major problem, which needs solving - is how to get your pod to start connecting and sharing data with enough other pods to receive all relevant content.

I don't think it's a problem with the type of database being used - it's a problem of your pod, new to the network, knowing what other pods are part of the network and how to find them in order to be able to access their databases.

G

goob Thu 7 Aug 2014

Have a look at our tutorial series on getting started, which will tell you how the network should work, once your pod is connected to other pods.

I made a proposal to help in certain situations, one being when a new pod is added to the network, here. Apparently some of my proposals would not be scalable as the network grows in size, but there are various ideas knocking around related to this problem of the experience of new pods. If you can help solve this problem, that would be fantastic.

MVD

Melroy van den Berg Thu 7 Aug 2014

I still think the solution could be DHT, please read the Bittorrent DHT spec about nodes / node IDs and routing tables:
http://www.bittorrent.org/beps/bep_0005.html

And read '5.4 Bootstrapping' of the Facebook Cassandra PDF:
http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf

EDIT, even Patrick McFadin says it:
http://youtu.be/B_HTdrTgGNs?t=1h3m49s
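
To make the DHT idea a bit more concrete, here is a minimal sketch of the XOR distance metric that the BitTorrent DHT spec (BEP 5) uses, applied to pods and tags. Everything diaspora* related here is an assumption for illustration only: the pod host names are made up, and deriving IDs from host names or tags with SHA-1 is my own simplification (BEP 5 nodes pick random 160-bit IDs).

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 160-bit ID from a name (hypothetical scheme, not BEP 5's random IDs)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: XOR of the two IDs, read as an unsigned integer."""
    return a ^ b


def closest_pods(tag: str, pods: list[str], count: int = 2) -> list[str]:
    """Return the pods whose IDs are closest to the ID of the tag."""
    key = node_id(tag)
    return sorted(pods, key=lambda pod: xor_distance(node_id(pod), key))[:count]


# Hypothetical pod host names, used only for the example.
PODS = ["pod-a.example", "pod-b.example", "pod-c.example", "pod-d.example"]
responsible = closest_pods("privacy", PODS)
```

In such a scheme the pods in `responsible` would be the ones asked about #privacy, so no pod has to know about every post on the network.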

MW

Mark Williams Thu 7 Aug 2014

Glad to see there's still interest in this topic. I've made some progress on a prototype solution for public post federation using Pastry; I'll update this thread when it's ready for testing on real pods.

In case this needs more clarification: the local database for a pod, regardless of whether it's a relational database or a key-value store (and hard experience taught the D* team that a key-value store is not the appropriate choice), is completely distinct from the DHT-based distributed network layer (Pastry, in this case) I'm proposing to use to implement federated public post features. This is a "third way" which avoids the type of naive network search that puts a large burden on the network, while also avoiding the need to introduce central hubs.

IMO central hubs would weaken D* by creating a dependency on special, well-connected, resource-intensive nodes, which would be too expensive for the average person to run. This would erode much of the benefit of a decentralized D* network; it would make it much less democratic, more susceptible to interference and obstruction. It would also be a less robust solution than distributing responsibility for doing federated processing across many/all D* pods.

JR

Jason Robinson Thu 7 Aug 2014

@markwilliams2 looking forward to seeing your update!

TS

Trolli Schmittlauch Fri 8 Aug 2014

I guess @melroyvandenberg 's point is that if we want to federate all public posts, currently this would require every pod to receive and save all public posts locally.

Saving posts locally is ok as long as only the posts from contacts in aspects of users of the pod are saved. This isn't much data, at least on small pods. But saving all public posts requires a lot more space and resources, doesn't it?

@melroyvandenberg 's idea seems to be dynamically getting the posts from other pods only if requested by the user. I don't know whether this can work, but this is my understanding of the issue.

FS

Florian Staudacher Sat 9 Aug 2014

I'm a little skeptical about introducing a whole new concept like DHT. It may seem appropriate, but I also detect faint amounts of "if all you know is a hammer, everything looks like a nail" ... ;)

So, what about the pseudo PubSub stuff we have going for relaying interactions on posts to/from different pods? I suppose that could be extended into a full-blown PubSub system where users could actually subscribe to contents on other pods...

JR

Jason Robinson Sun 10 Aug 2014

@florianstaudacher the requirement IMHO is that everything is transparent. If a user follows a tag, then in an ideal situation posts should be seen from all around the network - just like on Twitter. At least, I see that as the only way it would work - normal users will just be confused if they have to do additional work to "enable" content from other pods.

MVD

Melroy van den Berg Thu 14 Aug 2014

@jasonrobinson Exactly my point.

MŁ

Maciek Łoziński Thu 9 Oct 2014

@melroyvandenberg, do you suggest that pods should be connected on a database level (Cassandra) instead/in addition to the protocol level?

MVD

Melroy van den Berg Thu 9 Oct 2014

@macieklozinski
I think that would indeed be a wonderful solution. Except that you should think about privacy in transporting this data between the pods, similar to the way that privacy is also important for the current data which is sent using the current protocol.

This way we can see other posts from every pod in the world, meaning we are finally fully connected between the pods (just like a big network). Which is your main goal after all, right?

JR

Jason Robinson Sat 30 May 2015

Started thinking of a hacky solution requiring no core code that would allow users to participate via pod-looking relay servers. Terrible in terms of privacy compared to the original proposal, but it would be easier to implement :P

If it worked and would be wanted, then core code could be introduced to make it transparent.

A

aj Sat 30 May 2015

i kind of like the way diaspora forms communities and groups in sort of an organic way, if there were a common aggregate of all public posts like a global pubsubhubbub or whatever it would maybe kind of change the way of it...

JR

Jason Robinson Sat 30 May 2015

That works fine for medium to large pods, but single user and small pods are lonely places until they share with lots of contacts. One shouldn't be required to follow hundreds of users just to see public posts.

A

aj Sun 14 Jun 2015

ya starting my pod i more or less had to find contacts on jd and then search for the same contact from my pod to add it, a real pain... would be great if a new pod could get a feed of public posts from one of the larger pods, at least for few weeks after being added to the network

JR

Jason Robinson Sat 11 Jul 2015

Updated proposal specifications - and now with some PoC code

See here: https://wiki.diasporafoundation.org/Relay_servers_for_public_posts

I'll continue working on the code part and hopefully aim to submit a PR to diaspora core for at least post relaying within my summer holiday (so under 2 weeks).

JR

Jason Robinson Wed 15 Jul 2015

hey @jhass @dennisschubert and others. What do you think about the proposed pod settings this stuff would need? I'm kinda ready to implement the last part of the relay ie querying single pods and pushing posts out to them. So I could also do the PR towards diaspora - for the inbound configuration part first, then second the outbound configuration.

JH

Jonne Haß Wed 15 Jul 2015

statistics.json/NodeInfo is about metadata, not protocol extensions, that is protocols shouldn't make decisions based on its output.

I'd say just add a .well-known route, /.well-known/x-diaspora-relay or something.

JR

Jason Robinson Thu 16 Jul 2015

@jhass Fair enough, that might make sense. I'm supposing it should be constructed by the diaspora-federation gem?

But in general, you or other core members don't object to the extra configuration for diaspora as proposed, assuming the change is that it is reflected in .well-known, not nodeinfo?

JH

Jonne Haß Thu 16 Jul 2015

Well, it's needed for the feature to work, right?

JR

Jason Robinson Thu 16 Jul 2015

@jhass as written currently, yes. The other option is each podmin configures their web server to serve a manually maintained file :P

But really, if the configuration would not be accepted to diaspora, as a fall-back I would centralize the idea to the-federation.info and add to it a form for podmins to register their subscription preferences. The send part is more difficult, that would require a patch commit that podmins could pull in if they wanted.

I personally don't see any harm in including the configuration in diaspora itself. I believe this is a good way forward to attempt to fix some of the issues caused by the federation model ie small pods not able to receive enough public posts for it to make sense to set up a single user pod. Also, this would allow pods to customize their scope to specific interest areas, if the solution gets wider adoption within the network.

JR

Jason Robinson Thu 16 Jul 2015

I'll make it .well-known/diaspora-relay. RFC5785 doesn't state prefixing names with x- and it doesn't seem common looking at the registry. I'll submit a registration request also for this .well-known.

Thinking about it, we should really put version and protocol generic information in .well-known/diaspora, not in headers/statistics as currently is done. But not in scope of this :)

JR

Jason Robinson Thu 16 Jul 2015

Or actually, it should probably be .well-known/social-relay, not diaspora-relay. Nothing in the relay concept the way I think about it is diaspora specific, except the initial implementation is geared towards diaspora.

JR

Jason Robinson Thu 16 Jul 2015

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "http://the-federation.info/social-relay/well-known-schema-v1.json",
  "type": "object",
  "properties": {
    "subscribe": {
      "type": "boolean"
    },
    "scope": {
      "type": "string",
      "pattern": "^(all|tags)$"
    },
    "tags": {
      "type": "array",
      "items": {"type": "string"},
      "uniqueItems": true
    }
  },
  "required": [
    "subscribe",
    "scope",
    "tags"
  ]
}

JR

Jason Robinson Thu 16 Jul 2015

social-federation can now generate it.
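
For anyone who wants to check a .well-known/social-relay document by hand, here is a rough sketch of the rules in the schema above, written with only the Python standard library. The function name and the example document are made up for illustration; this is not how social-federation does it. The scope is checked as exactly "all" or "tags".

```python
import json
import re


def validate_social_relay(doc: dict) -> list[str]:
    """Return a list of problems found in a social-relay document (empty if valid)."""
    errors = []
    for key in ("subscribe", "scope", "tags"):
        if key not in doc:
            errors.append(f"missing required property: {key}")
    if "subscribe" in doc and not isinstance(doc["subscribe"], bool):
        errors.append("subscribe must be a boolean")
    if "scope" in doc:
        scope = doc["scope"]
        # fullmatch makes sure the whole string is exactly one of the two values.
        if not (isinstance(scope, str) and re.fullmatch(r"all|tags", scope)):
            errors.append("scope must be 'all' or 'tags'")
    if "tags" in doc:
        tags = doc["tags"]
        if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
            errors.append("tags must be an array of strings")
        elif len(set(tags)) != len(tags):
            errors.append("tags must be unique")
    return errors


# Hypothetical document a pod might serve.
example = json.loads('{"subscribe": true, "scope": "tags", "tags": ["linux", "opensource"]}')
problems = validate_social_relay(example)
```

An empty `problems` list means the document passes all the checks in this sketch.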

JR

Jason Robinson Thu 16 Jul 2015

Actually, @jhass @dennisschubert do you want the .well-known/social-relay generation in diaspora core or diaspora-federation gem? It's not really part of the federation, more like add-on system to push posts around, so I'm kinda hesitant to push it there. Can I just add a new route/controller/presenter etc, like the current statistics.json is done?

Or should I make a gem well-known-social-relay? :P

JH

Jonne Haß Thu 16 Jul 2015

I guess having it in the core would be okay for now, should be fairly easy to push elsewhere if needed.

F

Flaburgan Sun 26 Jul 2015

Okay I read the specification and discussed with @jasonrobinson on IRC about it.

First of all Jason, thank you very much for dealing with this important problem of diaspora*.

Although this proposition solves most of the problem, there are some points we should be careful about:

  • [Warn] External dependency for a core feature is dangerous. To send messages to other pods is the core feature of a pod. To use external app servers to do that means the network would have a big dependency to a few servers, which can be attacked or not correctly maintained. This looks dangerous to me.
  • [Warn] On the same topic, to use a centralized list of pods is a potential vector of attack / problem. We're losing part of the force of diaspora* here.
  • [Warn] Pods are not equal anymore. Until now, the difference between pods was on side features like services enabled or chat. With this proposition, we would have to explain to users that, depending on where they choose to register, they will not have the same content available. This is the opposite of what we always said, and this is exactly the problem we are trying to solve: we don't want users to choose a pod because they have to go there, because that's where the content is.
  • [Blocking] The interactions on posts that are transmitted by relay are not federated. This point is a blocking point to me. It completely breaks the usage of diaspora* and means a lot more complaints about the federation being broken. I don't see the point of displaying a post if I know that only the users of my pod will see my reaction on it. Most of the time, I want to reply to the author of the post.

For those reasons, I think your proposition is not a good solution. I'll try to propose something else soon.

JR

Jason Robinson Sun 26 Jul 2015

For those reasons, I think your proposition is not a good solution. I’ll try to propose something else soon.

This is not a proposition any more. It stopped being one when I changed the original one not to depend on the core so much. Right now it depends on only the carbon copying of posts outwards - even if that which is now in develop was reverted I could do a single commit patch which podmins could pull in if they want.

So this is pretty much live now, just not fully functional. I already see diasporapr.tk sending posts out to the relay :) I'll push the latest changes to the relay live early next week so posts will be relayed for the first time and start real world testing.

[Warn] External dependency for a core feature is dangerous.

It's not a core feature. The core feature is to NOT deliver posts by design to all pods. And that works and will continue to work.

[Warn] On the same topic, to use a centralized list of pods is a potential vector of attack / problem. We’re losing part of the force of diaspora* here.

Part 2 would be decentralizing the relays themselves. Initially yes, each pod configuring a single relay makes it weaker. But less weak than pod email delivery or hosting, which is the weakest part of diaspora, users being locked into a single server for life. And since this is not a core feature, like user login is...

[Warn] Pods are not equal anymore.

They are even less equal now. Right now it makes sense to join a large pod, to see many public posts. Setting up your own pod doesn't make sense. Using relays will make pods more equal.
But, the relay will also enable pods to be more strongly themed, for example a pod could subscribe to only linux and open source posts, ignoring all the other stuff.

[Blocking] The interactions on posts that are transmitted by relay are not federated.

Well, the same problem is with reshares. And the interactions can be solved, just have to decide which way to go, to relay them or to only use relays for the initial post delivery. I think only using this in real world will tell which is better. Anyway, it needs to happen before 0.6 is released and also before that the participations bloat needs to be dealt with and the federation tuned to be more efficient. Will be submitting something for both these for consideration.

I don’t see the point of displaying a post if I know that only the users of my pod will see my reaction on it.

Not entirely true. Since a pod which gets a post via a relay will fetch the author contact (by diaspora protocol design), interactions will be sent to the original pod as if the pod had delivered the post. The problem is that afaict the original pod will only relay the interaction as normal, not to other pods that depend on relays. This is how I understand it:

  • pod A <--- author of post
  • pod B <--- contact of pod A author
  • pod C <--- not in contact with pod A author
  • pod D <--- not in contact with pod A author

So when pod A author sends a post it will be delivered to pod B user directly and pod C and pod D users via relay (assuming both subscribe in this case).

Initial relay concept doesn't relay interactions, so when a user on any of the pods comments:

  • pod A will receive it
  • pod B will receive it (since pod A relays it)
  • pod C will not receive it (unless done from pod C)
  • pod D will not receive it (unless done from pod D)

But this situation is fixable by defining whose responsibility is to do what. Of course, either the relay should take care of whatever "broken" links it creates OR it should create participations so that interactions flow as they should. Though as said, the current reshare concept also has these kind of bugs.
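
The pod A/B/C/D scenario above can be written down as a tiny model. This is only a sketch of how I described the flow, with made-up class and method names; it does not reflect actual diaspora* or relay code.

```python
from dataclasses import dataclass


@dataclass
class Network:
    author_pod: str              # pod of the post author
    contact_pods: set[str]       # pods with a contact of the author
    relay_subscribers: set[str]  # pods that only get the post via the relay

    def post_recipients(self) -> set[str]:
        """Pods the post is delivered to: direct contacts plus relay subscribers."""
        return self.contact_pods | self.relay_subscribers

    def comment_recipients(self, commenting_pod: str) -> set[str]:
        """Pods that end up with a comment when interactions are not relayed.

        The comment goes to the author pod, which relays it to contact pods
        as normal. The commenting pod has the comment locally.
        """
        return {self.author_pod, commenting_pod} | self.contact_pods


net = Network(author_pod="A", contact_pods={"B"}, relay_subscribers={"C", "D"})
post_seen_on = net.post_recipients()
comment_from_c_seen_on = net.comment_recipients("C")
```

With a comment made on pod C, pod D is missing from `comment_from_c_seen_on` even though it is in `post_seen_on`, which is exactly the broken link that either the relay or participations would need to fix.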

Thanks for your comments and while I'm looking forward to seeing a proposal to the core that would solve federating all posts to whoever wants them but allow still pods to not receive all posts, I really doubt that kind of solution is doable to the core and it wouldn't even make sense to bloat the core with it.

F

Flaburgan Sun 26 Jul 2015

I wrote what I have in mind on https://wiki.diasporafoundation.org/Follow_other_pods_tags

I'll now read your answer ;)

F

Flaburgan Sun 26 Jul 2015

In my opinion, it is a core feature to deliver the message. Currently, this feature is incomplete because it doesn't allow to follow tags on other pods. So, we have to patch the core, not to build another tool to balance its weakness. Related, the fact that pods are not equal now is due to this incomplete federation, not because of a setting. This is really important to me. In the first case, it only means we need to improve the software, when in the second case, it means the equality is broken by choice. A bad thing in my opinion.

Part 2 would be decentralizing the relays themselves.

With the "perfect situation" becoming one relay per pod? And then, to make relays forward interactions? I can't lose the feeling that we're building another network on top of the diaspora one instead of patching it here.

JR

Jason Robinson Sun 26 Jul 2015

In my opinion, it is a core feature to deliver the message. Currently, this feature is incomplete because it doesn’t allow to follow tags on other pods. So, we have to patch the core, not to build another tool to balance its weakness.

Well I still disagree - the core doesn't have to be a do-everything solution. It's bloated as it is and already takes too many resources to run. Granted, the relay system will increase the load across the network, but it will increase it less than if all the pods did all the work.

Related, the fact that pods are not equal now is due to this incomplete federation, not because of a setting. This is really important to me. In the first case, it only means we need to improve the software, when in the second case, it means the equality is broken by choice. A bad thing in my opinion.

Well, as we are talking about a decentralized place, pods should be allowed to be not equal if they want to be.

I read your tag based proposal and it could be a nice improvement to the core. However, as you note, it would not help in the case of new pods which would still have to do a lot of manual work to register with this and that pod. The relay system only requires a new pod to register with a pod list - and relays could even use many pod lists or even be pod lists themselves.

Also I don't believe this is true:

Every interactions is possible on the posts received with that solution, so answers (comments), likes and reshares will be received by the original pod which created the post and all the others which received it

Assuming you mean that participations would also follow the tags in the post, then this is true only to the point where users don't stop following tags. If the last user stops following a tag on a pod, the relations would stop going through.

All in all, that could be a nice addition to consider for the core (with maybe the addition that only active users tags are considered, not everybody) but IMHO it doesn't solve the broken network problem like the relay does. It only makes the broken network problem dissipate faster, but the effect is the same for brand new pods. The solution would also be heavier on every single post for post delivery.

F

Flaburgan Sun 26 Jul 2015

I read your tag based proposal and it could be a nice improvement to the core. However, as you note, it would not help in the case of new pods which would still have to do a lot of manual work to register with this and that pod.

That is true, but it is a different issue in my opinion, this is what I would call "network discovery". It is not only about tags, we can want to find users too for example.

About the tag following problem and my proposition, if the pod knew every other pod on the network, the problem would be solved. So we can choose to solve this by simply fetching the list of pods from the-federation.info, as you propose to do for the relays.

Assuming you mean that participations would also follow the tags in the post, then this is true only to the point where users don’t stop following tags. If the last user stops following a tag on a pod, the relations would stop going through.

Not sure what you meant here. What I meant was, if you write a post about #diaspora from your pod, that I receive it because my pod told yours that it is interested about diaspora*, and then Jonne answers on your post from his pod, I will receive Jonne's answer because your pod knows it sent me the message so it is able to forward Jonne's comment.

active users tags

I don't get what you're talking about?

JR

Jason Robinson Sun 26 Jul 2015

and then Jonne answers on your post from his pod, I will receive Jonne’s answer because your pod knows it sent me the message so it is able to forward Jonne’s comment.

You mean pods would explicitly track who they've sent posts to? I think it works currently the way that contacts are checked through (sharing and shared with) when deciding where to send. I don't think posts "remember" where they have been sent. I might be wrong :)
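
Just to illustrate what "remembering" would mean: a pod would need something like the delivery log below. This is a hypothetical sketch with made-up names and host names, not a description of how diaspora* currently decides where to send things.

```python
class DeliveryLog:
    """Tracks which pods each post has been sent to (hypothetical structure)."""

    def __init__(self) -> None:
        self._sent_to: dict[str, set[str]] = {}

    def record_delivery(self, post_id: str, pod: str) -> None:
        """Remember that a post was delivered to a pod."""
        self._sent_to.setdefault(post_id, set()).add(pod)

    def forward_targets(self, post_id: str, comment_origin_pod: str) -> set[str]:
        """Pods that should get a comment: everyone who got the post, minus the sender."""
        return self._sent_to.get(post_id, set()) - {comment_origin_pod}


log = DeliveryLog()
log.record_delivery("post-1", "flaburgan-pod.example")
log.record_delivery("post-1", "jonne-pod.example")
targets = log.forward_targets("post-1", "jonne-pod.example")
```

Here a comment arriving from jonne-pod.example would be forwarded to flaburgan-pod.example, because the log says the post went there.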

active users tags

I don’t get what you’re talking about?

Just a small detail. For active tags it makes sense to only look at tags followed by active users. Otherwise a user that logs in once and follows a tag will cause the pod to forever follow that tag. The relay subscription prefs work (if set so) using the 6 month active users.

JR

Jason Robinson Sat 3 Oct 2015

Added some notes and ideas regarding the participations relaying to our Paris board. Would love to discuss at least for some brainstorming.

RD

Richard Decal Mon 14 Dec 2015

Re: Jason's "pods should be allowed to be not equal if they want to be."

I strongly believe that which content any user wants to subscribe to should be decided at the user level rather than the pod admin level. If one user wants to follow basketball posts, and another wants to follow Linux posts, they should make that decision rather than it be imposed on them by some stranger. I don't want to join a pod only to find out the admin severed my access to one of my interests because they don't share that interest.

JR

Jason Robinson Mon 14 Dec 2015

@richarddecal

If one user wants to follow basketball posts, and another wants to follow Linux posts, they should make that decision rather than it be imposed on them by some stranger.

I couldn't agree more. Currently, the defaults are probably not the best, for the diaspora* relay code. Could probably change them before the relay hits "mainstream" in 0.6, currently it's only in development pods.

The defaults are:

inbound:
  subscribe: false
  scope: tags
  include_user_tags: false
  pod_tags:

So, podmin must change "subscribe" to true to enable the functionality and "include_user_tags" to true, if user tags should be collected. I think the latter should be changed to "true" by default.

I'd enable the whole relay functionality for user tags on by default but that would never pass :) And since the relay is third-party stuff, it's prob a good idea to keep it off by default.

JR

Jason Robinson Mon 14 Dec 2015

(also, the code allows mixing, podmin can define tags and still have user tags being subscribed to)
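
Roughly, the way these settings could combine into what the pod publishes looks like the sketch below. It is not the actual diaspora* code: the function name is made up, and I'm assuming pod_tags is a list once parsed (empty in the defaults) and that the user tags have already been collected.

```python
def relay_subscription(config: dict, user_tags: list[str]) -> dict:
    """Build the subscription document from the inbound settings (sketch only)."""
    inbound = config["inbound"]
    # Podmin defined tags always count; an empty setting is treated as no tags.
    tags = set(inbound.get("pod_tags") or [])
    # Mixing: user tags are added on top when include_user_tags is enabled.
    if inbound.get("include_user_tags"):
        tags.update(user_tags)
    return {
        "subscribe": bool(inbound.get("subscribe")),
        "scope": inbound.get("scope", "tags"),
        "tags": sorted(tags),
    }


DEFAULTS = {
    "inbound": {
        "subscribe": False,
        "scope": "tags",
        "include_user_tags": False,
        "pod_tags": None,
    }
}
MIXED = {
    "inbound": {
        "subscribe": True,
        "scope": "tags",
        "include_user_tags": True,
        "pod_tags": ["diaspora"],
    }
}

default_doc = relay_subscription(DEFAULTS, ["linux"])
mixed_doc = relay_subscription(MIXED, ["linux", "diaspora"])
```

With the defaults nothing is subscribed to, while the mixed configuration ends up with both the podmin tag and the user tags.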

JR

Jason Robinson Sat 9 Jan 2016

My proposals to solve the participations and relay decentralization issues.

DU

[deactivated account] Fri 22 Jan 2016

If we use relays, would information on which relays are up be available on a site such as podupti.me, which also shows if pods are down (for whatever reason)? It could help developers improve the network and identify problems.

AS

Alex Stacey Tue 17 May 2016

Hi guys. I'm #newhere :smiley: and don't know much about the existing architecture of d* but I read through lots of this thread yesterday with interest, and have a couple of comments...

If I understand correctly, some proposals involve pushing public posts to pods that have users following certain tags. This seems problematic to me as it ignores past public posts. For example, if a user starts following #privacy and happens to be the first user on that pod to do so, they will only get future posts; they won't be able to look through the history of that tag. I don't think that would be the expected (or desired) behaviour.

The alternative that came to mind (which may well have been suggested already) is that each pod could publish a list of the tags that they have public posts for, and then they could be pulled in when needed. So, using the example above, when the first user starts following #privacy, the pod then (somehow) finds all of the other pods that have public posts for that tag and pulls them in. Something like that also gives more power to certain pods to decide what they want to pull in. Some pods might want to ignore #nsfw for example.
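
To sketch what I mean (very roughly, and with made-up names since I don't know the existing code): if every pod published the set of tags it has public posts for, a pod could work out where to pull from when the first user follows a tag, and skip tags it has chosen to ignore.

```python
def pods_with_tag(
    tag: str,
    published: dict[str, set[str]],
    ignored: frozenset[str] = frozenset(),
) -> list[str]:
    """Return the pods to pull public posts from for a newly followed tag."""
    if tag in ignored:
        # The pod has decided not to pull this tag in at all.
        return []
    return sorted(pod for pod, tags in published.items() if tag in tags)


# Hypothetical published tag lists, keyed by pod host name.
PUBLISHED = {
    "pod-a.example": {"privacy", "linux"},
    "pod-b.example": {"nsfw"},
    "pod-c.example": {"privacy"},
}
sources = pods_with_tag("privacy", PUBLISHED)
```

The pod would then fetch the #privacy history from each pod in `sources`, which is the part that covers past public posts.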

Anyway, just my thinking while reading this thread. Excuse me if I'm repeating what has already been said.