Language filter for diaspora: a GSoC project

Pirate Praveen

We should detect the language of a post (and also give users an option to specify it manually) and allow everyone to filter content based on the languages they know.

We suggested it as a Google Summer of Code idea, and one student is interested in it. We have one mentor who knows RoR, but it would be awesome if someone from the community could also support this.

The discussion so far: http://lists.smc.org.in/pipermail/student-projects-smc.org.in/2014-March/000076.html


Jonne Haß Thu 13 Mar 2014 10:35AM

I won't find time to mentor anything this year, but of course I'm available for questions in #diaspora-dev @ Freenode, usually in the evening hours CET.


Karthik Senthil Thu 13 Mar 2014 12:28PM

Nice to know that you are ready to help me with this GSoC project.

I have gone through the schema of the Diaspora project and feel that a user's language preference can be added to the user_preferences model.
Also, the acts-as-taggable-on gem can be used for tagging a post (and for filtering posts too).
For translation I plan on using the globalize gem.
I have sent a detailed version of this idea as a mail to the mailing list mentioned above.
Kindly review it, as I would like to know whether I'm on the right track with this feature.


Karthik Senthil Thu 13 Mar 2014 6:31PM

A brief list of action points to implement Language filter for Diaspora:

1) Add a new column called languages_preferred to the users table instead of user_preferences.
2) Use acts-as-taggable-on to tag each post with the language it is written in, detected by a gem that works locally and does not depend on any external services.
3) On the receiver's side, filter incoming posts by looking up his/her language preferences. This also ensures that there is no breach of security or of the protocol used to federate posts.
4) If necessary, posts (or comments) can also be translated using the globalize gem.
5) A UI has to be added so that every user can add/edit his/her language preferences.
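A minimal plain-Ruby sketch of step 3 (receiver-side filtering), assuming each post has already been tagged with a detected language code such as "en" and each user has a `languages_preferred` list as proposed in step 1. `Post`, `User` and `visible_posts` are hypothetical stand-ins rather than diaspora's actual models; in the app itself this would more likely be a database query over the acts-as-taggable-on tags.

```ruby
# Hypothetical stand-ins for diaspora's models, not the real schema.
Post = Struct.new(:id, :text, :language)        # language: code like "en", or nil if undetected
User = Struct.new(:name, :languages_preferred)  # e.g. ["en", "fi"]

# Receiver-side filter: keep posts in one of the user's preferred languages.
# Posts with no detected language are kept, and a user without any
# preferences sees the whole stream unfiltered.
def visible_posts(user, posts)
  prefs = user.languages_preferred
  return posts if prefs.nil? || prefs.empty?

  posts.select { |post| post.language.nil? || prefs.include?(post.language) }
end

alice  = User.new("alice", %w[en fi])
stream = [
  Post.new(1, "Hello everyone", "en"),
  Post.new(2, "Hyvaa huomenta", "fi"),
  Post.new(3, "Bonjour", "fr"),
  Post.new(4, "#diaspora", nil)
]
puts visible_posts(alice, stream).map(&:id).inspect  # → [1, 2, 4]
```

Keeping untagged posts visible is a design choice: a failed or uncertain detection should not silently hide content.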


Jonne Haß Thu 13 Mar 2014 7:05PM

Sounds good, except I'm not sure where you're going with globalize. To me, translation of posts is an independent feature that needs to be discussed separately (and I don't think it's feasible/needed).


Karthik Senthil Fri 14 Mar 2014 12:20PM

Translating posts or comments is just an idea. I guess it should be thought about and discussed after the language filter module is complete.


Pirate Praveen Fri 14 Mar 2014 5:10PM

I think translation of posts and comments is an important feature. It is already implemented in Loomio. We can have an option for automatic translation or manual translation.

Later, we also need to integrate jquery.ime for typing in non-Latin scripts.


goob Fri 14 Mar 2014 5:20PM

Both of these (especially translation) sound to me as though they would put quite a heavy load on the server.


Karthik Senthil Sun 16 Mar 2014 10:51AM

I have verified that the gems (acts-as-taggable-on, whatlanguage) run locally and do not have any external dependencies. Furthermore, language detection is done only on the receiving pod.
In your opinion, which feature would increase the server load?
Thank you for your time.
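To illustrate what local detection without external services can look like, here is a deliberately tiny stopword-counting detector in plain Ruby. It is a hypothetical toy, not the whatlanguage gem's API or algorithm; the word lists and the `detect_language` name are assumptions for illustration. Nothing leaves the pod, which is the property being discussed.

```ruby
# Toy per-language stopword lists; real detectors use far larger dictionaries.
STOPWORDS = {
  "en" => %w[the and is of to in it you that],
  "fr" => %w[le la les et est de un une que],
  "de" => %w[der die das und ist nicht ein eine zu],
  "fi" => %w[ja on ei se mutta kun ole]
}.freeze

# Returns the code of the language whose stopwords occur most often in +text+,
# or nil when no stopword matches at all (common for very short posts).
def detect_language(text)
  words  = text.downcase.scan(/[[:alpha:]]+/)
  scores = STOPWORDS.transform_values { |list| words.count { |w| list.include?(w) } }
  best, score = scores.max_by { |_lang, count| count }
  score.positive? ? best : nil
end

puts detect_language("The cat is on the mat and it is happy")  # → en
puts detect_language("Le chat est sur la table")              # → fr
p    detect_language("xyz")                                   # → nil
```

Short posts may match no stopwords at all, so returning `nil` is safer than guessing a language for them.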


abhineet agarwal Thu 20 Mar 2014 9:52AM


Praveen and I have been discussing my proposal for this project. Praveen suggested:

"Also having an option for manual translations would be good too
(especially since there is a limitation in automatic translation and
also many Indian languages are missing from automatic translations).
Users should be able to request translation of a post and others
should be able to translate the posts. It would be good to have a
"Requested translations stream" for each user and each language (for
those who volunteer to translate)."

To which my reply was:

"The option for manual translations, and an interface to manually translate the posts would be trivial to implement.
Use the "globalize" gem to keep separate translation tables for user-generated content alongside the regular UI constants and UI messages. Then push untranslated user-generated content into a separate to-be-translated stream and request translations from volunteer users. I can include a more detailed technical overview of the feature, and some UI mockups, in my application."

Any feedback on this would be valuable.
Thank you.


abhineet agarwal Thu 20 Mar 2014 9:56AM

Here is the link to my current proposal:


I am updating the proposal accordingly.
Suggestions or feedback on any other feature would also be a great help.


Maciek Łoziński Thu 20 Mar 2014 10:20AM

Maybe language detection (and translation?) could be done client-side? Are there any JavaScript libraries out there? Or maybe some sort of third-party online service could be used?


Jonne Haß Thu 20 Mar 2014 12:43PM

I still strongly oppose mixing these two together. Language detection and filtering is an already discussed topic, and I sense a general consensus for it.

However, post translation is a completely independent feature that has major implications for user experience and the daily usage of diaspora. On a technical level it also has quite a large impact on the federation protocol.

It makes no sense at all to mix these together, and I'd like to see a much more thorough discussion about post translation in our community before I consider time spent on its implementation details justified.


Karthik Senthil Thu 20 Mar 2014 2:21PM

Due to the disadvantages of globalize's translation tables, I have decided to replace the globalize gem with other translation gems (which instead use APIs such as Google's or Bing's). Here are a few of the gems I have explored:

1) to_lang ( https://github.com/jimmycuadra/to_lang )
2) easy_translate ( https://github.com/seejohnrun/easy_translate )
3) language-translator

Apart from these gems, the external APIs could also be called directly.

There are also many client-side translators built on jQuery (using the Google API). However, these would be third-party software and could raise concerns about the security or robustness of Diaspora.

I am not completely sure whether these would relieve the server load. Of course, translation can be kept as a secondary idea, to be implemented after the language detection feature has been implemented and rigorously tested. Any feedback on this suggestion?


Florian Staudacher Thu 20 Mar 2014 7:46PM

I agree with what @jonnehass said.
As a first step, the focus should be on detecting the language. That is not a small feature by itself.

Meanwhile we can keep thinking about a way to implement translation that is practical, but also aligns with security and privacy concerns that form the base of what Diaspora stands for.

Those are two separate features and should not be munged into one task.


Karthik Senthil Thu 20 Mar 2014 8:03PM

Yes, the initial focus of this GSoC project will be implementing language detection and tagging posts with their language (as planned in the previous comments). However, there will be a parallel discussion on how to integrate the translation feature as well.


Ryuno-Ki Wed 26 Mar 2014 9:49AM

I'm against automatic translation through a third-party API.
If I'm interested, I can always paste the content into translate.google.com myself without sending data to it. I dislike the exposing …


Globulle Mon 26 Jan 2015 8:42PM

Hi, is there any news on this project? I think it would be very useful to be able to filter out posts that the user cannot understand.


Pierre-Yves Tue 7 Apr 2015 7:13PM

How about allowing users to specify the languages they know in their profile? At least this could be used to exclude from the streams the posts of other users who have no language in common with you.

For example, if I choose English and French and someone else has chosen English and Italian, I might still see posts in Italian from that person if we use the same hashtag, but at least I wouldn't see posts from somebody who has chosen Dutch, German and Spanish as their languages.

This is far from perfect (unless you select only one language), but it would still improve the streams without using a third-party API (like Google Translate), which is a problem for some people. I suppose it shouldn't be very complex to develop?
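Pierre-Yves's proposal boils down to a set-intersection test between the reader's declared languages and the author's. A minimal sketch, assuming a hypothetical `known_languages` profile field; `Author`, `StreamItem`, `shares_language?` and `filter_stream` are illustrative names, not part of diaspora.

```ruby
require "set"

# Hypothetical profile data: the languages each author says they know.
Author     = Struct.new(:name, :known_languages)
StreamItem = Struct.new(:author, :text)

# True if the reader and the author have at least one declared language in common.
# If either side declared nothing, there is nothing to filter on, so keep the post.
def shares_language?(reader_langs, author)
  reader = reader_langs.to_set
  theirs = author.known_languages.to_set
  return true if reader.empty? || theirs.empty?

  reader.intersect?(theirs)
end

def filter_stream(reader_langs, items)
  items.select { |item| shares_language?(reader_langs, item.author) }
end

# The example from the post: I know English and French.
me       = %w[en fr]
friend   = Author.new("en-it", %w[en it])
stranger = Author.new("nl-de-es", %w[nl de es])
stream   = [StreamItem.new(friend, "Ciao a tutti"), StreamItem.new(stranger, "Hallo")]

puts filter_stream(me, stream).map { |item| item.author.name }.inspect  # → ["en-it"]
```

As described in the post, the Italian post still gets through because its author also declared English; only per-post detection would catch that case.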


Mikaela Suomalainen Sat 23 Apr 2016 6:35AM

I would like to have an option to filter out languages other than Finnish and English, as I don't understand other languages.

Currently I can unaspect people who mainly write in languages I don't understand, but followed hashtags still aren't restricted to one specific language, because either the same word exists in several languages or everyone just uses that word instead of the equivalent in their own language.