Loomio
August 17th, 2018 10:34

Heads up on social.coop server space

Nick S

Just to report on the last server outage whilst I'm thinking about it...

This last one was resolved by @fardog :raised_hands:, who happened to be awake at the right time. He discovered that the server was running low on disk space on the main (root) partition.

He pruned some Docker image cruft, but the partition is still at 93% full.

Now, I can't explain exactly why it's so full, but it's obviously something we need to deal with, or our server will start dying all the time. It's not the Mastodon database; that's on another disk (which is 64% full).
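A simple way to keep an eye on this is a small check that reports how full the root partition is and flags it past a threshold. This is a minimal sketch for illustration, not something currently running on our servers; the 90% warning level is an assumed value, and the percentage is computed as used / (used + available), which roughly matches the Use% column of `df`.

```python
import shutil

WARN_AT_PERCENT = 90.0  # assumed threshold, not an agreed policy


def percent_used(used: int, free: int) -> float:
    """Used space as a percentage of used + available, roughly like df's Use%."""
    return 100.0 * used / (used + free)


def check_partition(path: str = "/", warn_at: float = WARN_AT_PERCENT):
    """Return (percent used, over_threshold) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = percent_used(usage.used, usage.free)
    return pct, pct >= warn_at


if __name__ == "__main__":
    pct, warn = check_partition("/")
    print(f"/ is {pct:.1f}% full" + (" (WARNING)" if warn else ""))
```

Run from cron or a monitoring agent, a check like this would have flagged the partition well before it reached 93%.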

(Perhaps the growth was related to the recent Mastodon influx I've been hearing about, but either way we should expect more users and more tooting...)

Sorting this out may require taking the server down for a while, I suspect.

There's also the question of backups. I notice the database backup file has quadrupled in size since about June (2G -> 8.4G), which probably needs investigation. I say 'backup' loosely, because we currently just have a manual backup of the database, which is only run when someone remembers to. To protect ourselves from the various trousers-falling-down scenarios we might encounter, we need an automated backup, ideally generational, which also means tens to hundreds of gigabytes more (off-server) disk space.
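For illustration, a generational (grandfather-father-son) retention policy keeps the newest backup from each of the last few days, weeks and months and deletes the rest, which puts an upper bound on how much off-server space backups need. This is a minimal sketch: the 7/4/6 counts are assumptions, and it only decides which dated backups to keep, leaving the actual dump and upload to whatever tooling we choose.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, daily=7, weekly=4, monthly=6):
    """Grandfather-father-son retention: keep the newest backup in each of the
    last `daily` days, `weekly` ISO weeks and `monthly` calendar months."""
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = set()

    def keep_newest_per(period_of, limit):
        seen = []
        for d in newest_first:
            period = period_of(d)
            if period in seen:
                continue  # a newer backup already covers this period
            if len(seen) == limit:
                break
            seen.append(period)
            keep.add(d)

    keep_newest_per(lambda d: d, daily)
    keep_newest_per(lambda d: tuple(d.isocalendar())[:2], weekly)  # (ISO year, week)
    keep_newest_per(lambda d: (d.year, d.month), monthly)
    return sorted(keep)


if __name__ == "__main__":
    # Hypothetical: one backup per day for the last 60 days.
    today = date(2018, 8, 17)
    dates = [today - timedelta(days=i) for i in range(60)]
    for d in backups_to_keep(dates):
        print(d.isoformat())
```

With daily backups this never holds more than 7 + 4 + 6 = 17 files, so at today's 8.4G per dump the worst case is roughly 17 × 8.4G ≈ 143G. That is consistent with the "tens to hundreds of gigabytes" estimate.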

Does anyone here know much about analysing Mastodon instances, or know someone who does?

And this touches on the issue of spending funds, which is a different issue but I'll mention here: perhaps we should allocate a budget to working groups, which they can spend at their discretion without the need to go back to the main / finance group?

For those with git.coop accounts, you can see the tickets I created on the recent outage and the disk space question here. I suggest we keep the technical discussion there as much as possible to spare those here who have been overwhelmed by Loomio chat. :) Anyone who wants an account can sign up following the instructions here https://git.coop/social.coop/

Michele Kipiel August 17th, 2018 10:41

As already discussed in the chatroom, would it make sense to set up a staging/dev server? What would it cost to get one?

Nick S August 17th, 2018 12:48

I think it would make total sense, obvs.

I checked the last budget a while back. It looks like our existing two servers cost $26/month, so another one wouldn't break the bank ATM, since IIRC we have $300/month left over. Apart from this and disk space, I don't know what other expenses we expect.

Hmm... I suppose there's paying people to manage things. But that's a whole other question, and I don't think we have nearly enough to pay going rates for the hours needed; it would have to be bounties and thank-you gifts.

Victor Matekole August 17th, 2018 14:27

Hi all,

Disk space has always been a problem, mainly due to the limited budget... However, there are a couple of things you missed that will free up further space:

- docker system prune -a: this gets rid of everything that is superfluous, including stopped containers, unused networks, build cache and all images not used by a container. Volumes are only removed if you also pass --volumes. Docker doesn't automatically remove previously run containers or unused images unless you tell it to. It is safe to do this because we mount the volumes that need to persist from the host, so we don't need Docker to preserve volumes or containers.

- NUM_DAYS=7 rake mastodon:media:remove_remote: all media is uploaded locally first, resized/optimised with Paperclip and then pushed to DreamObjects. I'm not sure whether Mastodon runs this rake task regularly. It's possible it does, but in my experience there is always media available to delete when I run it, and it frees up a lot of space.

Finally, looking at the disk, /var/lib/docker alone takes up 27GB, which suggests it is this media that is taking up the space. I am now running this rake task in a detached screen session, so please no reboots :) until I give the all clear.
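To see where the space on the root partition is actually going (for example, how much of it is /var/lib/docker), something like `du` does the job; the sketch below is a rough Python equivalent that sums apparent file sizes per top-level entry. It is an illustration only: it counts hard-linked files more than once and, unlike `du -x`, does not stop at mount points, so treat its numbers as approximate.

```python
import os


def dir_size(path: str) -> int:
    """Sum of apparent sizes (bytes) of files under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):  # does not follow symlinked dirs
        for name in files:
            fp = os.path.join(root, name)
            if os.path.islink(fp):
                continue
            try:
                total += os.path.getsize(fp)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def biggest_children(path: str, top: int = 5):
    """Return up to `top` (size_bytes, child_path) pairs, largest first."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_symlink():
                continue
            size = dir_size(entry.path) if entry.is_dir() else entry.stat().st_size
            sizes.append((size, entry.path))
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    for size, child in biggest_children("/var/lib/docker"):  # needs root
        print(f"{size / 1e9:6.1f} GB  {child}")
```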

Hope this helps.

@wulee @fardog

Victor Matekole August 17th, 2018 14:31

Additionally, I just came across this:
https://github.com/tootsuite/documentation/blob/d9ecbee47d6c09afbff8cf1280e29018872936b3/Running-Mastodon/Production-guide.md#remote-media-attachment-cache-cleanup

It appears that the cache also contains images from other instances we are connected to. Scary...
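Conceptually, the remote-media cleanup selects cached copies of attachments that came from other instances and are older than NUM_DAYS, and leaves local uploads alone. The sketch below models only that selection rule, using a hypothetical, simplified `Attachment` record; it is not Mastodon's code or schema. As I understand it, the real rake task then removes the cached files it selects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Attachment:
    """Hypothetical, simplified media record (not Mastodon's real schema)."""
    id: int
    is_remote: bool             # fetched from another instance, not uploaded here
    created_at: datetime
    cached_file: Optional[str]  # path of our local cached copy, if any


def remote_cache_to_clear(attachments: List[Attachment], num_days: int,
                          now: datetime) -> List[Attachment]:
    """Select remote attachments older than `num_days` that still have a local
    copy. Local uploads are never selected."""
    cutoff = now - timedelta(days=num_days)
    return [a for a in attachments
            if a.is_remote and a.cached_file is not None and a.created_at < cutoff]
```

Because only remote copies are selected, the originals stay on the instances they came from; what we give up is our locally cached copy.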

Michele Kipiel August 17th, 2018 14:56

Then I'd say let's consider getting that extra server, especially if it can make managing things easier for you!

Nick S August 17th, 2018 15:13

Thanks, I've just added this to the ticket here, to be tried and written up somewhere.

Nick S August 17th, 2018 16:46

@victormatekole , can we, or should we try to move the /var/lib/docker folder off the root partition? Is it eventually going to get too big?

Victor Matekole August 17th, 2018 16:59

@wulee good question and probably not a bad strategy as it does tend to bloat. Theoretically there should be no issues but I would never assume with Docker, let me do some digging.
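If we do move it, Docker's storage location is normally set with the `data-root` key in `/etc/docker/daemon.json` (older Docker releases used a `graph` setting instead). A common approach is to stop the Docker daemon, copy the existing /var/lib/docker to the new disk, point `data-root` at it and start Docker again. The helper below only sketches the config-editing step. `/mnt/data/docker` is a hypothetical mount point, and none of this has been tried on our servers.

```python
import json


def with_data_root(daemon_json: str, new_root: str) -> str:
    """Return daemon.json text with "data-root" set to `new_root`,
    preserving any other settings already in the file."""
    config = json.loads(daemon_json) if daemon_json.strip() else {}
    config["data-root"] = new_root
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    # Hypothetical existing config and mount point on the larger disk.
    print(with_data_root('{"log-driver": "json-file"}', "/mnt/data/docker"))
```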

That being said, I am wondering if we should consider getting a root server. I run my services on Digital Ocean (mission-critical) and Hetzner, where I have a couple of root servers (less critical and resource-heavy). Hetzner are super cheap and you get a lot for very little; their service ain't too bad either, and they're based in Germany. For an extra $15 or so we could get a couple of terabytes and a not-too-shabby CPU. Nursing 100GB for a social network of 1000+ users seems pretty tight.

Gil Scott Fitzgerald August 18th, 2018 19:27

IMO spend the money for a good experience and fewer headaches later

Mayel de Borniol August 18th, 2018 20:39

Not sure what you mean? The servers we have are already root servers.
And there are pros and cons to 100GB of SSD storage vs 1000GB of SATA storage.
Of course, it's probably time to add storage space and/or upgrade the server (with a bigger root partition). I have no objections if you all want to switch to another provider either, though it would be worth putting together a comparison table.

Victor Matekole August 18th, 2018 21:08

Sorry, by "root servers" I meant dedicated (not virtual) hardware; as far as I understood, Scaleway is a cloud service? You are correct regarding SATA vs SSD. However, Hetzner allows mixed setups, so we could have SSD for Postgres and SATA for less demanding parts of the stack. Either way, I am sure we'd pay less per GB than on Scaleway. But as I suggested earlier, there is always a trade-off: with a dedicated server we look after the hardware, so if a disk breaks we have to ask Hetzner to replace it. In my experience they are reasonably fast at this.

Nonetheless, I've always felt 100GB wasn't enough for our growth rate and long-term requirements. Hetzner was just an example because I know them; I have no bias. I just wanted to start a conversation in which growth rate, performance and cost are carefully considered.

E.g. piece of hardware:

Intel Core i7-2600
2x SATA HDD 1.5 TB
1x SSD 240 GB
32 GB DDR3 RAM
€45.38/month
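Since the comparison comes down to price per GB, here is a small worked example using the box quoted above. The decimal sizes (1.5 TB = 1500 GB) and the mirrored (RAID 1) layout are assumptions for illustration. No Scaleway figure is included because none was quoted in this thread.

```python
def cost_per_gb(monthly_price: float, usable_gb: float) -> float:
    """Monthly price per GB of usable storage."""
    return monthly_price / usable_gb


PRICE_EUR = 45.38           # quoted monthly price
RAW_GB = 2 * 1500 + 240     # every disk used on its own
MIRRORED_GB = 1500 + 240    # assumption: the two SATA disks in RAID 1

if __name__ == "__main__":
    print(f"raw:      EUR {cost_per_gb(PRICE_EUR, RAW_GB):.4f}/GB/month")
    print(f"mirrored: EUR {cost_per_gb(PRICE_EUR, MIRRORED_GB):.4f}/GB/month")
```

That works out to about €0.014/GB/month if every disk is used separately, or about €0.026/GB/month with the two SATA disks mirrored. The same function could fill one column of the comparison table Mayel suggested.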

Victor Matekole August 18th, 2018 21:10

Disk usage is now at 80%, by the way, but there is more that can be trimmed from the media cache. I think someone restarted the Ruby app, which killed the job I started.

Gil Scott Fitzgerald August 18th, 2018 21:11

I wonder if we could just throw postgres in RAM?

Mayel de Borniol August 18th, 2018 21:12

As indicated in the docs, 'trunk' is a dedicated server and 'toot' is a VPS:
https://git.coop/social.coop/tech/operations/wikis/infrastructure-overview

Victor Matekole August 18th, 2018 21:15

I see ... Do they support upgrades of the disk and perhaps memory?

Victor Matekole August 18th, 2018 21:19

BTW, how do I get a git.coop account? I just tried to register with my email address but was denied.

Fabián Heredia Montiel August 18th, 2018 21:33

Hi @victormatekole, check out this guide on the steps to get your git.coop account: https://git.coop/social.coop/general/wikis/getting-an-account

Nick S August 18th, 2018 23:14

I think one of our milestones should be the capability (duplicated amongst several people) to rebuild the server in the event it dies or gets hacked.

In order to learn how to do this, we need a server (or servers) to practice on.

I'd call this a "staging server".

Nick S August 18th, 2018 23:21

Wasn't me, honest!

In general I aim to go to the riot.im channels to check if anything's going on on our servers, or to announce it on the public channel if I'm there doing something. I suggest this'd be a good policy for everyone to follow, to help avoid tripping each other up by mistake.

Nick S August 19th, 2018 09:27

Also, I should add, in case this was running in a Docker container: I have been noticing a lot of 'dying and restarting' events when browsing the Datadog account that Mayel (I think) set up to monitor our servers. (Maybe it was you originally? However, you did say that one was unused and should be removed, and this seems to be a new free account.)

If you have any experience interpreting these, I'd be interested what you think...

And anyone else on the tech team who's interested, go and have a look; it's quite impressive. I can either paste the credentials into the tech group's private channel or, if I get time, set up Keyringer.
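One way to see which containers are dying, and how, is to count `die` events from Docker's own event stream. The sketch below assumes the events were captured as one JSON object per line (for example via `docker events --format '{{json .}}'`), and reads the container name and exit code from each event's `Actor.Attributes`. The container names you'd see depend on our compose setup. An exit code of 137 means the process was killed with SIGKILL, which is often, though not always, the kernel's out-of-memory killer.

```python
import json
import sys
from collections import Counter


def count_container_deaths(lines):
    """Count 'die' events per (container name, exit code) from JSON-lines
    output of `docker events --format '{{json .}}'`."""
    deaths = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Type") == "container" and event.get("Action") == "die":
            attrs = event.get("Actor", {}).get("Attributes", {})
            deaths[(attrs.get("name", "?"), attrs.get("exitCode", "?"))] += 1
    return deaths


if __name__ == "__main__":
    # e.g. python3 deaths.py < events.jsonl
    for (name, code), n in count_container_deaths(sys.stdin).most_common():
        print(f"{n:4d}  {name}  exit={code}")
```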

Victor Matekole August 19th, 2018 15:23

Glad you are finding Datadog useful, it is a pretty amazing tool! I thought it should be killed, as I understood they were removing their free option or at least limiting it to 30 days... I may have got that wrong. Last time I checked, I could not gain access with my current credentials for social.coop. If you send me the credentials I'd be happy to give my 2 cents...

Antoine-Frédéric Raquin August 20th, 2018 23:20

Hello, as Ubuntu 16.04 is the recommended platform for Mastodon, why don't we just dogfood a Mastodon snap? I offered to document it but got no feedback, so I gave up.

Ian Smith August 20th, 2018 23:49

Social.coop is returning 502 Bad Gateway. @victormatekole @wulee

Gil Scott Fitzgerald August 21st, 2018 00:32

I'd be willing to assist but I can't promise any amount of time.

Antoine-Frédéric Raquin August 21st, 2018 00:38

Me too, but I CAN promise that I won't be available between Wednesday and Saturday.

Do you want to chat on XMPP? Would you tell me your time zone? I live in France (it's 20 to 3am here).

Antoine-Frédéric Raquin August 21st, 2018 00:41

I don't know much about Docker, but maybe getting rid of it could help with the database (probably not, but who knows).

And I really need to pay my membership contribution ASAP, tbh.

Unless the Mastodon backend isn't reliable enough for this?

Gil Scott Fitzgerald August 21st, 2018 00:43

I will spin up an XMPP box - I'm UTC-6 but I'm sure we can figure something out. I'm busy until next week though.

Chris Croome (Webarchitects Co-operative) August 21st, 2018 09:50

Hi, one option you could consider for hosting is buying your own hardware. If you can raise the capital, you could get a 1U server with a lot of RAM, SSDs and HDDs, which could run everything (assuming you run a hypervisor on it with multiple virtual servers) and have space for development servers and backups (though you would probably also want backups elsewhere), and then colocate it with a hosting co-operative. Most new servers come with a three-year warranty, so it would make sense to budget for renewing it after three years; however, at that point the old machine could be used as a backup, since in my experience servers can generally be run for about ten years.

Nick S August 21st, 2018 09:55

Thanks. As I mentioned in the chat channel, it seems to have resolved itself...

There've been a bunch of outages like this, in which there's a 502 or similar and a Pingometer/Pingdom notification, and which then mysteriously resolve themselves. I'm a bit of a newbie with Docker, but it looks like one of the containers dies and then restarts. I'd like to know why this happens, and I'm still researching that. Maybe @mayel or @victormatekole or one of the other admins will be able to shed some light on it, but at least it isn't currently a critical problem (and I don't think it's disk-related).
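A 502 Bad Gateway means the front-end proxy got no valid response from the app behind it, which fits a container that has died and not yet restarted. For quick manual checks alongside Pingometer/Pingdom, a minimal sketch like this one, using only Python's standard library, classifies a URL as up, down or unreachable. The rule of treating every 5xx as down (and 4xx as up, since the app still answered) is an assumption.

```python
import urllib.error
import urllib.request


def classify(status):
    """Map an HTTP status (or None for no response) to up/down/unreachable."""
    if status is None:
        return "unreachable"
    return "down" if status >= 500 else "up"


def check(url, timeout=10.0):
    """Fetch `url` and return (HTTP status or None, classification)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:     # 4xx/5xx, e.g. 502 from the proxy
        status = err.code
    except (urllib.error.URLError, OSError):  # DNS failure, refused, timeout
        status = None
    return status, classify(status)


if __name__ == "__main__":
    print(check("https://social.coop/"))
```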

Nick S August 21st, 2018 09:58

Timezones: I think we have admins who can fix server issues in the EU and US timezones (assuming they're not indisposed for some reason). Do we have anyone in the Asian timezones in between who could do this?

Nick S August 21st, 2018 10:12

Hi @proton - Was the suggestion for XMPP chat because it's preferable somehow to the matrix chat server we already use, or simply because you weren't aware of it?

https://riot.im/app/#/room/#SocialCoop:matrix.org

If the latter perhaps we should mention it somewhere more prominent, although I'd need to go and check where it's missing from. (See also my mention of a getting-started document in another thread... might make sense if we try and keep all this info in one place as much as possible, but also I'm aware that the order of access to the git.coop server and this loomio group might not always be the same.)

Nick S August 21st, 2018 10:14

Also @proton - regarding snaps, we don't currently use them, we use Docker. I don't think it'd be a trivial thing to change that at this point, but you could certainly try and make a case for doing so when we get the opportunity. You'd need to explain to those of us not familiar with snaps why they're better than the alternatives.

Antoine-Frédéric Raquin August 21st, 2018 10:32

I was suggesting XMPP because:

1) I was not aware of the Matrix chat;

2) there's no reason to discuss snapcraft in a Matrix chatroom, as IMO it would mainly flood the chatrooms already there. For clarity, I prefer to have 5 private conversations on 5 different sync channels (that anyone is free to join) rather than 5 conversations in the same Matrix chatroom;

3) I'm not sure that Riot runs on my computer.

Nick S August 21st, 2018 10:49

The riot.im client (see my link above) should run in all major web browsers, but if not please let us know because we don't want to pick tech which isn't fairly universally accessible for our public channel.

However, if you want a private discussion with a specific person, you're right, of course you can pick any tech you agree on.

And a snapcraft discussion could just be on a new thread here in the Loomio Tech WG group, I didn't mean to imply you needed to discuss that in our public chat room.

Antoine-Frédéric Raquin August 21st, 2018 11:30

It doesn't run on a 2000s Intel CPU without overheating it, whereas xmpp-client, Gajim, Dino… run without many problems on i3 (obviously not on Windows though). I'm not on my own computer (its motherboard melted), so it's fine, but this trend of building software on Electron or in the browser just because it works on all platforms isn't great for ecology or accessibility.

It's also not very accessible, because the Riot client has a terrible UX, and I think making it the de facto standard for free software, co-operative or union work is a bad idea.

I'd simply recommend Rocket.Chat instead of Matrix (if not XMPP, but that's just because its clients and servers are objectively better, regardless of the protocol itself). I'm not sure that officially using Matrix doesn't push people out of the decision process, and I'd like to question that.

Victor Matekole August 24th, 2018 06:59

Never heard of Snap till now... However, here is the current status of a Snap package for Mastodon: https://github.com/tootsuite/mastodon/issues/1068. It appears not to exist at the moment.

Victor Matekole August 24th, 2018 07:02

I like the idea of owning bare metal! Some cost-benefit analysis would have to be performed but I suspect it would be cheaper in the long-run as the network grows in numbers.
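A first cut at that cost-benefit analysis is a break-even calculation: buying pays off once the monthly saving (rental price minus colocation fee) has covered the purchase price. The figures in the example are hypothetical placeholders, not quotes; real numbers would need to come from hardware vendors and the hosting co-operative.

```python
from typing import Optional


def break_even_months(capital: float, monthly_rental: float,
                      monthly_colocation: float) -> Optional[float]:
    """Months until owning (purchase + colocation fees) becomes cheaper than
    renting; None if colocation costs as much as renting or more."""
    saving_per_month = monthly_rental - monthly_colocation
    if saving_per_month <= 0:
        return None
    return capital / saving_per_month


if __name__ == "__main__":
    WARRANTY_MONTHS = 36  # the three-year warranty mentioned earlier
    # Hypothetical placeholder figures, not real quotes.
    months = break_even_months(capital=1200, monthly_rental=60, monthly_colocation=20)
    print(f"break-even after {months:.0f} months; "
          f"{'within' if months <= WARRANTY_MONTHS else 'beyond'} the warranty period")
```

With these placeholder numbers the machine pays for itself after 30 months, inside a three-year warranty. Any years it keeps running after that would be close to free, apart from colocation.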

Victor Matekole August 24th, 2018 07:10

When I have a chance to look at Datadog, I will check what the root cause may be. Looking at memory consumption, there is only 200MB free. I wonder if we are hitting memory limits, which is common with Rails apps, as they tend to be resource-heavy and to leak memory, especially through poorly written third-party packages.
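One caveat when reading "200MB free": Linux deliberately uses spare memory for the page cache, so a low free figure on its own isn't alarming. The kernel's MemAvailable estimate (in /proc/meminfo since Linux 3.14) is a better guide to how much memory the apps can still use. Here is a minimal parsing sketch, assuming the usual `Name:  value kB` line format:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {name: first numeric field}.
    Most values are in kB; HugePages_* entries are plain counts."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def available_mb(info):
    """Prefer MemAvailable; fall back to MemFree on kernels that lack it."""
    kb = info.get("MemAvailable", info.get("MemFree", 0))
    return kb / 1024


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(f"{available_mb(parse_meminfo(f.read())):.0f} MB available")
```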

Nick S August 24th, 2018 07:42

I was trying to get the memory/CPU load overlaid with the Docker events, to see if they correlate. I think I managed it, and concluded that memory grows and then gets reset when there's an event. But this is across the whole system, and in any case it doesn't show whether memory causes the events or vice versa.
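A rough way to quantify this pattern is to compare the last memory sample before each event with the first one after it. This minimal sketch assumes memory samples exported as time-sorted (unix_time, used_mb) pairs and a list of event timestamps; the 60-second window is an arbitrary choice. Consistently large drops would match "grows, then resets at an event", but they still wouldn't say which causes which.

```python
from bisect import bisect_left


def memory_change_at_events(samples, event_times, window=60):
    """samples: (unix_time, used_mb) pairs sorted by time.
    For each event time, return (event_time, change_mb), where change_mb is the
    first sample at/after the event minus the last sample before it, or None if
    either sample is missing or more than `window` seconds from the event."""
    times = [t for t, _ in samples]
    results = []
    for ev in event_times:
        after = bisect_left(times, ev)  # index of first sample with time >= ev
        before = after - 1              # last sample strictly before ev
        if (before < 0 or after >= len(samples)
                or ev - times[before] > window or times[after] - ev > window):
            results.append((ev, None))
        else:
            results.append((ev, samples[after][1] - samples[before][1]))
    return results
```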