Fri 17 Aug 2018 10:34AM

Heads up on social.coop server space

Nick S

Just to report on the last server outage whilst I'm thinking about it....

This last one was resolved by @fardog :raised_hands: who happened to be awake at the right time. He discovered that the server was running low on disk space on the main (root) partition.

He pruned some docker image cruft, but it's still currently at 93% full.

Now I can't explain exactly why it's so full, but obviously we need to do something about it or our server will start dying all the time. It's not the Mastodon database; that's on another disk (itself 64% full).
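One way to find out where the space on the root partition has gone is to total up the size of each top-level directory and look at the biggest ones. Below is a minimal Python sketch of that idea, not a description of what was actually run on the server. The function names are made up for illustration. Note that `os.walk` will happily cross into other mounted filesystems, and that the percentage computed here can differ slightly from what `df` reports because of reserved blocks.

```python
import os
import shutil


def percent_used(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def dir_size(path):
    """Total size in bytes of regular files under `path` (symlinks are skipped)."""
    total = 0
    # Ignore directories we are not allowed to read instead of raising.
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_subdirs(path, top=10):
    """Return up to `top` (size_in_bytes, subdir_path) pairs, largest first."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size(entry.path), entry.path))
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    print(f"root partition is {percent_used('/'):.0f}% full")
    for size, subdir in largest_subdirs("/var"):  # "/var" is only an example
        print(f"{size / 1e9:8.2f} GB  {subdir}")
```

Running it as root against a directory such as `/var` should give a rough picture of which subtrees are worth a closer look.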

(Perhaps the growth was related to the recent Mastodon influx I've been hearing about, but either way we should expect more users and more tooting...)

Sorting this out may require taking the server down for a while, I suspect.

There's also the matter of backups. I notice the database backup file has quadrupled in size since about June (2G -> 8.4G), which probably needs investigation. I say 'backup' because we currently just have a manual backup of the database, and it's only run when someone remembers to. To protect ourselves from the various trousers-falling-down scenarios we might encounter, we need an automated backup, ideally generational, which also means another 10s - 100s of gigabytes of (off-server) disk space.
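To make "generational" concrete: the usual idea is to keep the last few daily backups, plus the newest backup from each of the last few weeks and months, and delete the rest. The Python sketch below only decides which backup dates to keep; it does not take or delete any backups. The function name and the default counts (7 daily, 4 weekly, 6 monthly) are assumptions for illustration, not anything we have agreed on.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, daily=7, weekly=4, monthly=6):
    """Return the set of backup dates a simple generational scheme retains.

    Keeps the `daily` most recent backups, the newest backup in each of the
    `weekly` most recent ISO weeks, and the newest backup in each of the
    `monthly` most recent calendar months. Everything else can be pruned.
    """
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = set(newest_first[:daily])

    def keep_newest_per_bucket(bucket_of, limit):
        seen = []
        for d in newest_first:
            bucket = bucket_of(d)
            if bucket not in seen:
                if len(seen) == limit:
                    break  # already have enough weeks/months
                seen.append(bucket)
                keep.add(d)  # first date seen in a bucket is its newest

    keep_newest_per_bucket(lambda d: tuple(d.isocalendar()[:2]), weekly)
    keep_newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return keep


if __name__ == "__main__":
    # Pretend a backup was taken every day for the 60 days up to 17 Aug 2018.
    last = date(2018, 8, 17)
    all_backups = [last - timedelta(days=n) for n in range(60)]
    kept = backups_to_keep(all_backups)
    for d in sorted(kept):
        print("keep", d.isoformat())
    print(len(all_backups) - len(kept), "backups could be pruned")
```

The point of a scheme like this is that the off-server disk space needed grows with the number of generations kept, not with the number of backups ever taken, which makes it easier to budget for.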

Does anyone here know much about analysing Mastodon instances, or know someone who does?

And this touches on the issue of spending funds, which is a different issue but I'll mention it here: perhaps we should allocate a budget to working groups, which they can spend at their discretion without the need to go back to the main / finance group?

For those with git.coop accounts, you can see the tickets I created on the recent outage and the disk space question here. I suggest we keep the technical discussion there as much as possible to spare those here who have been overwhelmed by Loomio chat. :) Anyone who wants an account can sign up following the instructions here https://git.coop/social.coop/


Victor Matekole Sat 18 Aug 2018 9:08PM

Sorry, "root servers" implies dedicated hardware/servers (not virtual); as far as I understood, Scaleway is a cloud service? You are correct regarding SATA vs SSD. However, Hetzner allows mixed setups: we can have SSD for Postgres and SATA for the less demanding parts of the stack. Either way, I am sure we'd pay less per GB than on Scaleway. But as I suggested earlier, there is always a trade-off: having a dedicated server means we look after the hardware. If a disk breaks we have to call Hetzner to replace it; from experience, they are reasonably fast in such cases.

Nonetheless, I've always felt 100GB was never going to be enough for our growth rate and long-term requirements. Hetzner was just an example, as I know them, but I have no bias. I just wanted to start a conversation in which growth rate, performance and cost are carefully considered.

E.g. piece of hardware:

Intel Core i7-2600
2x HDD SATA 1,5 TB
1x SSD 240 GB
€45.38 / mth


Gil Scott Fitzgerald Sat 18 Aug 2018 9:11PM

I wonder if we could just throw postgres in RAM?


Mayel de Borniol Sat 18 Aug 2018 9:12PM

As indicated in the docs, 'trunk' is a dedicated server, and 'toot' is a VPS:


Victor Matekole Sat 18 Aug 2018 9:15PM

I see ... Do they support upgrades of the disk and perhaps memory?


Victor Matekole Sat 18 Aug 2018 9:19PM

BTW — how do I get an account to git.coop? Just tried to register under my email address but was denied.


Fabián Heredia Montiel Sat 18 Aug 2018 9:33PM

Hi @victormatekole, check out this guide on the steps to get your git.coop account: https://git.coop/social.coop/general/wikis/getting-an-account


Nick S Sat 18 Aug 2018 11:14PM

I think one of our milestones should be the capability (duplicated amongst several people) to rebuild the server in the event it dies or gets hacked.

In order to learn how to do this, we need a server (or servers) to practice on.

I'd call this a "staging server".


Gil Scott Fitzgerald Sat 18 Aug 2018 7:27PM

IMO spend the money for a good experience and fewer headaches later


Victor Matekole Sat 18 Aug 2018 9:10PM

Disk consumption is now 80%, by the way, but there is more that can be trimmed from the media cache. I think someone restarted the Ruby app, and thus the job I started got killed.


Nick S Sat 18 Aug 2018 11:21PM

Wasn't me, honest!

In general I aim to go to the riot.im channels to check if anything's going on on our servers, or to announce it on the public channel if I'm there doing something. I suggest this'd be a good policy for everyone to follow, to help avoid tripping each other up by mistake.
