Loomio
Wed 6 Jun 2018

social.coop down (June 6, 2018)

C(
Clayton (clayton@social.coop) Public Seen by 426

This is probably apparent to others, but wanted to start a thread about social.coop being down. This can be a space for admins to post updates, if that's helpful.

[Edit]
First point of rendezvous for those on the tech team investigating these issues is our matrix chat room. You may want to visit there first, as this thread is less likely to be updated.

S

spudboy Wed 6 Jun 2018

What's up with the Gitlab? We should really use its issue tracker

NS

Nick S Wed 6 Jun 2018

Head on over to vote to get one in the Tech WG proposal here!

KS

Karl Schultheisz Wed 6 Jun 2018

For what it's worth, we're not the only instance that's down. Even https://mastodon.social/ appears down to me.

NS

Nick S Wed 6 Jun 2018

Seems up for me?

MDB

Mayel de Borniol Thu 7 Jun 2018

What admins?...

For what it's worth @victormatekole and I intervened on the server late last night and the instance seemed to work fine afterwards. Have people had issues in the last hours?

S

spudboy Fri 8 Jun 2018

anyone still having a problem, you need to shift-reload to clear the cache, i just figured it out today lol

ES

Ed Summers Fri 29 Jun 2018

I'm guessing everyone can see that social.coop is completely down?

VM

Victor Matekole Fri 29 Jun 2018

@edsummers looking into it now ... Appears a Scaleway issue, for now, as I cannot terminal into the servers.

VM

Victor Matekole Fri 29 Jun 2018

Status from Scaleway: they have routing problems. All we can do is sit tight until they resolve it: https://status.online.net/index.php?do=details&task_id=1300

ES

Ed Summers Fri 29 Jun 2018

It looks like we are back online, but for some reason image uploads seem to be failing for me.

GSF

Gil Scott Fitzgerald Sat 11 Aug 2018

Seeing a 502 Bad Gateway error at 21:03 Central Time.

MN

Matt Noyes Sat 11 Aug 2018

Still down.

C(

Clayton (clayton@social.coop) Sat 11 Aug 2018

Any update on this most recent outage?

NS

Nick S Sat 11 Aug 2018

Looks like it's back up? Did anyone do something to make that happen?

As a general policy, I suggest reporters and fixers rendezvous here to coordinate:

https://riot.im/app/#/room/#SocialCoop:matrix.org

I've created an issue ticket on our GitLab instance to remind me to document this, and I'll add something at the top of this and the issues thread to point people at it.

https://git.coop/social.coop/tech/operations/issues/12

LS

Leo Sammallahti Fri 31 Aug 2018

Is social.coop down for others?

NZ

Ned Zimmerman Fri 31 Aug 2018

Yep.

MN

Matt Noyes Fri 31 Aug 2018

Yep. Any ideas?

NS

Nick S Fri 31 Aug 2018

Database/mastodon had twisted knickers, I supplied a fresh pair. Looks like it's back to work.

JD

Josef Davies-Coates Thu 6 Sep 2018

social.coop is down for me right now... is that intentional? if so, why? thanks.

NS

Nick S Thu 6 Sep 2018

It's down. Post-upgrade teething troubles; Victor was looking at it, but the clock ran out and he will have to revisit it later.

I've updated the thread title to point everyone at our channel on riot.im, where you're likely to get more up-to-date information. At some point in the future we should have a status page.

N

Noah Thu 6 Sep 2018

we have one but it's not regularly updated, and when i spent a few minutes looking into it, it wasn't immediately clear how one might go about updating it. anyway it's at https://status.social.coop - slightly more info in the git.coop infrastructure doc in Section B, part 3.

NS

Nick S Thu 6 Sep 2018

Yes... I didn't count that as it has to be updated manually (how I don't know), and I don't think it runs on our servers, so it is subject to termination.

JD

Josef Davies-Coates Thu 6 Sep 2018

OK thanks @wulee

N

Noah Thu 6 Sep 2018

Seems like lately (maybe post-upgrade?) there's pretty regularly downtime in the morning here (EDT).

NS

Nick S Mon 8 Oct 2018

Hi all. We (@nicksellen and I) are now about to switch social.coop's media object storage provider over. (Ticket on git.coop, for those with an account there, is https://git.coop/social.coop/tech/operations/issues/21)

So the site will be offline for a very short time. We hope you won't notice any missing images when we come back, but if you do, this is why. Note: we've made sure all the social.coop images are safe, it should only be cached remote media which may require restoring.

If you have problems please contact us on our chat channel

https://riot.im/app/#/room/#SocialCoop:matrix.org.

Thanks!

NS

Nick S Mon 8 Oct 2018

Ok, this is essentially done. Some media files from remote sites will appear to be missing because Mastodon still thinks they're cached in our content storage, but they're not. However, we think we can clear the cache with some magic Masto incantations or, failing that, database hacking.

Meanwhile, we're also thinking about the next step, which is to migrate our instance to our new server. If we get the chops to do that today, we may defer monkeying with the cache until after that, because clearing the cache could take a while to run based on previous experiments.

NS

Nick Sellen Mon 8 Oct 2018

Yup, we will attempt the server migration later tonight too, in about an hour. It's good to do it whilst we have our newly found mastodon-fu loaded into our brains.

After we're happy with the stability of the new deployment we can spend a bit more time on documentation and communication of what has been happening on the technical infrastructure front.

NS

Nick Sellen Tue 9 Oct 2018

The migration is complete! Let us know if you see anything a bit wonky still.

There's a bunch more work to do: tidying up, sorting out proper backups, etc. Tasks are listed at https://git.coop/social.coop/tech/operations/issues in some kind of order. We'll probably head to sleep first before putting all that in order.

BH

Bob Haugen Thu 11 Oct 2018

Had to clear cache to get it to work again. But now it seems back up and running. Thanks again for all your hard work!

MN

Matt Noyes Mon 8 Oct 2018

Thanks for your work!

M

mike_hales Tue 9 Oct 2018

Kudos :astonished: Please . . identify all the ops volunteer workers here, so we non-gitters can see some of the hidden reality that makes the lights come on when we flick the switch? @wulee @nicksellen AN Others? Thank u :clap:

Just out of anthropologist-interest . . where literally is the new server (s? backup?). And the object storage? Ultimately, I believe it's important to understand the materiality and geography and indebtedness of this magic infrastructure stuff. Like massive Amazon S3 server farms, transatlantic cable, etc.

NS

Nick Sellen Tue 9 Oct 2018

The server is in Helsinki. It's a Hetzner server.

You can also find this information for any website:
- from the social.coop domain --> run the command: ping social.coop --> the IP address is 95.216.13.24
- geoip lookup (e.g. with https://www.maxmind.com/en/geoip-demo) --> says Finland, plus some approximate co-ordinates --> Google Maps search: 60.1708, 24.9375 --> Helsinki!
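
The lookup steps above can be sketched in Python. This is a minimal sketch under assumptions, not what anyone here actually ran: `resolve_ipv4` is a hypothetical helper name, it uses the standard library's `socket.gethostbyname` in place of `ping`, and the address returned today may differ from the 95.216.13.24 quoted above. The geoip step is left to an external service such as the MaxMind demo page.

```python
import ipaddress
import socket
from typing import Callable


def resolve_ipv4(hostname: str,
                 resolver: Callable[[str], str] = socket.gethostbyname) -> str:
    """Return the IPv4 address for hostname (the address ping would show).

    resolver is injectable (an assumption made for this sketch) so the
    function can be exercised without network access; by default it
    performs a real DNS lookup.
    """
    address = resolver(hostname)
    # Raises ValueError if the resolver did not return a valid IPv4 address.
    return str(ipaddress.IPv4Address(address))
```

Calling `resolve_ipv4("social.coop")` performs a live DNS lookup; the result can then be pasted into a geoip service by hand.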

The object storage is where all the user uploaded files get saved:
- user avatars and header images
- media attachments (mostly uploaded images)
- link preview images
- temporary storage of imports (if you do a bulk import)

By default mastodon would put them on the local filesystem on the server, but it's handy to put them on a remote storage service, so you can move the server without moving the files. We use DigitalOcean Spaces now (was previously Dreamhost DreamObjects). We chose the Amsterdam location.
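
To illustrate why remote storage makes server moves easier, here is a minimal Python sketch of how a media URL can be derived from the bucket rather than from the server. Everything in it is an assumption for illustration: the `RemoteStore` class, the bucket name, the object key, and the `ams3` region slug for Amsterdam are hypothetical, and the URL pattern is the bucket-as-subdomain style used by DigitalOcean Spaces, not something taken from social.coop's actual configuration.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class RemoteStore:
    """Hypothetical description of an S3-compatible bucket."""
    bucket: str
    region: str
    provider_domain: str = "digitaloceanspaces.com"  # assumed provider domain

    def url_for(self, key: str) -> str:
        # The URL depends only on the bucket, never on the web server's host,
        # so moving the server leaves every media URL unchanged.
        return (f"https://{self.bucket}.{self.region}."
                f"{self.provider_domain}/{quote(key)}")
```

For example, `RemoteStore("example-bucket", "ams3").url_for("avatars/a.png")` yields a URL on the storage provider's domain, whichever machine the Mastodon instance itself runs on.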

It was just coincidence that we had to move the object storage at the same time as the server. Dreamhost were shutting down their east coast USA service and required all its users either to move their data themselves to the west coast USA service or to migrate entirely to another platform (which we opted for, as the files were stored under Victor's company's account).

People involved were: mostly @wulee, who really took responsibility for the task. I have been taking a supporting role: boosting morale, being someone to bounce ideas off, helping to investigate issues that come up, and sharing some of the tasks. @victormatekole helped with the object storage migration, and @mayel supported us by extending the migration deadline and sharing access to the things we needed.

BH

Bob Haugen Tue 9 Oct 2018

My quote of the day:

it's important to understand the materiality and geography and indebtedness of this magic infrastructure stuff!

DVN

Dave V. ND9JR Mon 15 Oct 2018

I'm still getting a "502 - bad gateway" message when I try to use social.coop via the web and have been for over a week now. Using it via Mastalab on my Android tablet seems to work.

BH

Bob Haugen Mon 15 Oct 2018

Working fine for me in Firefox but I had to clear my cache

DVN

Dave V. ND9JR Mon 15 Oct 2018

It works fine in Firefox Mobile on my tablet. However, in SeaMonkey on my desktop I still get the 502 error even after clearing my cache (and turning off proxies). I'm using SeaMonkey version 2.49.3.