It is currently Tue May 23, 2017 5:45 am

All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 29 posts ]  Go to page 1, 2  Next
Author Message
 Post subject: We're Back! Extended Downtime Postmortem and Future Plans
Post #1 Posted: Sat Feb 18, 2017 4:38 pm 
Lives with ko

Posts: 194
Location: Toronto, Ontario (Canada)
Liked others: 60
Was liked: 103
Rank: AGA 1k
GD Posts: 1190
Universal go server handle: apetresc
Hey everyone! As you've all certainly noticed by now, Life in 19x19 was down for about 11 days. The good news is that we're now back and (hopefully) better than ever!


What Happened

Without going into too much technical detail, there was some sort of quota that was hit on the database server hosted by GoDaddy. Jordus (the admin who owned the actual hosting account) was having trouble getting timely action from GoDaddy, so not much progress was being made. After about a week, Brian Kirby and I offered to move everything to my AWS account, where we would have complete access to both the hardware and software, instead of being at the mercy of GoDaddy's tech support. At first we were scraping together old backups from Kirby's hard drive and what was left of the FTP server (which were over a year old! :( ), but eventually Jordus saved the day and gave us a raw dump of the database just prior to the outage. After some scripting, I had the full site up on an EC2 instance (for phpBB/Apache) and RDS instance (for MySQL).

The Future
Obviously the length of this downtime re-emphasized the need to reduce our bus factor. There were a couple of us with admin access to the board, but only two that could directly connect to the database, only one that controlled the domain, and 0 of us that had root on the physical hardware everything ran on (thanks GoDaddy). Going forward, we're going to be:
  • Posting the phpBB source, with all our theming and modifications, to a public GitHub organization, so anyone can clone it and help develop plugins and themes.
  • Scheduling automated daily backups of the DB. (This is actually already done as of today)
  • Making a sanitized (i.e., PMs and password hashes removed, etc.) copy of this daily backup publicly available, as Linus Torvalds himself recommends ;)
  • Getting a few more technically-savvy admins access to the AWS account this is running on. More on this soon.
  • Upgrading phpBB. We're running on a downright ancient version (3.0.8, from November 2010 !!). Once everything's calmed down, I've verified the backup procedures work, and the versioning to GitHub is complete, I'm going to run an upgrade to the latest phpBB 3.2. Security and performance are the main benefits. This should be complete by end of day Monday, February 20th.
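
For the technically curious, here is a rough sketch of what "sanitizing" a backup could look like. It is only an illustration, not the script actually used for this site: the table and column names (phpbb_users, user_password, phpbb_privmsgs and so on) are assumptions, and a real run would need a careful audit of every table before anything is published.

```python
# Hypothetical sketch: strip private data from forum rows before publishing a backup.
# All table and column names here are assumptions for illustration only.

# Columns that get blanked out, per table.
SENSITIVE_COLUMNS = {
    "phpbb_users": {"user_password", "user_email", "user_ip"},
}

# Tables that are left out of the public copy entirely (e.g. private messages).
DROPPED_TABLES = {"phpbb_privmsgs", "phpbb_sessions"}


def sanitize_table(table, rows):
    """Return a publishable copy of rows; dropped tables yield an empty list."""
    if table in DROPPED_TABLES:
        return []
    blanked = SENSITIVE_COLUMNS.get(table, set())
    # Build new dicts so the original rows are never modified in place.
    return [
        {col: ("" if col in blanked else value) for col, value in row.items()}
        for row in rows
    ]
```

Blanking values rather than deleting columns keeps the schema intact, so the public copy should still load into a stock phpBB install for development.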

Are we missing anything? Are there any other points of failure the community wants to see plugged?

Known Issues
Pretty much every feature that existed before the downtime has been restored. The below are the issues that I'm aware of, but decided weren't urgent enough to delay launch any further:
  • The database backup we were working from had some character encoding issues; as a result, you may notice that some UTF-8 characters (e.g., accented names like Törmänen, or CJK ones like 古力) in usernames and topics are malformed. If that is the case, please contact an admin/mod and we will take care of it. I've fixed a couple of the glaring ones already, but I'm sure some have escaped me. NOTE: This should not affect actual post text, since that was binary-encoded.
  • The search index is being rebuilt overnight. Until then, search terms won't work for any topics older than today's.
  • For the next ~48 hours, your DNS resolution may flip back and forth, leading you back to the old site (i.e., the error page). This will just solve itself within a day or two, as the new address reaches all the corners of the world.
If you notice any other problems not on this list, please reply to this thread with it!
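
To illustrate the encoding issue from the list above: this kind of damage typically appears when UTF-8 bytes get read back as a single-byte encoding. The sketch below assumes Latin-1 was the wrong encoding involved, which is a guess on my part (the actual import may have differed), and the function names are made up for this example.

```python
def simulate_mojibake(text):
    """Show what UTF-8 text looks like when its bytes are misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(text):
    """Undo the misreading; return the text unchanged if it was not damaged."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (already proper Unicode)
        # or its bytes are not valid UTF-8, so it is left as it is.
        return text


print(simulate_mojibake("Grünauer"))   # GrÃ¼nauer
print(repair_mojibake("GrÃ¼nauer"))    # Grünauer
```

Names that were never damaged pass through untouched, so a repair like this can be applied to a whole column without first sorting the good rows from the bad ones.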


Good luck and have fun, everyone :D
-Adrian

_________________
The road to wisdom? Well, it's plain, and simple to express: Err, and err, and err again; but less, and less, and less!


This post by apetresc was liked by 24 people: Bill Spight, Bonobo, Calvin Clark, daal, Darrell, dfan, ez4u, fireproof, gamesorry, GrB, gustav, HermanHiddema, hyperpape, jeromie, joellercoaster, Kirby, Koosh, Nyanjilla, Rémi, schultz, Solomon, thombreSoft, Waylon, yoyoma
 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #2 Posted: Sat Feb 18, 2017 8:34 pm 
Lives in gote

Posts: 608
Location: Littleton, CO
Liked others: 203
Was liked: 178
Rank: KGS 4k
Universal go server handle: jeromie
Glad the site is back. There are both content and members here that would be sorely missed if the go community were to lose them.

I'm moderately tech-savvy; let me know if there's anything I can do to help.


This post by jeromie was liked by: goTony
 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #3 Posted: Sat Feb 18, 2017 9:15 pm 
Judan

Posts: 7327
Liked others: 1325
Was liked: 1113
KGS: Kirby
Tygem: 커비라고해
I'd like to extend thanks to Adrian, once again, who offered to host us on his AWS instance, and who also set things up with a pretty quick turnaround.

He received the database backup yesterday, and we are up and running today!

With multiple people having access to the AWS account, along with a daily backup of the database publicly available, we should be able to avoid the problem we had this time around: we won't be bottlenecked in fixing the site if something unexpected fails.

_________________
Discipline is remembering what you want. -David Campbell


This post by Kirby was liked by 2 people: Bonobo, schultz
 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #4 Posted: Sun Feb 19, 2017 2:05 am 
Lives in gote

Posts: 417
Location: Vienna, Austria
Liked others: 194
Was liked: 191
Thank you for resolving this issue!

There seems to be a locale problem - hopefully this didn't corrupt the DB import:

When logged in, my surname "Grünauer", shows up as "Grünauer" in the upper left and also in the username for this post.

_________________
http://badukspace.com - An explorer space for data related to this game

 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #5 Posted: Sun Feb 19, 2017 2:43 am 
Dies with sente

Posts: 120
Liked others: 10
Was liked: 27
Thank you all for resolving this!

I've got one point to add to the list:
Hopefully there won't be a next time, but in case of a long downtime I'd appreciate it if there was, after some delay of course, a more informative error message saying "give us a week or two" or pointing to senseis or reddit where there was some status update.

_________________
If something sank it might be a treasure. And 2kyu advice is not necessarily Dan repertoire..

 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #6 Posted: Sun Feb 19, 2017 6:02 am 
Gosei

Posts: 1938
Location: Tokyo, Japan
Liked others: 1645
Was liked: 1062
Rank: Jp 6 dan
KGS: ez4u
Many, many, many thanks to Adrian and Brian (Apetresc and Kirby)!!!
:clap: :clap: :clap: :clap: :clap: :clap:

_________________
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21

 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #7 Posted: Sun Feb 19, 2017 6:37 am 
Lives with ko

Posts: 194
Location: Toronto, Ontario (Canada)
Liked others: 60
Was liked: 103
Rank: AGA 1k
GD Posts: 1190
Universal go server handle: apetresc
Marcel Grünauer wrote:
When logged in, my surname "Grünauer", shows up as "Grünauer" in the upper left and also in the username for this post.

Yup, that's the sort of encoding problem I was referring to in the "Known Issues" part. Thanks for pointing it out, I've fixed it now :)

bayu wrote:
Hopefully there won't be a next time, but in case of a long downtime I'd appreciate it if there was, after some delay of course, a more informative error message saying "give us a week or two" or pointing to senseis or reddit where there was some status update.

Yeah, for sure. Sometimes it's not possible to put an error message on the lifein19x19.com URL itself, depending on the nature of the problem, but at the very least we can post status updates on Reddit/SL.

_________________
The road to wisdom? Well, it's plain, and simple to express: Err, and err, and err again; but less, and less, and less!


This post by apetresc was liked by 2 people: Bonobo, Marcel Grünauer
 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #8 Posted: Sun Feb 19, 2017 8:28 am 
Dies with sente

Posts: 76
Liked others: 13
Was liked: 16
Thanks a lot, keep up your outstanding work!

 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #9 Posted: Sun Feb 19, 2017 9:01 am 
Lives with ko

Posts: 228
Location: London
Liked others: 253
Was liked: 63
Rank: OGS 4k
OGS: Joellercoaster
*dances*

_________________
Confucius in the Analects says "even playing go is better than eating chips in front of tv all day." -- kivi

 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #10 Posted: Sun Feb 19, 2017 4:53 pm 
Lives with ko

Posts: 194
Location: Toronto, Ontario (Canada)
Liked others: 60
Was liked: 103
Rank: AGA 1k
GD Posts: 1190
Universal go server handle: apetresc
Progress Report

  • Fixed a serious bug with the [sgf] tag when referencing an attachment rather than an inline SGF. The links had been generated, at that time, with the /forum path, but we no longer use that. So I added a rewrite rule, and they all work now.
  • We have a GitHub organization, and the forum code is up there.
  • The search index has been rebuilt, so your searches should now work!
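
As a rough illustration of the [sgf] link fix in the list above: the real fix was a rewrite rule on the web server, and its exact text isn't shown here. The Python below only sketches the mapping such a rule performs, under the assumption that old links simply carry a leading /forum that needs to be dropped; the function name and sample paths are hypothetical.

```python
OLD_PREFIX = "/forum"  # path prefix the old attachment links were generated with


def rewrite_path(path):
    """Map an old /forum/... path onto the current layout; leave other paths alone."""
    if path == OLD_PREFIX:
        return "/"
    # Require the trailing slash so unrelated paths such as /forums are not touched.
    if path.startswith(OLD_PREFIX + "/"):
        return path[len(OLD_PREFIX):]
    return path


print(rewrite_path("/forum/download/file.php?id=1"))  # /download/file.php?id=1
```

Matching on the prefix plus a slash, rather than the bare prefix, is the detail that is easy to get wrong in this kind of rule.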

_________________
The road to wisdom? Well, it's plain, and simple to express: Err, and err, and err again; but less, and less, and less!


This post by apetresc was liked by 4 people: Bonobo, jeromie, Kirby, schultz
 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #11 Posted: Sun Feb 19, 2017 5:53 pm 
Gosei

Posts: 1979
Location: Germany
Liked others: 7117
Was liked: 805
Rank: OGS + EGF DDK
OGS: trohde
Wow, SO happy to be here again.

HUGE THANKS to everybody involved

I had reloaded my “view unread posts” tab almost every hour, only to get “The host does not exist.”, but just a minute ago Schachus wrote on the German DGoB forum that the URL without the WWW, i.e. http://lifein19x19.com/, DOES work, while the URL with WWW doesn't.

So:
http://www.lifein19x19.com/ BAD
http://lifein19x19.com/ GOOD
Would be nice to have this resolved, too.

• Also, could we get https?

• And about the “Donate” button … does it point to the correct PayPal acct already?


Thanks folks, you’re cool!

_________________
“Whenever you find yourself on the side of the majority, it is time to pause and reflect.” — Mark Twain ★ Come and play on OGS


This post by Bonobo was liked by 2 people: apetresc, Kirby
 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #12 Posted: Sun Feb 19, 2017 6:10 pm 
Lives with ko

Posts: 194
Location: Toronto, Ontario (Canada)
Liked others: 60
Was liked: 103
Rank: AGA 1k
GD Posts: 1190
Universal go server handle: apetresc
Bonobo wrote:
I had reloaded my “view unread posts” tab almost every hour, only to get “The host does not exist.”, but just a minute ago Schachus wrote on the German DGoB forum that the URL without the WWW, i.e. http://lifein19x19.com/, DOES work, while the URL with WWW doesn't.

So:
http://www.lifein19x19.com/ BAD
http://lifein19x19.com/ GOOD
Would be nice to have this resolved, too.

Good catch. I've just added the DNS entry and VirtualHost for www.lifein19x19.com too, it should start working in a few hours, again as DNS propagates. Thanks! :)

Bonobo wrote:
• Also, could we get https?

Yes! Now that Let's Encrypt is giving out free certificates, that's a possibility. I'll add that to the roadmap.

Bonobo wrote:
• And about the “Donate” button … does it point to the correct PayPal acct already?

Nope, haven't sorted that part out at all yet. I guess I could remove the button for now.

_________________
The road to wisdom? Well, it's plain, and simple to express: Err, and err, and err again; but less, and less, and less!


This post by apetresc was liked by: Bonobo
 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #13 Posted: Sun Feb 19, 2017 6:27 pm 
Gosei

Posts: 1979
Location: Germany
Liked others: 7117
Was liked: 805
Rank: OGS + EGF DDK
OGS: trohde
apetresc wrote:
[..]

I've just added the DNS entry and VirtualHost for http://www. [..]
Awesome :)

Quote:
Quote:
• [..] https?
[..] I'll add that to the roadmap.
:-)

Quote:
Quote:
• [..] “Donate” button [..]?
[..] I guess I could remove the button for now.
NOOOOOO, totally unacceptable, a new and valid button, please :twisted:

_________________
“Whenever you find yourself on the side of the majority, it is time to pause and reflect.” — Mark Twain ★ Come and play on OGS

 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #14 Posted: Sun Feb 19, 2017 6:45 pm 
Lives in gote

Posts: 608
Location: Littleton, CO
Liked others: 203
Was liked: 178
Rank: KGS 4k
Universal go server handle: jeromie
Thanks for getting everything running again!

 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #15 Posted: Sun Feb 19, 2017 10:45 pm 
Lives with ko

Posts: 131
Liked others: 8
Was liked: 29
Rank: KGS 2d
Thanks so much for all the efforts to get the site back online!

 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #16 Posted: Mon Feb 20, 2017 3:17 am 
Oza

Posts: 2341
Liked others: 1147
Was liked: 1041
I just want to add my appreciation that Jordus, Brian and Adrian had the will and technical wherewithal to turn a database dump back into L19 gold. Thanks for your time and work!

_________________
These moves are not part of a regular dan repertoire... - Knotwilg

 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #17 Posted: Mon Feb 20, 2017 6:04 am 
Lives with ko

Posts: 202
Location: Santiago, Chile
Liked others: 39
Was liked: 43
Rank: EGF 1d
Universal go server handle: Jhyn
Thank you all for your hard work and long life to the nineteen.
I blame the outage for my bad tournament results this weekend.

_________________
La victoire est un hasard, la défaite une nécessité.

 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #18 Posted: Mon Feb 20, 2017 8:04 am 
Dies in gote

Posts: 25
Location: India
Liked others: 6
Was liked: 2
Rank: 7k KGS
KGS: herpderp
Universal go server handle: nukeu666
Online playing schedule: All day
GoDaddy is probably the worst well-known host today, so congrats on finally moving off them.
An easier way to get HTTPS is to go through Cloudflare; only a few DNS changes are needed, though I'm unsure about cost vs. traffic.

 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #19 Posted: Mon Feb 20, 2017 8:14 am 
Gosei

Posts: 1979
Location: Germany
Liked others: 7117
Was liked: 805
Rank: OGS + EGF DDK
OGS: trohde
One thing I notice is that pages seem to load a lot faster.
(Or is it the “Safari is snappier now after the OS update” illusion? :D )

I’m even tempted to raise the number of displayed comments per page … you’ll all remember the many problems we had with loading pages … would you recommend trying it, or had I better not?

<edit>
Oh, and could we have two shrines, one for the folks who managed the site before the resurrection, another for the new admins? I’d want to burn some incense there :-)
</edit>

_________________
“Whenever you find yourself on the side of the majority, it is time to pause and reflect.” — Mark Twain ★ Come and play on OGS

 Post subject: Re: We're Back! Extended Downtime Postmortem and Future Plan
Post #20 Posted: Mon Feb 20, 2017 7:57 pm 
Site Admin

Posts: 1117
Location: Allegan, MI, USA
Liked others: 18
Was liked: 119
Rank: KGS 9k
Universal go server handle: Jordus
My apologies for the long downtime. :oops: Apetresc gave a good summary of the issue. The root problem was that our original hosting service changed its terms and put a 1 GB size limit on the database, and we outgrew that limit. I did not receive any warnings about the database and was not monitoring for this, as I was unaware of the service change. The hosting service locked the database down, which caused the database error everyone was seeing at the beginning of our downtime. I made decisions throughout our downtime that I thought would help us get back up faster. Unfortunately it all backfired. I won't go into more details here, but the short of it is that after $500 and a lot of wasted time, things went nowhere. I also was not able to be as attentive to the issue as I would have liked due to the demands of the job (Network Engineer w/ on-call schedule falling in the middle of the issue.... :blackeye: ) and family (wife and 3 young children.... help me? ;-) ).

Again, I humbly apologize for the downtime. As a member of the tech community I take it as a personal failing on my part. Despite the issues with our hosting, I should have had contingencies in place. I've learned a lot since Lifein19x19 first came into existence. There are some things I think I would have done differently and some things I know could have been done better, as I looked back at my old work during this migration process and thought "What was I doing when I did that?", "Oh wow I was definitely an amateur then.", "OMG what was I thinking?! :shock: :-? :oops: ". As Apetresc pointed out in his posting, we now know where we fell short and will work to make sure this does not happen again.

Special thanks to Kirby for facilitating the communications with Apetresc to get us back up and running. Without Kirby we still may not have been back up at this time. :salute:

Also special thanks to Apetresc for using his servers and technical expertise to get us back up and running. :batman:

I'm glad that, while the site was unavailable for some time, we managed to keep the data intact and avoid losing the treasure of data this forum holds, the kind of loss we felt when we lost our predecessor site.

My Humblest Apologies,
Jordus

Long live Lifein19x19!!!!! :white: :bow: :black:

_________________
I'm thinking...


This post by Jordus was liked by 17 people: apetresc, Bonobo, daal, DrStraw, ez4u, GrB, gustav, jeromie, joellercoaster, jptavan, Kirby, Koosh, Monadology, schultz, Tanana, thombreSoft, wolfking