The KGS server keeps crashing

Posted: Sat May 22, 2010 9:20 pm
by Suji
What's wrong with the server tonight? It's apparently crashed twice.

Hopefully, it's nothing serious.

Re: The KGS server keeps crashing

Posted: Sun May 23, 2010 12:09 am
by wms
Not sure what it is. It is a crash that has been there since version 3.0.0; it's a bug that I haven't been able to track down. Usually it hits about once every 60 days, but in the past 24 hours it has hit 4 times instead.

It is possible that whatever causes the bug, somebody has decided to start doing that *A LOT*. But that would be strange, because I didn't think that the bug was caused by anything a user does. It is caused by memory corruption in my low level networking code. This code is extremely tricky, it is heavily multithreaded, written in C (the only part of the server that is), and uses the epoll Linux interface, which explains why there has been a bug that I've known about for 3+ years but haven't been able to fix.

If it keeps hitting...well, that will be useful information, but I'd rather get it another way of course.

Re: The KGS server keeps crashing

Posted: Sun May 23, 2010 2:06 am
by Ember
Well, I can't access KGS at all now (from Germany), the homepage seems down, too.
At first, when I tried to log in, I immediately got the message that the server might be down. After I waited a bit, that message took some time to appear, and now there is no message at all, but I still can't get onto the server; the client seems to just keep on trying and trying and... :cry:

But I guess you're already working on it, wms, so I hope that you'll fix that bug soon and everything will be all right. :)

EDIT: Please ignore the message below.. :D

EDIT 2: Well, I guess it was a bit too early to triumph.. ^^; It crashed again.

Re: The KGS server keeps crashing

Posted: Sun May 23, 2010 2:16 am
by wms
Yeah, tried a reboot to see if that helped. It looks like "no."

The server hasn't had an upgrade for months. So something new changed outside the server that made this bug show up a lot more often. No idea what.

Re: The KGS server keeps crashing

Posted: Sun May 23, 2010 4:03 am
by Phelan
There were some authentication problems with the desktop version a while back. Are you using that? Either way, a screenshot or the details of the popup might help.

Re: The KGS server keeps crashing

Posted: Sun May 23, 2010 5:17 am
by CarlJung
Helel wrote:...swedish text in picture...


Ha ha, I didn't know you were Swedish.

Re: The KGS server keeps crashing

Posted: Sun May 23, 2010 5:43 am
by tj86430
Since my line is quite slow, I'd really appreciate it if the pictures weren't several megabytes...

Re: The KGS server keeps crashing

Posted: Sun May 23, 2010 6:46 am
by EdLee
Helel wrote:I only did it to annoy you. :twisted: It will not become a habit. ;-)
Helel, you can also try:
(1) Alt+"Prnt Scrn" to grab only the error window.
(2) Paste it in an editor like Irfanview -- http://www.irfanview.com/
(3) Reduce the screenshot size.
(4) Save as jpeg at 80% quality (see attached).
This usually reduces the file size by over 90%.
Or, you can TYPE all the error messages by hand. ;-)

Re: The KGS server keeps crashing

Posted: Sun May 23, 2010 6:57 am
by tj86430
Helel wrote:Swedish is even spoken by Scanians and other riffraff, after all. :twisted:

Yes, even almost ten percent of Finns speak Swedish as their mother tongue (I'm not one of them, though).

Re: The KGS server keeps crashing

Posted: Mon May 24, 2010 12:19 pm
by Suji
wms wrote:Not sure what it is. It is a crash that has been there since version 3.0.0; it's a bug that I haven't been able to track down. Usually it hits about once every 60 days, but in the past 24 hours it has hit 4 times instead.

It is possible that whatever causes the bug, somebody has decided to start doing that *A LOT*. But that would be strange, because I didn't think that the bug was caused by anything a user does. It is caused by memory corruption in my low level networking code. This code is extremely tricky, it is heavily multithreaded, written in C (the only part of the server that is), and uses the epoll Linux interface, which explains why there has been a bug that I've known about for 3+ years but haven't been able to fix.

If it keeps hitting...well, that will be useful information, but I'd rather get it another way of course.


Hmmm...Interesting. Hopefully, you can find it and fix it.

On a lighter note, there are two quotes that I thought of.

1. "If debugging is the process of removing bugs from a program, then programming is the process in which bugs are introduced to the program."

2. "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

@WMS: In what way would you like to receive the information?

Re: The KGS server keeps crashing

Posted: Mon May 24, 2010 12:38 pm
by wms
When there's a bug, it's best if the bug shows up when I test so that I can fix it there.

But this bug happens so rarely, and I don't know how to make it happen, so I only get info about it from crashes...and very, very little info even then. All I've been able to pin down is that variables are getting utter nonsense in them. A boolean, for example, will have a number in it instead of a 0 or 1. I suspect that I'm walking past the end of an array in the network code, or something like that, but I can't figure out where that could be happening.
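
A minimal sketch of one common way to catch the kind of overrun suspected above: put known guard values on either side of a buffer, funnel writes through a bounds check, and verify the guards and flag bytes periodically. Everything here (`struct conn`, `GUARD_VALUE`, `BUF_SIZE`, the function names) is hypothetical and is not taken from the KGS server; it only illustrates the technique.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_VALUE 0xDEADBEEFu  /* arbitrary sentinel, chosen for illustration */
#define BUF_SIZE 16              /* hypothetical buffer size */

/* Hypothetical per-connection record; NOT the real KGS layout. */
struct conn {
    uint32_t guard_front;         /* should always equal GUARD_VALUE */
    unsigned char buf[BUF_SIZE];  /* data that network code writes into */
    uint32_t guard_back;          /* should always equal GUARD_VALUE */
    uint8_t closed;               /* boolean flag stored as a byte: 0 or 1 */
};

/* Zero the record and arm both guards. */
void conn_init(struct conn *c) {
    assert(c != NULL);
    memset(c, 0, sizeof *c);
    c->guard_front = GUARD_VALUE;
    c->guard_back = GUARD_VALUE;
}

/* Bounds-checked write: rejects anything that would run past the buffer. */
bool conn_write(struct conn *c, size_t offset, const void *src, size_t len) {
    if (offset > BUF_SIZE || len > BUF_SIZE - offset) {
        return false;
    }
    memcpy(c->buf + offset, src, len);
    return true;
}

/* True if neither guard was overwritten and the flag holds only 0 or 1. */
bool conn_is_sane(const struct conn *c) {
    return c->guard_front == GUARD_VALUE
        && c->guard_back == GUARD_VALUE
        && c->closed <= 1;
}
```

Calling something like `conn_is_sane` at the start and end of each handler would not prevent corruption, but it could narrow down when the damage happens, which is more than a crash long after the fact reveals.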

Re: The KGS server keeps crashing

Posted: Tue May 25, 2010 1:44 am
by CarlJung
Helel wrote:
wms wrote:When there's a bug, it's best if the bug shows up when I test so that I can fix it there.

But this bug happens so rarely, and I don't know how to make it happen, so I only get info about it from crashes...and very, very little info even then. All I've been able to pin down is that variables are getting utter nonsense in them. A boolean, for example, will have a number in it instead of a 0 or 1. I suspect that I'm walking past the end of an array in the network code, or something like that, but I can't figure out where that could be happening.


:twisted: :evil: :twisted: :evil: :twisted: :evil: :twisted: :evil: :twisted: :evil: :twisted:
Ever heard of open source code...
:twisted: :evil: :twisted: :evil: :twisted: :evil: :twisted: :evil: :twisted: :evil: :twisted:

Have fun debugging! :D


That wouldn't really change the nature of the bug, and wms would still be the only one with access to the server where the problem occurs. More eyes on the problem, yes, but that's it.

Re: The KGS server keeps crashing

Posted: Tue May 25, 2010 3:26 am
by CarlJung
Helel wrote:Ahh, so the bug is in no way related to anything wms has coded. My bad. :oops:


It's quite possible that it is; that remains to be seen. But even so, open-sourcing the code wouldn't make it any easier to debug. The error occurs on the server, and you can't give everyone access to it to tinker away to their heart's content.

Re: The KGS server keeps crashing

Posted: Tue May 25, 2010 3:30 am
by CarlJung
wms,

I'm sure you have a test server. Has the error ever occurred on that one? Can't we fill it with a few thousand weakbots/randombots that all play blitz in order to simulate some load? I have 10 Mbit/s upload that mostly sits idle and a quite powerful computer. I'm sure others have similar setups. That could potentially be a way forward.

Re: The KGS server keeps crashing

Posted: Tue May 25, 2010 3:36 am
by tj86430
CarlJung wrote:
Helel wrote:Ahh, so the bug is in no way related to anything wms has coded. My bad. :oops:


It's quite possible that it is; that remains to be seen. But even so, open-sourcing the code wouldn't make it any easier to debug. The error occurs on the server, and you can't give everyone access to it to tinker away to their heart's content.

From what I remember of my merry days of coding and debugging C/C++ (which is what I suspect the code in question is), this kind of bug often isn't caught by debugging at the point where the actual error occurs. The problematic code may have been executed well in advance. If the culprit is something wms wrote, then it might help to have several people look at it. Of course, if the bug could be virtually anywhere, it probably won't help unless it can be narrowed down. One theoretical possibility is to run everything in a debugger with watches guarding the memory that will eventually be overwritten, but that is (or at least used to be) much too slow. Perhaps debugging tools have improved since I coded for a living.
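
As a cheaper relative of the debugger watches mentioned above, a guard page can make an overrun fault at the moment it happens rather than long afterwards. The following is a minimal sketch assuming Linux and the standard `mmap`/`mprotect` calls; `guarded_alloc` and `guarded_free` are hypothetical helper names, not part of any library or of the KGS code.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Return `size` writable bytes placed so that the first byte past the end
 * lies on a page with no access rights. Writing one byte too far then
 * raises SIGSEGV immediately. Sketch only: handles at most one page of
 * data, and the returned pointer is only guaranteed byte-aligned.
 */
void *guarded_alloc(size_t size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (size == 0 || size > page) {
        return NULL;
    }
    /* One data page followed by one guard page. */
    unsigned char *base = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return NULL;
    }
    /* Revoke all access to the second page. */
    if (mprotect(base + page, page, PROT_NONE) != 0) {
        munmap(base, 2 * page);
        return NULL;
    }
    /* Push the buffer up against the guard page. */
    return base + page - size;
}

/* Release a buffer from guarded_alloc; `size` must match the original call. */
void guarded_free(void *p, size_t size) {
    assert(p != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *base = (unsigned char *)p + size - page;
    munmap(base, 2 * page);
}
```

This costs two pages per buffer, so it suits a test build or a handful of suspect buffers rather than every allocation. It also only catches writes past the end; catching underruns would need a guard page in front instead.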