Re: Computer are strong?!

topazg · Post by **topazg** » Tue Dec 21, 2010 7:50 am

Mike Novack wrote:I think you are considering too small a portion of the curve.

Consider the shape of the curve performance vs time (number of algorithm steps) over a large range. What I am saying is that below some number of steps (too little time) the result won't be other than random moves. In this region the curve is very steep, with great improvement when more time is allowed. And at the other end it is going to take a lot more than doubling to increase one level. So yes, somewhere in between you would observe what you say you do (doubling time per level improvement). But I think:

a) That's over a relatively small number of playing levels. Keep in mind that even an exponent of 2 grows quickly.
b) The strongest programs are currently above this point on the curve. In other words, the implementations are fast enough that they are able to play at acceptable speed (from the human point of view) at a level where for them to go up another level would take much more than doubling the time.

I agree with this, although I think both sides have an important point. It is, I suspect, hard to know what the ratio is for different strengths, even when picking an individual bot. Also, I suspect now that the "nearly random moves" area of thinking time allowance is in small fractions of a single second. I suspect a bot taking 3 seconds per move will play fairly strongly compared to, say, 30 seconds per move.

Mike Novack wrote:You want a practical example? How about MFOG 12.21? It is supposed to be at 1 dan on a "standard" 2 core machine that a program buyer might be expected to have but the bot on KGS is playing at 2 dan on a machine about 6 times more powerful than "standard" (equivalent to six times the time).

Supposed to be what sort of 1 dan? A "standard 2 core machine" = 1 dan on KGS or as defined somewhere else?

Also, doubling processing power is a better measure than doubling time if being compared to humans (which presumably the 2 dan has been earned against)

liquido · Post by **liquido** » Tue Dec 21, 2010 8:22 am

Mike Novack wrote:I think you are considering too small a portion of the curve.

Consider the shape of the curve performance vs time (number of algorithm steps) over a large range. What I am saying is that below some number of steps (too little time) the result won't be other than random moves. In this region the curve is very steep, with great improvement when more time is allowed. And at the other end it is going to take a lot more than doubling to increase one level. So yes, somewhere in between you would observe what you say you do (doubling time per level improvement).

You are correct about the lower extreme of time, but anything longer than 10s is well into this curve, and less than 10s is not really reasonable thinking time for a human (topazg has basically already pointed this out). Bear in mind that most MCTS programs can do on the order of 10000 playouts a second.

The trend I describe actually continues for quite some time. Have a look at this study (http://cgos.boardspace.net/study/index.html), which extended up to 8388608 playouts for Mogo and many more for Fatman. I have not seen or done any tests myself to support this trend on 19x19, but I see no reason this trend should not hold.

Mike Novack wrote:You want a practical example? How about MFOG 12.21? It is supposed to be at 1 dan on a "standard" 2 core machine that a program buyer might be expected to have but the bot on KGS is playing at 2 dan on a machine about 6 times more powerful than "standard" (equivalent to six times the time).

Is this a 1 dan on KGS? You also have to bear in mind that scaling over multiple cores or a cluster has other performance penalties and is currently one of the areas of research in Computer Go.

topazg · Post by **topazg** » Tue Dec 21, 2010 8:50 am

liquido wrote:The trend I describe actually continues for quite some time. Have a look at this study (http://cgos.boardspace.net/study/index.html), which extended up to 8388608 playouts for Mogo and many more for Fatman. I have not seen or done any tests myself to support this trend on 19x19, but I see no reason this trend should not hold.

This is very interesting. As the number of playouts is presumably the primary factor, a bigger board should have the curve arrive later, due to the additional time needed to complete the playouts? If so, how much longer does a 19x19 playout take than a 9x9 playout, averaged out and including overhead time?

From the table you sent, and assuming 10,000 playouts per sec, Mogo gets stronger as follows:

6.5 secs/move = 2469
13.1 = 2580
26.2 = 2659
52.4 = 2757

1.75 mins/move = 2815
3.5 = 2893
7.0 = 2959

So, interestingly, there are 2 full stones of difference between 6.5 and 26.2 seconds, which I wouldn't have guessed. Just as interestingly, it's another 2 stones stronger at 7 mins per move. I would have thought it would have flattened off before getting into minutes per move, but it's only really at 7-14-28 minutes that the flattening off is clear. At lower levels, this increase is obviously more marked than 1 stone per doubling in playouts, so the comparative playout time for 19x19 boards is quite relevant I think.
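The times in the table above are consistent with playout counts that double at each level, from 2^16 (65536) up to 2^22 (4194304), divided by the assumed 10,000 playouts per second. A minimal sketch of that conversion follows; the function names are made up for illustration, and the playout rate is only the rough figure quoted in the thread, not a measured one.

```python
# Assumption taken from the thread: roughly 10,000 playouts per second.
PLAYOUTS_PER_SECOND = 10_000


def seconds_per_move(playouts: int, rate: float = PLAYOUTS_PER_SECOND) -> float:
    """Thinking time needed to run `playouts` simulations at `rate` playouts/sec."""
    return playouts / rate


def format_time(seconds: float) -> str:
    """Show times under a minute in seconds, longer ones in minutes."""
    if seconds < 60:
        return f"{seconds:.1f} secs"
    return f"{seconds / 60:.2f} mins"


# Playout counts double at each level; 2**23 = 8388608 is the top Mogo level cited.
for exponent in range(16, 24):
    playouts = 2 ** exponent
    print(f"{playouts:>8} playouts -> {format_time(seconds_per_move(playouts))}")
```

Rounding may differ from the table in the last digit (65536 playouts is 6.55 seconds, listed as 6.5).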

liquido · Post by **liquido** » Tue Dec 21, 2010 8:58 am

topazg wrote:This is very interesting. As the number of playouts is presumably the primary factor, a bigger board should have the curve arrive later, due to the additional time needed to complete the playouts? If so, how much longer does a 19x19 playout take than a 9x9 playout, averaged out and including overhead time?

A very quick test with my program shows that playouts on a 9x9 are about 4.68 times faster than on a 19x19, and considering 361/81=4.46, the time per playout seems to be directly proportional to the number of intersections on the board.
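As a rough check of that arithmetic, the sketch below compares the ratio of intersections with the measured slowdown and expresses each as a number of time doublings. The 4.68 figure is the one quoted above; the variable and function names are illustrative only.

```python
import math

INTERSECTIONS_9X9 = 9 * 9      # 81
INTERSECTIONS_19X19 = 19 * 19  # 361

# Ratio of board sizes, and the slowdown measured in the post above.
area_ratio = INTERSECTIONS_19X19 / INTERSECTIONS_9X9
measured_ratio = 4.68


def doublings(ratio: float) -> float:
    """How many doublings of thinking time a given slowdown factor equals."""
    return math.log2(ratio)


print(f"area ratio     = {area_ratio:.2f} ({doublings(area_ratio):.2f} doublings)")
print(f"measured ratio = {measured_ratio:.2f} ({doublings(measured_ratio):.2f} doublings)")
```

Either way the slowdown comes to a little over two doublings of time per move.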

topazg · Post by **topazg** » Tue Dec 21, 2010 9:29 am

liquido wrote:
topazg wrote:This is very interesting. As the number of playouts is presumably the primary factor, a bigger board should have the curve arrive later, due to the additional time needed to complete the playouts? If so, how much longer does a 19x19 playout take than a 9x9 playout, averaged out and including overhead time?
A very quick test with my program shows that playouts on a 9x9 are about 4.68 times faster than on a 19x19, and considering 361/81=4.46, the time per playout seems to be directly proportional to the number of intersections on the board.

Awesome, thanks. That makes the 19x19 equivalent (obviously actual performance will be lower on 19x19):

3.25 secs/move = 2063
6.5 = 2270
13.1 = 2339
26.2 = 2469
52.4 = 2580

1.75 mins/move = 2659
3.5 = 2757
7.0 = 2815

So it looks like about 4 stones between my original figures of 3 and 30 seconds per move. Even though this is seriously back of the envelope, I think I stand corrected. So, as liquido said, one stone per doubling in a "normal time controls" range (6.5 secs per move up to 3.5 minutes per move seems to fit this) is reasonably accurate, according to this table at least.
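To make the "per doubling" reading of the rescaled table concrete, here is a small sketch that computes the rating gain between consecutive rows. The ratings are copied from the table above; no conversion from rating points to stones is assumed, and the labels are just for display.

```python
# (time per move, rating) rows copied from the rescaled table in this post.
rows = [
    ("3.25 s", 2063),
    ("6.5 s", 2270),
    ("13.1 s", 2339),
    ("26.2 s", 2469),
    ("52.4 s", 2580),
    ("1.75 min", 2659),
    ("3.5 min", 2757),
    ("7.0 min", 2815),
]

ratings = [rating for _, rating in rows]

# Each row doubles the time of the previous one, so consecutive
# differences are the rating gain per doubling of thinking time.
gains = [later - earlier for earlier, later in zip(ratings, ratings[1:])]

for (label_a, _), (label_b, _), gain in zip(rows, rows[1:], gains):
    print(f"{label_a:>8} -> {label_b:<8} +{gain}")

# Average gain over the "normal time controls" range (6.5 s up to 3.5 min).
normal_range = gains[1:6]
print("mean gain per doubling, 6.5 s to 3.5 min:", sum(normal_range) / len(normal_range))
```

The gains vary from row to row, so the "one stone per doubling" figure is an average over that range and not a constant step.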

It would be interesting to see if this isn't accurate. Presumably, because of the fact that the number of possible variations on a 19x19 board is so much larger, 10,000 playouts on a 9x9 board will provide considerably better overall accuracy than 10,000 playouts on a 19x19 board. It would be very interesting to see this scaled to 19x19.

Mike Novack · Post by **Mike Novack** » Wed Dec 22, 2010 7:55 am

topazg wrote: It would be interesting to see if this isn't accurate. Presumably, because of the fact that the number of possible variations on a 19x19 board is so much larger, 10,000 playouts on a 9x9 board will provide considerably better overall accuracy than 10,000 playouts on a 19x19 board. It would be very interesting to see this scaled to 19x19.

Possibly not true? We are dealing with statistics. Results are poor if the sample size is inadequate but above some sample size improve only slightly. It matters less that 10,000 is a larger percentage of all possible moves at 9x9 than at 19x19.

In other words, if we sample 1000 out of 1 million we do not have to sample 10,000 out of 10 million to have equally valid results (the equal validity sample size might be 1100 -- I don't have the tables to look this up but expect it to be a small change of this sort).
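The textbook result behind this can be sketched with the standard margin of error for an estimated proportion under simple random sampling, including the finite population correction. This is only an illustration of the sampling argument; playouts in an actual MCTS program are not uniform random samples of all possible games, so the numbers say nothing directly about playing strength.

```python
import math


def margin_of_error(sample_size: int, population_size: int,
                    p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion estimated from a
    simple random sample drawn without replacement from a finite population."""
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    # Finite population correction: close to 1 when the sample is a tiny fraction.
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction


cases = [
    (1_000, 1_000_000),
    (1_000, 10_000_000),
    (10_000, 10_000_000),
]
for sample_size, population_size in cases:
    moe = margin_of_error(sample_size, population_size)
    print(f"n={sample_size:>6}, N={population_size:>10}: +/- {moe:.4f}")
```

With a sample of 1000, the margin of error is almost identical for a population of 1 million and of 10 million; it is the sample size, not the fraction sampled, that dominates.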

pasky · Post by **pasky** » Wed Dec 22, 2010 6:26 pm

Mike Novack wrote:2) You can't go by how the algorithms of several years ago behaved so the rest of what you say is outdated. The dominant algorithm now used by all the strongest programs does not behave the way you have described. Currently performance is limited purely by time and isn't "biased" in the way you think. Given enough time these algorithms would discover the best next move. For these programs "tuning" is adjusting behavior so as to get the best performance within the constraint of actual time given the allowed computer power. How that is done might or might not introduce "bias" (it doesn't have to -- need not be deterministic*)

If you use plain RAVE, I think it's proven that MCTS does not necessarily converge to the best move anymore. Even if it did in theory, in practice RAVE puts you in very deep valleys within the tree all the time, which makes it exceedingly difficult to overcome invalid biases produced by the simulations. The point of RAVE is that the valleys are mostly trails in good directions. But it's like in human play: you end up reading long, mostly straight lines with your pattern matcher feeding you the sequence, but if it never feeds you the counter-tesuji, your reading becomes completely wrong.

It's true that it's still better than gnugo group solver never spotting the right move since it's missing from its pattern database. But it's not much better.

Mike Novack wrote:3) Objectives differ. Are we after the strongest possible program (given the time/machine power constraint) or the strongest one that can pass or come close to passing the Turing test within that constraint? (not obviously identifiable as a non human player -- if presented with a set of games some of which between two humans and some between a human and this program you could not easily/certainly separate into the two subsets)

I think there's little interest in Turing-passing programs. Commercial program authors do care somewhat, since it matters to them how "pretty" the program plays, but overall the research does not seem to concern itself with this at all.

bustaballs · Post by **bustaballs** » Sun Jan 02, 2011 4:51 pm

I'm about 10k on KGS and I can't even begin to compete against the weakest computer on GNU Go.