"Indefinite improvement" for AlphaZero-like engines

For discussing go computing, software announcements, etc.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: "Indefinite improvement" for AlphaZero-like engines

Post by Bill Spight »

Let me clear up some confusion I may have caused about chilled go and rounding. I was a bit confused, myself. :oops:
Click Here To Show Diagram Code
[go]$$ Chilled go -1¼ pts.
$$ ------------------
$$ . . X . . . O . .
$$ . . X O O O O . .
$$ . . X X X X O . .[/go]
Suppose that the rest of the board is settled and all of these stones are unconditionally alive.

-1¼ is not only the local chilled go score, it is the estimate of both the territory and area local scores.

OC, play stops in chilled go, but may continue under territory or area scoring. Black to play can round the local score up to -1, White to play can round the local score to -2 by territory scoring, -3 by area scoring.
Click Here To Show Diagram Code
[go]$$ Black rounds up
$$ ------------------
$$ . . X 1 2 . O . .
$$ . . X O O O O . .
$$ . . X X X X O . .[/go]
Click Here To Show Diagram Code
[go]$$W White rounds down
$$ ------------------
$$ . . X 1 . . O . .
$$ . . X O O O O . .
$$ . . X X X X O . .[/go]
Now let's look at a fractionally worse position.
Click Here To Show Diagram Code
[go]$$ Chilled go -1½ pts.
$$ ------------------
$$ . . X . . O . O .
$$ . . X O O O O O .
$$ . X X X X X X O .[/go]
Again, -1½ is the local chilled go score and the estimate for both the territory and area local scores.
Click Here To Show Diagram Code
[go]$$B Black rounds up
$$ ------------------
$$ . . X B . O . O .
$$ . . X O O O O O .
$$ . X X X X X X O .[/go]
:b1: rounds up to a local territory score of -1, which is also the estimate of the local area score, which depends on who gets the dame.
Click Here To Show Diagram Code
[go]$$W White rounds down
$$ ------------------
$$ . . X 1 . O . O .
$$ . . X O O O O O .
$$ . X X X X X X O .[/go]
:w1: rounds down to a local territory score of -2, and local area score of -3.

It is asserted that, without ko complications, correct play under chilled go is also correct under territory scoring (except with special rules, such as not counting territory in seki) and under area scoring.

But let's suppose that Black at some point, assuming correct play thereafter, has the option of playing to the first diagram or to the second diagram, with the territory score being the same elsewhere on the board. However, the first option will round down to -2, while the second option will round up to -1, which means that the second option is better by territory and area scoring, despite having a lower score by chilled go. If play by chilled go is correct by territory and area scoring, how can that be?

What I overlooked in the previous discussion was the difference between the local chilled go score and the global score. If the -1¼ pt. position rounds down, that means that it is White's turn to play, and, since Black plays first in go, Black has made an extra play. Under chilled go each play costs 1 point and so globally we subtract 1 pt. from the territory estimate to get the chilled go score. So, instead of being ¼ pt. better for Black than Diagram 2, Diagram 1 is ¾ pt. worse, given the penalty for the extra Black play elsewhere on the board.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
iopq
Dies with sente
Posts: 113
Joined: Wed Feb 27, 2019 11:19 am
Rank: 1d
GD Posts: 0
Universal go server handle: iopq
Has thanked: 11 times
Been thanked: 27 times

Re: "Indefinite improvement" for AlphaZero-like engines

Post by iopq »

mhlepore wrote:To me, Silver's claim seems reasonable (at least to the point of the game being solved).

To use a sports analogy, occasionally weaker players can beat stronger players. Why?
--Human athletes make blunders.
--Balls rattle off the rim in unlucky ways.
--There is a lot of natural noise in certain games (e.g., baseball), such that the best teams lose 1/3 of the time.

The strong Go programs, to me anyway, are not likely to be impacted by this variability. They are not likely to misread a life/death problem, or a ladder, or miscount. And once you knock out a lot of the mistakes, it becomes in essence about backward induction, and the more powerful system should pretty much always win.

That said, I wonder if Silver would still back the statement, however replacing 100-0 with 1,000,000-0.
First of all, they are really bad at large semeais. This is because there's no code that simplifies the search in the case of different orders, I think they don't even have transpositions. So if there are 20 liberties for each group, but no escape or life, there's a chance a computer tenukis and turns a seki into certain death.

In the case of ladders, KataGo has code for them, but it may not play out a ladder that doesn't work - even if this wins the game, like in the ladder game. This is because the code only says "this ladder fails" which may bias against searching it.

In terms of endgame, it is helpless in the examples that can be solved by endgame theory. There are 20 different moves, all seem to be around 1 point, but only a few correct orderings exist.

But actually, the reason why the next bot beats the previous bot is actually very different. It will play Mi Yuting's flying dagger slightly better, getting a small advantage that it turns into a win. The next bot will find another variation that is even better, etc.

I've already had this happen on 9x9, each generation beats the previous one by 65%, two generations ago by 75%, etc.

Yet against KataGo all of them are around 40% - my own bot can beat the previous generation more convincingly than KG, but it can't beat KG.

So while 100-0 multiple times will happen, it will be against itself. Against an opponent that doesn't play the same joseki the improvement will be slow. This is why self-play data for ELO is extremely inflated.
TOTAL
Beginner
Posts: 5
Joined: Fri Nov 26, 2021 8:56 am
GD Posts: 0

Re: "Indefinite improvement" for AlphaZero-like engines

Post by TOTAL »

How about this perspective: the problem will remain computational until the game has been mapped.

1. think of a 4x4 board - or some other reasonable minimum (keeping this vague on purpose) - is there a complete tree? Might be more complex than tic tac toe, which I have read has been mapped because stones may be removed but this just needs more computational power.
2. if such a minimum reasonable board game is mappable, is there any board size threshold beyond which things are no longer so? If not, the game is finite.
3. before mapping is done, refining algorithms makes perfect sense, and I agree that handicap is not what David Silver probably referred to. he spoke of percentage of wins instead. Can't see why a sudden inflation in the depth of predictions, probably enabled by computing power mainly might not, theoretically, enable 100-0 wins per each generation. Engaging algorithms in developing new blueprints for 'cracking' go seems science fiction, but if a blueprint for THIS should be 'cracked' at some point, sky is the limit - and that's the last thing we need to do.
4. as and when mapping is completed, the rest would be 'sampling' - not 'synthesis', but that gets off-topic.

I think DS may have meant that progress will be immense and there is plenty of depth left to explore. But for such a spectacular explosion of progress (100-0's), I think humans need to be out of the design table. Just sit back and try to make sense :)
iopq
Dies with sente
Posts: 113
Joined: Wed Feb 27, 2019 11:19 am
Rank: 1d
GD Posts: 0
Universal go server handle: iopq
Has thanked: 11 times
Been thanked: 27 times

Re: "Indefinite improvement" for AlphaZero-like engines

Post by iopq »

TOTAL wrote:How about this perspective: the problem will remain computational until the game has been mapped.

1. think of a 4x4 board - or some other reasonable minimum (keeping this vague on purpose) - is there a complete tree? Might be more complex than tic tac toe, which I have read has been mapped because stones may be removed but this just needs more computational power.
2. if such a minimum reasonable board game is mappable, is there any board size threshold beyond which things are no longer so? If not, the game is finite.
3. before mapping is done, refining algorithms makes perfect sense, and I agree that handicap is not what David Silver probably referred to. he spoke of percentage of wins instead. Can't see why a sudden inflation in the depth of predictions, probably enabled by computing power mainly might not, theoretically, enable 100-0 wins per each generation. Engaging algorithms in developing new blueprints for 'cracking' go seems science fiction, but if a blueprint for THIS should be 'cracked' at some point, sky is the limit - and that's the last thing we need to do.
4. as and when mapping is completed, the rest would be 'sampling' - not 'synthesis', but that gets off-topic.

I think DS may have meant that progress will be immense and there is plenty of depth left to explore. But for such a spectacular explosion of progress (100-0's), I think humans need to be out of the design table. Just sit back and try to make sense :)

There might be an opening that forces all of the game to go in a certain way - so an engine that knows how to play perfectly from an empty board despite not knowing how to solve difficult situations that happen on arbitrary boards

So after mapping out the first X moves the engine might be strong enough not to lose to perfect play given Y visits per move
Mike Novack
Lives in sente
Posts: 1045
Joined: Mon Aug 09, 2010 9:36 am
GD Posts: 0
Been thanked: 182 times

Re: "Indefinite improvement" for AlphaZero-like engines

Post by Mike Novack »

TOTAL wrote:How about this perspective: the problem will remain computational until the game has been mapped.

2. if such a minimum reasonable board game is mappable, is there any board size threshold beyond which things are no longer so? If not, the game is finite.
Except finite vs infinite not really what is at stake. The question is how does the task of mapping go up with the increase in board size. If "m" is the amount of computation required for a mapping and "b" the board size we have.....

m = f(b) where "f" is some function and the question is what sort of function. In other words, how fast foes "M" grow as "b" grows.

Now suppose "f" is of the sort bb. In other words, m=abb Well "m" is still finite for any finite values of "a" and "b" but it is increasing VERY fast as "b" increases.
Post Reply