LZ's progression

Post by **Vargo** » Tue Mar 12, 2019 5:16 am

Here are some more 20-game matches of #208 v. #208, with --visits=6401.

Same parameters, except for --gpu 0 --gpu 1 (2x1080Ti); that shouldn't change anything.
No errors, no duplicate games.

So, same table as before, with an extra line (6401 → ...)

[Image: 6401.gif, results table]

It seems more visits really make a difference; I find the score of 6401 v. 801 especially harsh!

The games between -v 6401 and -v 3201 (3201 is B in the even-numbered games):

[Attachment: 208_3201v6401.zip (17.77 KiB)]

The stats for -v 6401 vs -v 3201:

If someone wants the other stats or games, I can upload them.

Post by **moha** » Tue Mar 12, 2019 3:56 pm

Vargo wrote: It seems more visits really make a difference; I find the score of 6401 v. 801 especially harsh!

IIRC, similar tests were posted on GitHub a year ago, and at that time doubling playouts seemed to give roughly a 75% winrate. That coincides with performance distributions about one standard deviation apart, which in turn can explain the behaviour at quadruple and octuple visits (3 sd → 98%, though doubling visits is not the same as doubling playouts, and at high visits the relations may change as well).
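A minimal sketch of the arithmetic behind this, assuming (my reading of the post, not something it spells out) that each net's per-game performance is normally distributed with the same spread and that the higher performance wins. The difference of two such performances has a standard deviation √2 times larger, so a gap of d standard deviations gives a winrate of Φ(d/√2). The name `winrate_from_gap` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def winrate_from_gap(d_sd: float) -> float:
    """Win probability when the two performance distributions are d_sd
    standard deviations apart (assumed normal with equal spread).

    The difference of two independent unit-sd normals has sd sqrt(2),
    so the winrate is Phi(d_sd / sqrt(2)).
    """
    return STD_NORMAL.cdf(d_sd / sqrt(2))


if __name__ == "__main__":
    # 1 sd ~ doubling playouts, 2 sd ~ quadrupling, 3 sd ~ octupling (per the post)
    for d in (1, 2, 3):
        print(f"{d} sd apart -> {winrate_from_gap(d):.1%}")
```

Run as a script, it prints roughly 76%, 92% and 98% for gaps of 1, 2 and 3 sd, in line with ~75% per doubling and 98% for octuple playouts.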

Post by **nbc44** » Wed Mar 13, 2019 2:14 pm

Time parity match.
LZ0.16 XXX and LZ0.16 Elfv2
2x1080ti, 60s per move.

1). #205

#205 v elfv2 ( 26 games)
           wins        black       white
#205    12 46.15%    2 50.00%   10 45.45%
elfv2   14 53.85%    2 50.00%   12 54.55%
                     4 15.38%   22 84.62%

2). #207

#207 v elfv2 ( 26 games)
           wins        black       white
#207    13 50.00%    7 53.85%    6 46.15%
elfv2   13 50.00%    6 46.15%    7 53.85%
                    13 50.00%   13 50.00%

3). #208

#208 v elfv2 ( 26 games)
           wins         black      white
#208     4 15.38%    1  9.09%    3 20.00%
elfv2   22 84.62%   10 90.91%   12 80.00%
                    11 42.31%   15 57.69%

4). #210
in progress...

Post by **Vargo** » Sun Mar 17, 2019 3:02 am

New network #212

Quick test of @jlt's law

(Reminder: LZ#(n) is stronger than LZ#(n-10) at block and time parity.)

Added the parameters -m 20, to avoid duplicate games, and -v 1601, to "standardize" the test.

50 games, no duplicate, no error.
Result: #212 wins 32-18 (64%)
__________________________________________________________________________

And now, how about a little controversy...

If #n wins 55% of its games against #n-1, and
if #n-1 wins 55% of its games against #n-2, and
...
and #n-9 wins 55% of its games against #n-10,

then #n should win 88% of its games against #n-10, but in this test it wins only 64%...

In this case, it's as if the real average winrate of #n against #n-1 were only ~51.5%, not 55%.
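The ~51.5% figure can be reproduced by running the odds multiplication backwards: take the overall odds (32:18) and take their 10th root. A minimal sketch, assuming, as the argument does, that odds multiply exactly along the chain of networks; `implied_step_winrate` is a hypothetical helper name.

```python
def implied_step_winrate(overall: float, steps: int) -> float:
    """Per-generation winrate that compounds to `overall` after `steps`
    generations, assuming odds multiply exactly along the chain."""
    overall_odds = overall / (1 - overall)
    step_odds = overall_odds ** (1 / steps)  # the steps-th root of the odds
    return step_odds / (1 + step_odds)


if __name__ == "__main__":
    # #212 v #202: 32-18 (64%) across 10 network generations
    print(f"{implied_step_winrate(32 / 50, 10):.1%}")
```

For 32-18 over 10 generations this comes to about 51.4% per step.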

Some caveats: -m 20 can alter the results, and 50 games is not enough; but still, I remember @moha saying that the primary source of Elo inflation is the amount of luck accumulated by new networks in test matches. I think he was right.
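To put a number on "50 games is not enough", here is a sketch of a Wilson score interval for the 32-18 result. It assumes the 50 games are independent (which -m 20 helps with but does not guarantee); `wilson_interval` is a hypothetical helper, and z = 1.96 corresponds to roughly 95% confidence.

```python
from math import sqrt


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a winrate; z = 1.96 gives roughly 95% coverage."""
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half_width = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    low, high = wilson_interval(32, 50)  # #212 v #202
    print(f"95% interval for 32-18: {low:.1%} to {high:.1%}")
```

`wilson_interval(32, 50)` comes out at roughly (0.50, 0.76): the 88% prediction lies well outside it, and the lower end sits only just above 50%.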

gogui-twogtp -black "C:\Users\jm\Desktop\gogui150\leela-zero-0.16-win64OK\leelaz.exe --gtp --weights=C:\Users\jm\Desktop\LZ_networks\212.gz --noponder --gpu 0 --gpu 1 -m 20 -v 1601" -white "C:\Users\jm\Desktop\gogui150\leela-zero-0.16-win64OK\leelaz.exe --gtp --weights=C:\Users\jm\Desktop\LZ_networks\202.gz --noponder --gpu 0 --gpu 1 -m 20 -v 1601" -games 50 -sgffile 212_202 -auto -komi 7.5 -alternate

The 50 games:

[Attachment: 212_202.zip (43.7 KiB)]

EDIT: #212 is B in the even-numbered games and W in the odd ones.

Post by **Uberdude** » Sun Mar 17, 2019 6:24 am

Vargo wrote: If #n wins 55% of its games against #n-1, and
if #n-1 wins 55% of its games against #n-2, and
...
and #n-9 wins 55% of its games against #n-10,

then #n should win 88% of its games against #n-10, but in this test it wins only 64%...

In this case, it's as if the real average winrate of #n against #n-1 were only ~51.5%, not 55%.

Why should it? That's an assumption e.g. Elo rating systems take to make the problem simple enough to tackle, but there's no logical 'should' about it. If Man City beat Arsenal 3-0 and Arsenal beat Chelsea 2-0 we can't say Man City should beat Chelsea 5-0.

Post by **Vargo** » Sun Mar 17, 2019 7:05 am

A 55% winrate means A wins 55 games out of 100: A wins 55 while B wins 45. So A wins 55/45 times as many games as B. That's true, because (55/45)*45 = 55.
If B wins 55% against C, B wins 55/45 times as many games as C.

A wins 55/45 times as many games as B, who wins 55/45 times as many games as C, so A wins (55/45)*(55/45) times as many games as C.
etc.
After 10 such steps, A10 wins (55/45)^10 times as many games as A0,
i.e. A10 wins 7.44 times as many games as A0.
When A0 wins 1 game, A10 wins 7.44 games, so A10 wins 7.44 out of 8.44, which is 88%.

For example, take an obvious case: if An wins 50% against An-1, after 10 networks this gives 1^10 = 1, and 1 out of (1+1) is still 50%.

With 50.1%, it gives (50.1/49.9)^10 ≈ 1.04, and 1.04/2.04 ≈ 51%, which looks reasonable.

With 45%, after 10 networks, we would get ~12%.

If we could have 10 consecutive networks, each with a 60% winrate over the preceding one, we'd have a 98.3% winrate for A10 over A0.
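The chain of multiplications above as a short sketch. Multiplying odds along a chain is exactly what a logistic (Elo-style) model assumes, since there the odds are 10^(Δ/400) and rating differences add; whether real networks behave that way is the point under discussion. `compounded_winrate` is a hypothetical name.

```python
def compounded_winrate(step_winrate: float, steps: int) -> float:
    """Winrate of the newest net over the one `steps` generations older, if each
    net beats its predecessor at `step_winrate` and odds multiply (assumption)."""
    step_odds = step_winrate / (1 - step_winrate)
    total_odds = step_odds ** steps
    return total_odds / (1 + total_odds)


if __name__ == "__main__":
    for p in (0.50, 0.501, 0.45, 0.55, 0.60):
        print(f"{p:.1%} per step -> {compounded_winrate(p, 10):.1%} after 10 steps")
```

It gives roughly 50%, 51%, 12%, 88% and 98% for the five per-step winrates, matching the figures in the post.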

I hope it's understandable; I don't really speak English (as you've probably noticed).

Post by **ez4u** » Sun Mar 17, 2019 7:19 am

Interesting discussion on GitHub of an experimental version of LZ that uses alternative logic to select the best move with the same nets. See https://github.com/leela-zero/leela-zero/issues/2282 for details and links to code or compiled Windows downloads. The experimental version is showing a ~60% winning percentage versus normal LZ in matches at various visit levels.

Post by **Tryss** » Sun Mar 17, 2019 9:18 am

Vargo wrote: I hope it's understandable; I don't really speak English (as you've probably noticed).

It's understandable, but there's no reason for this to be true.

Having a 1:a ratio against player A, when player A has a 1:b ratio against player B, doesn't mean you must have a 1:(a*b) ratio against B.

It already fails for the following simple game:

Player A rolls two 8-sided dice, player B rolls two 6-sided dice, and player C rolls two 4-sided dice. The one with the bigger sum wins, and in case of a tie, the one with the smaller dice wins.

In this simple game, A has a 63.93% chance to win against B (1473/2304, or a 1:1.773 ratio), B has a 69.10% chance to win against C (398/576, or a 1:2.236 ratio), and A has an 82.42% chance to win against C (844/1024, or a 1:4.689 ratio).

But what you propose would give A a 79.85% chance to win against C (1:3.964).
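These numbers can be checked by brute-force enumeration of every roll. A minimal sketch using exact fractions; the helper names are hypothetical, and faces are assumed to be numbered 1 to n.

```python
from fractions import Fraction
from itertools import product


def sum_distribution(faces: int, n_dice: int) -> dict[int, int]:
    """Number of outcomes giving each total when rolling n_dice dice numbered 1..faces."""
    counts: dict[int, int] = {}
    for roll in product(range(1, faces + 1), repeat=n_dice):
        total = sum(roll)
        counts[total] = counts.get(total, 0) + 1
    return counts


def p_win(big: int, small: int, n_dice: int = 2) -> Fraction:
    """Probability that the player with `big`-sided dice beats the one with
    `small`-sided dice: the larger sum wins, ties go to the smaller dice."""
    dist_big = sum_distribution(big, n_dice)
    dist_small = sum_distribution(small, n_dice)
    wins = sum(
        count_big * count_small
        for total_big, count_big in dist_big.items()
        for total_small, count_small in dist_small.items()
        if total_big > total_small  # strict: a tie is a loss for the bigger dice
    )
    return Fraction(wins, (big * small) ** n_dice)


def odds(p: Fraction) -> Fraction:
    """Convert a win probability into odds (wins per loss)."""
    return p / (1 - p)


if __name__ == "__main__":
    ab, bc, ac = p_win(8, 6), p_win(6, 4), p_win(8, 4)
    chained = odds(ab) * odds(bc)  # what multiplying odds would predict for A v C
    print(f"A v B: {float(ab):.2%}  B v C: {float(bc):.2%}  A v C: {float(ac):.2%}")
    print(f"chained odds prediction for A v C: {float(chained / (1 + chained)):.2%}")
```

The enumeration gives exactly 1473/2304 for A v B, 398/576 for B v C and 844/1024 for A v C (odds of about 1:4.69), while chaining the first two odds predicts only about 79.85% for A v C.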

Post by **Vargo** » Sun Mar 17, 2019 9:45 am

Tryss wrote: Player A rolls two 8-sided dice, ...

As I said, a little controversy...
I'm not at home now, but I'm looking forward to trying your dice.

Post by **Bill Spight** » Sun Mar 17, 2019 9:52 am

Vargo wrote: A 55% winrate means A wins 55 games out of 100: A wins 55 while B wins 45. So A wins 55/45 times as many games as B. That's true, because (55/45)*45 = 55.
If B wins 55% against C, B wins 55/45 times as many games as C.

A wins 55/45 times as many games as B, who wins 55/45 times as many games as C, so A wins (55/45)*(55/45) times as many games as C.
etc.

{snip}

I hope it's understandable; I don't really speak English (as you've probably noticed).

Yes, it is understandable and clear. However, there is an underlying assumption that the difference between the abilities of A, B, and C to win games is reducible to a single number. (There is also the assumption of perfect accuracy of the win rate estimates, i.e., no luck, which has already been alluded to.) But, as we know, go requires a number of different skills, which means that skill at go may not be reducible to a single number. And that means that transitivity does not hold. Player A may beat player B more than half the time, player B may beat player C more than half the time, and player C may beat player A more than half the time.

Now, transitivity holds closely enough in go that we can have different ranks, each of which covers a range of ratings, and make pretty good predictions of the handicap between players of different ranks which will make the win rates around 50%. But, OC, for specific individual pairings the recommended handicap may not do that. One thing that makes the ranking system robust is that each player plays against a variety of different players with different levels of ability at different skills. Self play does not do that, and so, IMO, does not produce robust results.

To give a possibly related example of how multidimensionality can reduce the degree of progress, let's suppose that we are measuring progress in two independent dimensions. Suppose that B makes one unit of progress by comparison with A, and C makes the same unit of progress by comparison with B, but in the orthogonal direction to that of the progress between A and B. Then how much progress does C make with regard to A? Not 2 units, but √2 units.

Post by **Tryss** » Sun Mar 17, 2019 10:01 am

Vargo wrote:
Tryss wrote: Player A rolls two 8-sided dice, ...
As I said, a little controversy...
I'm not at home now, but I'm looking forward to trying your dice.

8-sided dice are common in tabletop gaming:

4-, 6-, 8-, 10-, 12- and 20-sided dice are the usual ones.

But more exotic dice exist.

Post by **Vargo** » Sun Mar 17, 2019 11:08 am

@Tryss
Your dice are beautiful. I used to have some like those.
But in your example, A and B play one game, B and C play another game, because the dice are different, and A and C play a third, different game. In this case, I'm not surprised that winrates aren't transitive.

Bill Spight wrote: Now, transitivity holds closely enough in go that we can have different ranks,...

Yes, it's true, fortunately!

Bill Spight wrote: ...Self play does not do that, and so, IMO, does not produce robust results.

It's true too, unfortunately.

Post by **Tryss** » Sun Mar 17, 2019 11:16 am

No, it's the same game: roll your dice, and the one with the better score wins.

Player A is just a stronger player than B or C.

Post by **Bill Spight** » Sun Mar 17, 2019 11:26 am

Tryss wrote:
Vargo wrote: I hope it's understandable; I don't really speak English (as you've probably noticed).
It's understandable, but there's no reason for this to be true.

Having a 1:a ratio against player A, when player A has a 1:b ratio against player B, doesn't mean you must have a 1:(a*b) ratio against B.

It already fails for the following simple game:

Player A rolls two 8-sided dice, player B rolls two 6-sided dice, and player C rolls two 4-sided dice. The one with the bigger sum wins, and in case of a tie, the one with the smaller dice wins.

In this simple game, A has a 63.93% chance to win against B (1473/2304, or a 1:1.773 ratio), B has a 69.10% chance to win against C (398/576, or a 1:2.236 ratio), and A has an 82.42% chance to win against C (844/1024, or a 1:4.689 ratio).

But what you propose would give A a 79.85% chance to win against C (1:3.964).

I suppose that the faces of each die are numbered consecutively from 1 to the number of faces. Let's suppose that each player rolls only one die. Then A has a 9/16 chance (56.25%) to beat B, with odds of 9:7, and B has a 7/12 chance (58.33%) to beat C, with odds of 7:5. Multiplying the odds gives A odds of 9:5 to beat C, or 9/14 of the time (64.29%). But A beats C 11/16 of the time (68.75%), with odds of 11:5.
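The one-die version is small enough to check exactly. A sketch with exact fractions, assuming faces numbered 1 to n and, as in Tryss's game, ties going to the smaller die; `p_win_one_die` is a hypothetical name.

```python
from fractions import Fraction


def p_win_one_die(big: int, small: int) -> Fraction:
    """Probability that one `big`-sided die beats one `small`-sided die
    (strictly higher roll wins; a tie goes to the smaller die)."""
    wins = sum(1 for a in range(1, big + 1) for b in range(1, small + 1) if a > b)
    return Fraction(wins, big * small)


def odds(p: Fraction) -> Fraction:
    """Convert a win probability into odds (wins per loss)."""
    return p / (1 - p)


ab = p_win_one_die(8, 6)  # 9/16
bc = p_win_one_die(6, 4)  # 7/12
ac = p_win_one_die(8, 4)  # 11/16
chained_odds = odds(ab) * odds(bc)  # 9/7 * 7/5 = 9/5
chained_p = chained_odds / (1 + chained_odds)  # 9/14

print("direct A v C:", ac, "chained prediction:", chained_p)
```

Chaining the odds predicts 9/14 (about 64.29%), while the direct matchup gives 11/16 (68.75%).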

Post by **moha** » Sun Mar 17, 2019 11:23 pm

Out of curiosity, I tested my method: a 55% winrate means a ~0.178 sd distance between the performance distributions, and 1.78 sd gives 89%; no surprise here. Transitivity is OC debatable, but I doubt it would be the larger effect in this case.
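This check, run as code: convert 55% into a gap between performance distributions, multiply the gap by 10, and convert back. It assumes normal performance distributions with equal spread, the higher performance winning, and gaps adding along the chain; the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def gap_from_winrate(p: float) -> float:
    """Gap (in sd) between two equal-spread normal performance distributions
    that yields winrate p: inverts p = Phi(d / sqrt(2))."""
    return sqrt(2) * STD_NORMAL.inv_cdf(p)


def winrate_from_gap(d_sd: float) -> float:
    """Winrate for a gap of d_sd standard deviations."""
    return STD_NORMAL.cdf(d_sd / sqrt(2))


if __name__ == "__main__":
    step = gap_from_winrate(0.55)  # gap per 55% promotion
    print(f"{step:.3f} sd per step; 10 steps -> {winrate_from_gap(10 * step):.1%}")
```

This gives a gap of about 0.178 sd per step and just under 90% after 10 steps, close to the 88% obtained by multiplying odds.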

Just retesting those 55% promotions with more games may reduce most of them to lower winrates. This is, after all, how "55% over 400 games" was chosen: a statistical mass that makes it hard to pass on luck ALONE (in a few dozen tries), so new nets are at least slightly better, but nothing more. And those 400 samples are not even truly independent: the first few moves and joseki choices are often identical, which further reduces the statistical weight.
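How hard is it to reach 55% over 400 games by luck alone? A sketch of the exact binomial tail, assuming two equally strong nets, a fixed 400-game match and independent games. The real Leela Zero promotion procedure may differ in detail, and correlated openings make the effective number of independent games smaller, which would make passing on luck easier than this calculation suggests. `p_at_least` is a hypothetical name.

```python
from math import comb


def p_at_least(wins: int, games: int, p: float = 0.5) -> float:
    """Exact binomial probability of at least `wins` wins in `games`
    independent games, each won with probability p."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


if __name__ == "__main__":
    # Equal-strength nets: chance of reaching 55% (220/400) by luck alone
    print(f"{p_at_least(220, 400):.2%}")
```

For 220 wins out of 400 this comes to roughly 2.5%.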

Life In 19x19
