LZ's progression

Vargo
Lives in gote
Posts: 337
Joined: Sat Aug 17, 2013 5:28 am
GD Posts: 0
Has thanked: 22 times
Been thanked: 97 times

Re: LZ's progression

Post by Vargo »

Here are some more 20-game matches of #208 v. #208, with --visits=6401

Same parameters, except for --gpu 0 --gpu 1 (2x1080Ti).
That shouldn't change anything.
No errors, no duplicate games.

So, same table as before, with an extra line (6401 → ...)
[Image: 6401.gif]
Seems like more visits really make a difference; I find the score of 6401 v. 801 especially harsh :o !


The games between -v 6401 and -v 3201 (3201 is B in the even numbered games):
[Attachment: 208_3201v6401.zip]
The stats for -v 6401 vs -v 3201 :
[Image: stats.gif]
If someone wants the other stats or games, I can upload them.
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: LZ's progression

Post by moha »

Vargo wrote:Seems like more visits really make a difference; I find the score of 6401 v. 801 especially harsh :o !
IIRC similar tests were posted on GitHub a year ago, and at that time doubling playouts seemed to give roughly a 75% winrate. This coincides with performance distributions about one standard deviation apart, which in turn can explain the behaviour at quadruple and octuple visits (3 sd -> 98%, though doubling visits is not the same as doubling playouts, and at high visits the relations may change as well).
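
A minimal sketch of the model described above, under one loud assumption that is not spelled out in the post: each side's performance in a game is an independent normal variable with standard deviation 1, so the difference between the two has standard deviation sqrt(2). The function name `win_probability` is made up for this illustration.

```python
import math

def win_probability(distance_sd: float) -> float:
    """Chance that the stronger side wins when the two performance
    distributions are `distance_sd` standard deviations apart.

    Assumption: both performances are independent normals with sd 1,
    so their difference has sd sqrt(2) and P(win) = Phi(d / sqrt(2)).
    """
    z = distance_sd / math.sqrt(2.0)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# One, two and three doublings, taken as 1, 2 and 3 sd of distance.
for doublings in (1, 2, 3):
    print(doublings, round(win_probability(doublings), 3))
```

Under that assumption one sd gives roughly 76% and three sd roughly 98%, which is in line with the figures quoted in the post.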
nbc44
Dies in gote
Posts: 50
Joined: Sat Sep 15, 2018 2:34 am
GD Posts: 0
Been thanked: 3 times

Re: LZ's progression

Post by nbc44 »

Time parity match.
LZ0.16 XXX and LZ0.16 Elfv2
2x1080ti, 60s per move.
C:\APPS\l0gpu16\validation.exe -n C:\APPS\net\XXX.gz -o "-g --gpu 0 --gpu 1 --noponder -t 24 -q -d --precision single -w" -n C:\APPS\net\05dbca15.gz -o "-g --gpu 0 --gpu 1 --noponder -t 24 -q -d --precision single -w" -- C:\APPS\l0gpu16\leelaz --gtp-command "time_settings 1 61 1" -- C:\APPS\l0gpu16\leelaz --gtp-command "time_settings 1 61 1" -k XXX-elfv2
1). #205

#205 v elfv2 ( 26 games)
           wins        black       white
#205    12 46.15%    2 50.00%   10 45.45%
elfv2   14 53.85%    2 50.00%   12 54.55%
                     4 15.38%   22 84.62%
2). #207

#207 v elfv2 ( 26 games)
           wins        black       white
#207    13 50.00%    7 53.85%    6 46.15%
elfv2   13 50.00%    6 46.15%    7 53.85%
                    13 50.00%   13 50.00%
3). #208

#208 v elfv2 ( 26 games)
           wins        black       white
#208     4 15.38%    1  9.09%    3 20.00%
elfv2   22 84.62%   10 90.91%   12 80.00%
                    11 42.31%   15 57.69%
4). #210
in progress...
[Attachment: l0-elfv2.zip]

Re: LZ's progression

Post by Vargo »

New network #212

Quick test about @jlt's law ;-)
(reminder : LZ#(n) is stronger than LZ#(n-10) at blocks and time parity)

I added the parameters -m 20, to avoid duplicate games, and -v 1601, to "standardize" the test.

50 games, no duplicates, no errors.
Result: #212 wins 32-18 (64%)
__________________________________________________________________________

And now, how about a little controversy... :D :D

If #n wins 55% of its games against #n-1, and
if #n-1 wins 55% of its games against #n-2, and
...
and #n-9 wins 55% of its games against #n-10,

then #n should win 88% of its games against #n-10, but in this test it wins only 64%...


In this case, it's as if the real average winrate of #n against #n-1 was only ~51.5%, and not 55%.


Some caveats : -m 20 can alter results, and 50 games is not enough, but still, I remember @moha spoke about the primary source of Elo inflation being the amount of luck accumulated by the new networks in test matches. I think he was right.
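
To put a number on the "50 games is not enough" caveat, here is a small sketch of a Wilson score interval for the 32-18 result. The choice of the Wilson interval and the helper name `wilson_interval` are assumptions made for this illustration; the thread itself does not use them.

```python
import math

def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a winrate.

    Assumes the games are independent, which self-play matches with
    shared openings may not fully satisfy.
    """
    p = wins / games
    denom = 1.0 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

low, high = wilson_interval(32, 50)
print(f"32/50 wins: {low:.3f} to {high:.3f}")
```

With 50 games the interval spans roughly 50% to 76%, so the match alone cannot separate "barely better" from "clearly better".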

gogui-twogtp -black "C:\Users\jm\Desktop\gogui150\leela-zero-0.16-win64OK\leelaz.exe --gtp --weights=C:\Users\jm\Desktop\LZ_networks\212.gz --noponder --gpu 0 --gpu 1 -m 20 -v 1601" -white "C:\Users\jm\Desktop\gogui150\leela-zero-0.16-win64OK\leelaz.exe --gtp --weights=C:\Users\jm\Desktop\LZ_networks\202.gz --noponder --gpu 0 --gpu 1 -m 20 -v 1601" -games 50 -sgffile 212_202 -auto -komi 7.5 -alternate
The 50 games :
[Attachment: 212_202.zip]
EDIT : #212 is B in the even numbered games, and W in the odd ones.
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: LZ's progression

Post by Uberdude »

Vargo wrote:If #n wins 55% of its games against #n-1, and
If #n-1 wins 55% of its games against #n-2,and
...
and #n-9 wins 55% of its games against #n-10

#n should win 88% of its games against #n-10, but in this test, it wins only 64%...

In this case, it's as if the real average winrate of #n against #n-1 was only ~51.5% , and not 55%
Why should it? That's an assumption that e.g. Elo rating systems make to keep the problem simple enough to tackle, but there's no logical 'should' about it. If Man City beat Arsenal 3-0 and Arsenal beat Chelsea 2-0, we can't say Man City should beat Chelsea 5-0.

Re: LZ's progression

Post by Vargo »

A 55% winrate means A wins 55 games out of 100: A wins 55 when B wins 45. So A wins 55/45 times as many games as B. That's true, because (55/45)*45=55.
If B wins 55% over C, B wins 55/45 times as many games as C.

A wins 55/45 times as many games as B, who wins 55/45 times as many games as C, so A wins (55/45)*(55/45) times as many games as C.
etc.
A10 wins (55/45)^10 times as many games as A1
A10 wins 7.44 times as many games as A1
When A1 wins 1 game, A10 wins 7.44 games, so A10 wins 7.44 out of 8.44, that's 88%

For example, in an obvious case, if An wins 50% over An-1, after 10 networks it leads to 1^10=1, and 1 out of (1+1) is still 50%.

With 50.1%, it leads to (50.1/49.9)^10=1.04, and 1.04/2.04 =~ 51%, which looks reasonable.

With 45%, after 10 networks, we would get ~12%

If we could have 10 consecutive networks with a 60% winrate over the preceding one, we'd have a 98.3% winrate for A10 over A1.
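
The arithmetic above can be sketched in a few lines. This only reproduces the odds-multiplication assumption as stated in the post; the function names `chained_winrate` and `implied_step_winrate` are made up for this illustration.

```python
def chained_winrate(step_winrate: float, steps: int) -> float:
    """Winrate of the last network over the first, assuming the win odds
    simply multiply from one network to the next."""
    odds = (step_winrate / (1.0 - step_winrate)) ** steps
    return odds / (1.0 + odds)

def implied_step_winrate(total_winrate: float, steps: int) -> float:
    """Inverse: the per-step winrate that would chain to `total_winrate`."""
    odds = (total_winrate / (1.0 - total_winrate)) ** (1.0 / steps)
    return odds / (1.0 + odds)

for p in (0.50, 0.501, 0.45, 0.55, 0.60):
    print(f"{p:.3f} per step -> {chained_winrate(p, 10):.3f} over 10 steps")

# Per-step winrate that would explain a 64% result over 10 networks.
print(f"{implied_step_winrate(0.64, 10):.4f}")
```

Whether odds really multiply like this is exactly the assumption under discussion; the code only shows what follows if they do.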


I hope it's understandable, I don't really speak english (as you've probably noticed :D )
ez4u
Oza
Posts: 2414
Joined: Wed Feb 23, 2011 10:15 pm
Rank: Jp 6 dan
GD Posts: 0
KGS: ez4u
Location: Tokyo, Japan
Has thanked: 2351 times
Been thanked: 1332 times

Re: LZ's progression

Post by ez4u »

There is an interesting discussion on GitHub of an experimental version of LZ that uses alternative logic to select the best play with the same nets. See https://github.com/leela-zero/leela-zero/issues/2282 for the details and links to code or compiled Windows downloads. The experimental version is showing a ~60% winning percentage in matches at various visit levels versus normal LZ.
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
Tryss
Lives in gote
Posts: 502
Joined: Tue May 24, 2011 1:07 pm
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
Has thanked: 1 time
Been thanked: 153 times

Re: LZ's progression

Post by Tryss »

Vargo wrote:I hope it's understandable, I don't really speak english (as you've probably noticed :D )
It's understandable, but there's no reason for this to be true.

Just because you have a 1:a ratio against player A, and player A has a 1:b ratio against player B, doesn't mean you must have a 1:(a*b) ratio against B.

It's already not true for the following simple game:

Player A rolls two 8-sided dice, player B rolls two 6-sided dice, and player C rolls two 4-sided dice. The one with the biggest sum wins, and if the sums are equal, the one with the smaller dice wins.

In this simple game, A has a 63.93% chance to win against B (1473/2304, or a 1:1.773 ratio), B has a 69.10% chance to win against C (398/576, or a 1:2.236 ratio), and A has an 82.42% chance to win against C (844/1024, or a 1:4.689 ratio).

But what you propose would give A a 79.85% chance to win against C (1:3.964).
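
The dice game is small enough to enumerate exactly. The sketch below assumes the faces of each die are numbered 1 to the number of sides (the post does not say so explicitly); the function name `win_fraction` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

def win_fraction(sides_a: int, sides_b: int, dice: int = 2) -> Fraction:
    """Exact chance that the player rolling `dice` dice with `sides_a` sides
    beats the player rolling `dice` dice with `sides_b` sides.

    Higher sum wins; on equal sums the player with the smaller dice wins.
    Assumption: faces are numbered 1..sides.
    """
    wins = total = 0
    for roll_a in product(range(1, sides_a + 1), repeat=dice):
        for roll_b in product(range(1, sides_b + 1), repeat=dice):
            total += 1
            sum_a, sum_b = sum(roll_a), sum(roll_b)
            if sum_a > sum_b or (sum_a == sum_b and sides_a < sides_b):
                wins += 1
    return Fraction(wins, total)

def odds(p: Fraction) -> Fraction:
    return p / (1 - p)

p_ab, p_bc, p_ac = win_fraction(8, 6), win_fraction(6, 4), win_fraction(8, 4)
chained = odds(p_ab) * odds(p_bc)       # what multiplying odds would predict
predicted_ac = chained / (1 + chained)

print(float(p_ab), float(p_bc), float(p_ac), float(predicted_ac))
```

Comparing `p_ac` with `predicted_ac` shows the gap between the actual A-vs-C result and the one obtained by multiplying odds.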

Re: LZ's progression

Post by Vargo »

Tryss wrote:Player A rolls two 8-sided dice, ...
As I said, a little controversy...
I'm not home now, but I'm looking forward to trying your dice ;-)
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: LZ's progression

Post by Bill Spight »

Vargo wrote:55% winrate means A wins 55 games out of 100, A wins 55 when B wins 45. So, A wins 55/45 times more games than B . It's true, because (55/45)*45=55.
If B wins 55% over C, B wins 55/45 times more games than C.

A wins 55/45 times more games than B, who wins 55/45 times more games than C, so A wins (55/45)*(55/45) times more games than C.
etc.

{snip}

I hope it's understandable, I don't really speak english (as you've probably noticed :D )
Yes, it is understandable and clear. However, there is an underlying assumption that the difference between the abilities of A, B, and C to win games is reducible to a single number. (There is also the assumption of perfect accuracy of the win rate estimates, i.e., no luck, which has already been alluded to.) But as we know go requires a number of different skills, which means that skill at go may not be reduced to a single number. And that means that transitivity does not hold. Player A may beat player B more than half the time, player B may beat player C more than half the time, and player C may beat player A more than half the time.

Now, transitivity holds closely enough in go that we can have different ranks, each of which covers a range of ratings, and make pretty good predictions of the handicap between players of different ranks which will make the win rates around 50%. But, OC, for specific individual pairings the recommended handicap may not do that. One thing that makes the ranking system robust is that each player plays against a variety of different players with different levels of ability at different skills. Self play does not do that, and so, IMO, does not produce robust results.

To give a possibly related example of how multidimensionality can reduce the degree of progress, let's suppose that we are measuring progress in two independent dimensions. Suppose that B makes one unit of progress by comparison with A, and C makes the same unit of progress by comparison with B, but in the orthogonal direction to that of the progress between A and B. Then how much progress does C make with regard to A? Not 2 units, but √2 units.
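
As an illustration of the point that A can beat B, B can beat C, and C can beat A, each more than half the time, here is a sketch using a well-known set of non-transitive dice. The dice are not from this thread; they are brought in purely as an example, and the names `DICE` and `p_beats` are made up.

```python
from fractions import Fraction
from itertools import product

# A classic non-transitive set (illustrative, not from the thread).
DICE = {
    "A": (2, 2, 4, 4, 9, 9),
    "B": (1, 1, 6, 6, 8, 8),
    "C": (3, 3, 5, 5, 7, 7),
}

def p_beats(x: str, y: str) -> Fraction:
    """Exact chance that die x shows a higher face than die y.
    No two dice share a face value, so ties cannot occur."""
    wins = sum(1 for a, b in product(DICE[x], DICE[y]) if a > b)
    return Fraction(wins, len(DICE[x]) * len(DICE[y]))

for x, y in (("A", "B"), ("B", "C"), ("C", "A")):
    print(f"{x} beats {y}: {p_beats(x, y)}")
```

Each die beats the next one in the cycle 5/9 of the time, so no single strength number can rank the three.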
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

Re: LZ's progression

Post by Tryss »

Vargo wrote:
Tryss wrote:Player A rolls two 8-sided dice, ...
As I said, a little controversy...
I'm not home now, but I'm looking forward to trying your dice ;-)
8-sided dice are common in tabletop gaming:

[Image]

4-, 6-, 8-, 10-, 12- and 20-sided dice are usual

[Image]

But there exist more exotic dice :mrgreen:

Re: LZ's progression

Post by Vargo »

@Tryss
Your dice are beautiful. I had some such.
But in your example, A-B play a certain game, B-C play another game, because the dice are different, and A-C a third different game. In this case, I'm not surprised that winrates aren't transitive.
Bill Spight wrote:Now, transitivity holds closely enough in go that we can have different ranks,...
Yes, it's true, fortunately !
Bill Spight wrote:...Self play does not do that, and so, IMO, does not produce robust results.
It's true too, unfortunately.

Re: LZ's progression

Post by Tryss »

No, it's the same game: roll your dice, and the one with the better score wins :mrgreen:

Player A is just a stronger player than B or C

Re: LZ's progression

Post by Bill Spight »

Tryss wrote:
Vargo wrote:I hope it's understandable, I don't really speak english (as you've probably noticed :D )
It's understandable, but there's no reason for this to be true.

Just because you have a 1:a ratio against player A, and player A has a 1:b ratio against player B, doesn't mean you must have a 1:(a*b) ratio against B.

It's already not true for the following simple game:

Player A rolls two 8-sided dice, player B rolls two 6-sided dice, and player C rolls two 4-sided dice. The one with the biggest sum wins, and if the sums are equal, the one with the smaller dice wins.

In this simple game, A has a 63.93% chance to win against B (1473/2304, or a 1:1.773 ratio), B has a 69.10% chance to win against C (398/576, or a 1:2.236 ratio), and A has an 82.42% chance to win against C (844/1024, or a 1:4.689 ratio).

But what you propose would give A a 79.85% chance to win against C (1:3.964).
I suppose that the faces of each die are numbered consecutively from 1 to the number of faces. Let's suppose that each player rolls only one die. Then A has a 9/16 chance (56.25%) to beat B, with odds of 9:7, and B has a 7/12 chance (58.33%) to beat C, with odds of 7:5. Multiplying the odds gives A odds of 9:5 to beat C, or 9/14 of the time (64.29%). But A beats C 11/16 of the time (68.75%), with odds of 11:5.
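
For the single-die version the count has a closed form: against each face c of the smaller die, the larger die wins with its m - c higher faces. A short sketch, with the same tie rule (the smaller die wins ties) and a made-up helper name `single_die_win`:

```python
from fractions import Fraction

def single_die_win(m: int, n: int) -> Fraction:
    """Chance that an m-sided die beats an n-sided die (m > n), faces
    numbered 1..m and 1..n, ties going to the smaller die."""
    wins = sum(m - c for c in range(1, n + 1))
    return Fraction(wins, m * n)

def odds(p: Fraction) -> Fraction:
    return p / (1 - p)

p_ab = single_die_win(8, 6)
p_bc = single_die_win(6, 4)
p_ac = single_die_win(8, 4)

chained = odds(p_ab) * odds(p_bc)        # multiply the odds of the two steps
predicted_ac = chained / (1 + chained)

print(p_ab, p_bc, p_ac, predicted_ac)
```

The last two printed values are the actual A-vs-C chance and the one predicted by multiplying odds, which can be compared directly.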

Re: LZ's progression

Post by moha »

Out of curiosity I tested my method: 55% winrate means ~0.178 sd distribution distance, and 1.78 sd gives 89% - no surprise here. Transitivity is OC debatable but I doubt that would be the larger effect in this case.

Just retesting those 55% promotions with more games may reduce most of them to lower winrates. This is, after all, how "55% for 400 games" was chosen: a statistical mass that makes it hard to pass on luck ALONE (in a few dozen tries), so new nets are at least slightly better, but nothing more. And those 400 samples are not even really independent: the first few moves and joseki choices are often identical, which further reduces the statistical weight.
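
A rough sketch of the "passing on luck" argument, under simplifying assumptions that are mine and not the post's: a fixed 400-game match, a pass threshold of 220 wins (55%), and fully independent games. The actual gating procedure may differ, and the function names are made up for this illustration.

```python
from math import comb

def pass_probability(true_winrate: float, games: int = 400, wins_needed: int = 220) -> float:
    """Exact binomial chance of scoring at least `wins_needed` wins in
    `games` independent games at the given true winrate."""
    return sum(
        comb(games, k) * true_winrate**k * (1.0 - true_winrate) ** (games - k)
        for k in range(wins_needed, games + 1)
    )

def at_least_one_pass(true_winrate: float, tries: int) -> float:
    """Chance that at least one of `tries` equally strong candidates passes."""
    return 1.0 - (1.0 - pass_probability(true_winrate)) ** tries

for p in (0.50, 0.52, 0.55):
    print(f"true winrate {p:.2f}: pass {pass_probability(p):.3f}, "
          f"any of 30 tries {at_least_one_pass(p, 30):.3f}")
```

Under these assumptions a candidate that is exactly equal in strength passes only a few percent of the time, but a candidate that is slightly better passes much more often once a few dozen attempts are made, so a promoted net's measured 55% tends to overstate its real edge.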