
All times are UTC - 8 hours [ DST ]




 Post subject: Re: LZ's progression
Post #81 Posted: Sun Aug 05, 2018 2:19 am 
Lives in gote

Posts: 337
Liked others: 22
Was liked: 97
Someone at reddit/cbaduk (here) asked about 161 vs 157 at time parity...

I've run a 20-game match between #157 and #161 at 5 min per game per side, no pondering, komi 7.5 (twogtp V1.4.10, 1xGTX1080)

LZ_015#157 v. LZ_015#161 :
LZ#157 wins 17-3 (10 wins as W, 7 wins as B)

Ouch...! 20 games is not enough to be really significant, but still...

If someone wants the games, I'll upload them.

 Post subject: Re: LZ's progression
Post #82 Posted: Sun Aug 05, 2018 5:33 am 
Lives in gote

Posts: 502
Liked others: 1
Was liked: 153
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
Quote:
Ouch...! 20 games is not enough to be really significant, but still...


Actually, if program A is at least as strong as B (i.e. A's win probability against B is >= 0.5), then there is less than a 0.13% chance that A gets 3 wins or fewer in 20 games. So the result is statistically significant.
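
For anyone who wants to check that figure, here is a minimal sketch in Python (standard library only). It assumes the 20 games are independent with a fixed per-game win probability p, which is only an approximation for real bot matches; the helper name `binom_cdf` is made up for this example.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Worst case for the claim: A is exactly as strong as B (p = 0.5).
tail_even = binom_cdf(3, 20, 0.5)
# If A is genuinely stronger (say p = 0.6), 3 wins or fewer is even less likely.
tail_stronger = binom_cdf(3, 20, 0.6)

print(f"p = 0.5: {tail_even:.4%}")
print(f"p = 0.6: {tail_stronger:.4%}")
```

At p = 0.5 the tail is 1351/2^20, about 0.129%, and it only shrinks as p grows, so 0.13% works as an upper bound over all p >= 0.5.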

 Post subject: Re: LZ's progression
Post #83 Posted: Sun Aug 05, 2018 7:48 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Tryss wrote:
Quote:
Ouch...! 20 games is not enough to be really significant, but still...


Actually, if program A is at least as strong as B (i.e. A's win probability against B is >= 0.5), then there is less than a 0.13% chance that A gets 3 wins or fewer in 20 games. So the result is statistically significant.


But is statistically significant really significant?

This may sound flip, but it is related to the replication crisis.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: LZ's progression
Post #84 Posted: Sun Aug 05, 2018 7:59 am 
Lives in gote

Posts: 337
Liked others: 22
Was liked: 97
Another 20-game match at 10 min/game (1x1080): #157 wins 16-4 (9 times as W, 7 times as B)

And a 10-game match at something like 15 min/game: #157 wins 6-4 (3 times as W, 3 times as B)
(in fact it's a 5min/game match, but with 2x1080Ti, which corresponds to ~12-16 min/game on a 1080)

Again, if someone wants the games or the reports, I'll upload them.

In the 50 games played in these matches, #161 won only 11, that's 22%, and #161 has a better Elo score than #157, mmmmm... How come ?

Anyway, if someone wants the reports or the games, I'll upload them.
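
As a rough cross-check, a 22% score can be turned into an implied rating gap. The sketch below assumes the standard logistic Elo model with a 400-point scale; whether LZ's ratings follow exactly that model is an assumption, and `elo_diff_from_score` is a name invented here.

```python
from math import log10

def elo_diff_from_score(score):
    """Rating difference implied by an expected score, under the
    standard logistic Elo model (400-point scale)."""
    if not 0 < score < 1:
        raise ValueError("score must be strictly between 0 and 1")
    return 400 * log10(score / (1 - score))

# Pooled results of the three time-parity matches reported above.
wins_161, games = 11, 50
implied = elo_diff_from_score(wins_161 / games)
print(f"#161 relative to #157 at time parity: {implied:+.0f} Elo")
```

Under that model, 22% corresponds to roughly 220 points below #157. The error bar is wide at only 50 games, and the three matches were not run at identical settings, so treat it as an order-of-magnitude figure.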

 Post subject: Re: LZ's progression
Post #85 Posted: Sun Aug 05, 2018 9:28 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Vargo wrote:
In the 50 games played in these matches, #161 won only 11, that's 22%, and #161 has a better Elo score than #157, mmmmm... How come ?


How come, indeed?

 Post subject: Re: LZ's progression
Post #86 Posted: Sun Aug 05, 2018 9:34 am 
Gosei

Posts: 1753
Liked others: 177
Was liked: 491
Test matches which are used to calculate the Elo score of LZ networks are not with time parity, but with the same number of visits. 20-block networks take more time per visit than 15-block networks.


This post by jlt was liked by: Bill Spight
 Post subject: Re: LZ's progression
Post #87 Posted: Sun Aug 05, 2018 9:51 am 
Lives in gote

Posts: 337
Liked others: 22
Was liked: 97
The Elo scale is at visits parity, that's why I'm doubtful about it... I would prefer something measuring "real" strength, and with the same scale as for human players, but maybe it's hard or impossible to do.

Anyway, I find all this very interesting and I hope LZ will bounce back !

 Post subject: Re: LZ's progression
Post #88 Posted: Sun Aug 05, 2018 10:41 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Vargo wrote:
In the 50 games played in these matches, #161 won only 11, that's 22%, and #161 has a better Elo score than #157, mmmmm... How come ?

Accumulation of errors without calibration, either to older versions of self or to an external benchmark. (On a related note, dead-reckoning / inertial guidance systems, as used for Apollo or ICBMs, are amazing feats of engineering; I was recently reading about them.)

Uberdude wrote:
In fact, during normal 15-block training I think occasionally running a match such as #157 vs #147 would be a good idea, to see how inflated the incremental self-improvement Elo is. Based on the incremental Elo differences from the promotions 147->148->149 etc., it went from 11401 to 11806, which predicts #157 would beat #147 91% of the time, but I would bet the real figure is quite a lot lower than that.
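
The 91% figure can be reproduced with the usual Elo expected-score formula. A minimal sketch, assuming the standard logistic model with a 400-point scale (the function name `expected_score` is just for illustration):

```python
def expected_score(elo_a, elo_b):
    """Expected score of A against B under the standard logistic Elo
    model (400-point scale)."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))

# Ratings quoted above: #157 at 11806, #147 at 11401 (a 405-point gap).
predicted = expected_score(11806, 11401)
print(f"Predicted win rate for #157 vs #147: {predicted:.1%}")
```

A 405-point gap gives about 91%, so the prediction itself is arithmetically sound; the open question is whether the chained promotion ratings add up the way the formula assumes.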

I think I've managed to get twogtp working and am running a 147 vs 157 match now at 3200 visits.

Update: currently #157 leads only 7-3 as white. #157 6-0 as black before my PC crashed (maybe I shouldn't run this match and LZ and Elf analysis concurrently!).


This post by Uberdude was liked by: Bill Spight
 Post subject: Re: LZ's progression
Post #89 Posted: Fri Aug 10, 2018 9:40 am 
Lives in gote

Posts: 337
Liked others: 22
Was liked: 97
20-game match between LZ015#157 and LZ015#163 at time parity
#163 is 11985 Elo
#157 is 11806 Elo

5-min games, 1x1080, twogtp V1.4.10

#157 wins 16-4 (10 wins as W, 6 wins as B)

I understand the ranking matches are at visits parity, and all that, but there's still something weird with the Elo scale used. For example, the ranking of L-zero has #163 higher than #157 in "Dan scale" ...

Here, #163 wins only 20% of its games against #157... It's only 20 games, but I doubt that #163 is stronger than #157.


Attachments:
163_157_time5_157isB.zip [8.86 KiB]
Downloaded 439 times
163_157_time5_157isW.zip [8.71 KiB]
Downloaded 439 times

This post by Vargo was liked by: Bill Spight
 Post subject: Re: LZ's progression
Post #90 Posted: Sun Aug 12, 2018 8:49 am 
Dies with sente

Posts: 101
Liked others: 2
Was liked: 16
Rank: KGS 2 D
Vargo wrote:

Here, #163 wins only 20% of its games against #157... It's only 20 games, but I doubt that #163 is stronger than #157.


Could you try #165? It is noticeably more aggressive than before.

 Post subject: Re: LZ's progression
Post #91 Posted: Sun Aug 12, 2018 10:35 am 
Lives in gote

Posts: 337
Liked others: 22
Was liked: 97
20-game match between LZ_0.15#165 and LZ_0.15#157 at visits parity.

twogtp V1.4.10 --noponder --visits=1601 -komi 7.5 for both.
If you look at the .dat reports, you'll see that #165 takes at least twice as long as #157 to think.

#165 wins 14-6 (9 times as W, 5 times as B, all games by resignation)
Attachment:
reports.zip [1.14 KiB]
Downloaded 462 times

Attachment:
165_157_v1601_157isBlack.zip [8.82 KiB]
Downloaded 481 times

Attachment:
165_157_v1601_157isWhite.zip [8.69 KiB]
Downloaded 440 times

splee99 wrote:
Could you try #165? It is noticeably more aggressive than before.

OK. Tomorrow I'll run a 20-game match at time parity (#165 v #157). It will be interesting to compare results at visits parity and at time parity.

 Post subject: Re: LZ's progression
Post #92 Posted: Sun Aug 12, 2018 11:03 am 
Dies with sente

Posts: 101
Liked others: 2
Was liked: 16
Rank: KGS 2 D
Thanks for the report. I had run one game between #165 and #157 with time parity and #157 won. However the battle was so heavy that only bots can deal with it. Anyway I hope we can see some progress from #161 to #165 from your test tomorrow.

 Post subject: Re: LZ's progression
Post #93 Posted: Mon Aug 13, 2018 4:05 am 
Lives in gote

Posts: 337
Liked others: 22
Was liked: 97
40-game match at time parity between #165 and #157
(40 games, because I made a mistake, the first 20 games were all with #157 as White, so I had to run 20 more)

5 min per game, komi 7.5, no pondering, 1xGTX1080, twogtp V1.4.10
5 min per game on my computer is similar to 1600 visits for #165 or 3200 visits for #157

#157 wins 27-13
(15 times as W, 12 times as B, all games by resignation)

13 wins for #165, that's 32.5 %, much better than #161, which scored only 15% in a 20-game match at 5 min against #157.
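
To get a feel for how much weight 32.5% vs 15% can bear, here is a sketch of 95% Wilson score intervals for the two results. It assumes independent games with a fixed win probability; `wilson_interval` is a helper written for this example, not part of twogtp.

```python
from math import sqrt

def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate (z = 1.96 gives about 95%)."""
    p_hat = wins / games
    denom = 1 + z * z / games
    centre = (p_hat + z * z / (2 * games)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / games
                    + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

lo_165, hi_165 = wilson_interval(13, 40)  # #165 vs #157, 40 games
lo_161, hi_161 = wilson_interval(3, 20)   # #161 vs #157, 20 games

print(f"#165: {lo_165:.1%} to {hi_165:.1%}")
print(f"#161: {lo_161:.1%} to {hi_161:.1%}")
```

The two intervals come out at roughly 20% to 48% and 5% to 36%, so they overlap: the improvement from #161 to #165 is suggestive, not settled.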

The 40 games :
Attachment:
163_157_5min_157isWhite.zip [17.74 KiB]
Downloaded 412 times
Attachment:
165_157_5min_157isBlack.zip [18.34 KiB]
Downloaded 432 times


This post by Vargo was liked by: Uberdude
 Post subject: Re: LZ's progression
Post #94 Posted: Mon Aug 13, 2018 9:12 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
There's a new 40-block network out: http://zero.sjeng.org/networks/e2be4815 ... 1c836a3.gz. I wonder how it would do vs #157 at 5 min time parity? It got a very respectable 42% vs Elf v1 (presumably at visits parity) and 83% vs #162. So in a 5 min game it'll probably get about 800 visits (Edit: is that true? It seems not to be approximately linear in blocks if going from 15 to 20 blocks halves the visits!) to #157's 3200. Will its superior intuition and judgement win out, or will it make tactical blunders like ladders due to not enough playouts and lose?

 Post subject: Re: LZ's progression
Post #95 Posted: Mon Aug 13, 2018 11:53 pm 
Lives in gote

Posts: 337
Liked others: 22
Was liked: 97
Uberdude wrote:
There's a new 40-block network out: http://zero.sjeng.org/networks/e2be4815 ... 1c836a3.gz. I wonder how it would do vs #157
20-game match between LZ0.15 with e2be48 (256x40) and LZ0.15 with #157 (192x15)

e2be48 at 1600 visits, and #157 at 6400 visits.

I think time parity is very roughly :
1x visits for 256x40 or
2x visits for 256x20 or
4x visits for 192x15

So, e2be48 at 1600 visits v. #157 at 6400 visits is at time parity.
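
The rule of thumb above can be written as a small conversion helper. This is only a sketch of the rough 1x / 2x / 4x ratios stated in this post; the table and the function name are illustrative, and real visit rates depend on the GPU and settings.

```python
# Rough visits achievable in the same thinking time, relative to a
# 256x40 network (ratios taken from the estimate above, not measured).
RELATIVE_VISITS = {"256x40": 1, "256x20": 2, "192x15": 4}

def equivalent_visits(visits, from_net, to_net):
    """Convert a visit budget on one network size into the budget that
    takes roughly the same time on another network size."""
    return round(visits * RELATIVE_VISITS[to_net] / RELATIVE_VISITS[from_net])

print(equivalent_visits(1600, "256x40", "192x15"))
print(equivalent_visits(3200, "192x15", "256x20"))
```

With these ratios, 1600 visits on the 40-block net maps to 6400 on #157, and 3200 visits on #157 maps to 1600 on a 20-block net, which matches the 5-minute estimate given earlier for #165.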

#157 wins 11-9 (5 times as W, 6 times as B)
Attachment:
20 games.zip [18.62 KiB]
Downloaded 415 times

I just realized (better late than never !) that twogtp can give a nice and complete .html report, with min, max, standard deviation...

LIKE THIS

AND THIS

With 2xGTX1080Ti,
Average time used per game is around 3min10sec (see html for exact numbers)
Average length is around 220 moves


This post by Vargo was liked by 2 people: Uberdude, Waylon
 Post subject: Re: LZ's progression
Post #96 Posted: Tue Aug 14, 2018 7:29 am 
Lives in gote

Posts: 502
Liked others: 1
Was liked: 153
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
So the latest 40b has about the same strength as #157 at time parity? Nice!

 Post subject: Re: LZ's progression
Post #97 Posted: Tue Aug 14, 2018 8:05 am 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
Tryss wrote:
So the latest 40b has about the same strength as #157 at time parity? Nice!
I doubt this could be said in a general sense.

Vargo wrote:
So, e2be48 at 1600 visits v. #157 at 6400 visits is at time parity.

#157 wins 11-9 (5 times as W, 6 times as B)

The point of these deep and strong nets is to direct the search and shape the search tree. But with 1600 visits you don't have much of a search tree to be shaped (just a few moves of lookahead). In those cases the bot won't be really strong anyway (even AGZ was only high dan amateur without search), and the most important thing is getting more visits (which #157 does).

But in serious games these bots do a lot more visits per move on good GPUs. I suggest you repeat the test with at least 10k visits vs 40k. I would be surprised if 40b didn't crush #157.

 Post subject: Re: LZ's progression
Post #98 Posted: Tue Aug 14, 2018 8:24 am 
Lives in gote

Posts: 337
Liked others: 22
Was liked: 97
moha wrote:
I doubt this could be said in a general sense.
You must be right, because I've tested the new 40b (e2be48) against #157 at time parity : only 5min on 1x1080.
For e2be48, it corresponds to only 500-600 visits(?)
#157 wins 14-6
6 wins as W here , and 8 wins as B

Maybe I'll try tomorrow a mini match at time parity, with 3200 or 6400 visits for e2be48.
e2be48 should be stronger, we'll see.

 Post subject: Re: LZ's progression
Post #99 Posted: Tue Aug 14, 2018 2:35 pm 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Well, I ran a game (it took 2 hours) at large visit counts and didn't get the result I expected: 15-block #157 as White with 80k playouts beat the 40-block 40b_157_360k e2be with 20k playouts, which is very close to equal time (15b took 3114s, 40b 3201s). It looks like it was a half-point game, but Black played a load of nonsense moves inside territory instead of resigning. Should I adjust some parameter?



2nd game 40 block won, same opening for first 27 moves, #157 diverged.


Attachments:
W147B40b157-1.sgf [2.5 KiB]
Downloaded 1638 times
W147B40b157-0.sgf [2.71 KiB]
Downloaded 1732 times
 Post subject: Re: LZ's progression
Post #100 Posted: Tue Aug 14, 2018 3:52 pm 
Lives in gote

Posts: 502
Liked others: 1
Was liked: 153
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
Here is a 40k vs 10k match I just did :


40b wins as White against #157 (and it's close to time parity, with 2407s for B and 2559s for W)
