All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 47 posts ]  Go to page Previous  1, 2, 3  Next
Author Message
 Post subject: Re: Elf OpenGo paper released
Post #21 Posted: Wed Feb 13, 2019 2:33 pm 
Lives in sente

Posts: 1140
Location: Earth
Liked others: 370
Was liked: 185
43? 42!

 Post subject: Re: Elf OpenGo paper released
Post #22 Posted: Wed Feb 13, 2019 4:19 pm 
Judan

Posts: 6013
Location: Cambridge, UK
Liked others: 342
Was liked: 3236
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
The Elfv2 weights converted for LZ at least do not want to run out the working ladder in pro game 4 (Elfv1 does) after a few thousand playouts, though there was a brief flash of blue there in Lizzie, so if you are unlucky with your choice of low playouts (and 1600 is in that region) maybe it will. Some other interesting titbits:
- Elfv2 is back in line with other bots in thinking white is better on the empty board; Elfv1 was unusual in thinking black was better.
- In the parallel 4-4, outside-approach opening, Elfv2 no longer thinks the keima answer is a bad -7% as v1 did (with a 3-3 invasion of the other white 4-4 to follow).

 Post subject: Re: Elf OpenGo paper released
Post #23 Posted: Thu Feb 14, 2019 11:58 am 
Lives in gote

Posts: 580
Location: Vienna, Austria
Liked others: 252
Was liked: 293
Uberdude wrote:
Containing this link to a 3 GB gzip of Elf's analysis of 100k pro games from GoGoD! (Hope JF is ok with this)


I wondered about that too, but since the last game in that collection is "2018-08-08j.sgf" and the latest version you can buy at https://gogodonline.co.uk/ was released in April 2018, JF may have given both his consent and even the latest pre-release version to Facebook.

However, SmartGo Kifu has games even from 2019, so maybe they've gone that route, or even went as far as to extract the games one-by-one from there. Who knows.

 Post subject: Re: Elf OpenGo paper released
Post #24 Posted: Thu Feb 14, 2019 12:50 pm 
Oza

Posts: 2344
Liked others: 15
Was liked: 3400
Quote:
Containing this link to a 3 GB gzip of Elf's analysis of 100k pro games from GoGoD! (Hope JF is ok with this)


It was all dealt with legitimately and above board, using the Spring 2018 edition, after I was approached by Facebook. I left that as the last stand-alone edition of the database so that it would remain in synch with Facebook's version, which does not include all of the metadata. It took rather longer than I expected (about 6 months) for the Facebook project to complete, but I will leave the Spring 2018 edition up for some time to come for those who want to acquire a matching and fully metadata-ed edition. SmartGo, as mentioned, has the true latest public (but not stand-alone) version, which is about 3% bigger already. My own version is bigger still, with some new Dosaku and Doetsu games just found!

BTW In one of my conversations with a programmer at Facebook, he said that no komi would certainly cause problems about the reliability of evaluations but he felt that for the early part of the game it was not likely to make much difference. I won't say which programmer it was in case he wants to change his mind about that.


This post by John Fairbairn was liked by 4 people: dfan, ez4u, Marcel Grünauer, Uberdude
 Post subject: Re: Elf OpenGo paper released
Post #25 Posted: Thu Feb 14, 2019 1:49 pm 
Honinbo

Posts: 8706
Liked others: 2556
Was liked: 2990
John Fairbairn wrote:
BTW In one of my conversations with a programmer at Facebook, he said that no komi would certainly cause problems about the reliability of evaluations but he felt that for the early part of the game it was not likely to make much difference. I won't say which programmer it was in case he wants to change his mind about that.


Well, human players changed their minds about :w3: with komi versus without. :)

_________________
The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

I think it's a great idea to talk during sex, as long as it's about snooker.

— Steve Davis

 Post subject: Re: Elf OpenGo paper released
Post #26 Posted: Thu Feb 14, 2019 4:57 pm 
Lives with ko

Posts: 166
Liked others: 24
Was liked: 49
Rank: 2d
Uberdude wrote:
- Containing this link to a 3 GB gzip of Elf's analysis of 100k pro games from GoGoD

... and sadly, since SGF isn't sufficiently standardized, the annotations are only two numbers in the comments.

If you are on Linux, you can convert a file with the following command into something q5go can understand and produce a winrate graph for:
Code:
cat inputfile.sgf | sed   's,C\[\([.0-9]\+\)$,QLZV\[\1:,'  |tr -d '\n' |sed -e 's,US\[GoGoD,FG[257:]US[GoGoD,' -e 's,QLZV.\([0-9.]\+\):\([0-9.]\+\)],\nQLZV[\2:\1],g' >outputfile.sgf


This isn't perfect, ideally you'd want to mark the variations as figures, but I don't really see a way to do this from the command line. I have some local changes to automatically mark figures and diagrams but that isn't quite ready to be pushed yet...
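For anyone not on Linux, here is a rough Python sketch that mirrors the sed/tr pipeline above step by step. It is read off the command itself, not from the q5go or Elf sources, so it assumes the same input layout the command does: a comment opening `C[` followed by a number at the end of a line, with the second number and the closing `]` on the next line. The function name `convert_elf_sgf` is made up for illustration.

```python
import re


def convert_elf_sgf(text: str) -> str:
    """Apply the same four rewrites as the sed/tr one-liner to an SGF string."""
    # Step 1 (first sed): "C[<number>" at the end of a line becomes "QLZV[<number>:".
    text = re.sub(r"C\[([.0-9]+)$", r"QLZV[\1:", text, flags=re.MULTILINE)
    # Step 2 (tr -d '\n'): remove every newline, joining the two numbers.
    text = text.replace("\n", "")
    # Step 3 (second sed, first -e): insert FG[257:] before the first US[GoGoD.
    text = text.replace("US[GoGoD", "FG[257:]US[GoGoD", 1)
    # Step 4 (second sed, second -e): swap the two numbers and start a new
    # line before each QLZV property.
    return re.sub(r"QLZV.([0-9.]+):([0-9.]+)\]", r"\nQLZV[\2:\1]", text)


if __name__ == "__main__":
    # Tiny made-up record; the two numbers are placeholders, not real Elf output.
    sample = "(;GM[1]US[GoGoD95]\n;B[pd]C[0.52\n0.03]\n)"
    print(convert_elf_sgf(sample))
```

To convert a file, read it into a string, pass it through the function, and write the result out. Like the original command, this does nothing about marking variations as figures.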

 Post subject: Re: Elf OpenGo paper released
Post #27 Posted: Thu Feb 14, 2019 8:14 pm 
Oza

Posts: 2142
Location: Tokyo, Japan
Liked others: 1977
Was liked: 1211
Rank: Jp 6 dan
KGS: ez4u
The analysis tool is interesting. However, the readme file contains the following statement.
"... Importantly, you can see humanity's improvement in the game in 2016, when Go AIs came onto the scene and taught humans to play at a higher level. Also notice the harm that the large historical event of WWII did to the game..." [emphasis added]
This is hilarious if you look carefully at the graph. The big dip does not coincide with WWII. It is the New Fuseki Era that caused the "harm". :)

_________________
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21


This post by ez4u was liked by 3 people: Bill Spight, Calvin Clark, sorin
 Post subject: Re: Elf OpenGo paper released
Post #28 Posted: Fri Feb 15, 2019 7:37 am 
Tengen

Posts: 4255
Location: North Carolina
Liked others: 447
Was liked: 698
Rank: AGA 3k
GD Posts: 65
OGS: Hyperpape 5k
This suggests an interesting, albeit vague, question: is there a way to assess whether humans learned something in 2016, filtering out the "easy moves"?

What I mean by "easy moves" is that if a move appears in a fuseki that was played by AlphaGo/LZ/ELF, and I copy it, I have "played better", but who cares? Once we're out of my opening book, I may or may not continue to make the moves the AI will approve of. I think it only makes sense to say I've learned if my moves are better in cases where I'm not just copying.[0]

Filtering the "opening book" would be an easy task, but it's probably not adequate. There are local patterns that also can be copied, and fusekis that differ only minimally from one that is in an opening book. What we are really after is the use of those patterns in cases that require judgment about what the patterns accomplish.

By the way, when I say "just copying", I mean that to be a pretty low bar. I don't mean to say professionals must have an elaborate theory for why a new move works. Just that there has to be that level of judgment--even if the player is saying "where would the AI play?", that has to be a question, rather than coming straight from memory.

Anyway, I think the answer is probably yes. From commentaries, I get the feeling that professionals have changed more than just rote copying of the AI moves. However, I wonder if there's a way to measure it.

[0] Well, if I'm a professional--if I'm me, we know the answer is that I won't.

_________________
Occupy Babel!

 Post subject: Re: Elf OpenGo paper released
Post #29 Posted: Fri Feb 15, 2019 11:34 am 
Honinbo

Posts: 8556
Liked others: 1477
Was liked: 1438
KGS: Kirby
Tygem: 커비라고해
hyperpape wrote:
This suggests an interesting, albeit vague, question: is there a way to assess whether humans learned something in 2016, filtering out the "easy moves"?


The best I can think of is to measure how similar a given player is in their decision making to that of a particular version of a bot, e.g., by measuring the average and variation of changes in expected winning percentages for that player's moves. This is a heuristic, but not a definite answer, since a future version of a given bot may end up with a different idea of what's good and bad.
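As a minimal sketch of that heuristic: suppose you already have, for each move a player made, the bot's winrate for that player just before and just after the move (in percent). The function name `drop_profile` and the input layout are assumptions for illustration; nothing here comes from the Elf tools themselves.

```python
from statistics import mean, stdev


def drop_profile(evals):
    """Return (average drop, sample standard deviation of drops).

    evals: list of (winrate_before, winrate_after) pairs in percent, from the
    player's own point of view, one pair per move the player made.
    """
    # A positive drop means the bot thinks the move lost winning chances.
    drops = [before - after for before, after in evals]
    return mean(drops), stdev(drops)


if __name__ == "__main__":
    # Made-up numbers for four moves, not taken from any real game.
    sample = [(50.0, 48.0), (52.0, 52.0), (47.0, 41.0), (55.0, 51.0)]
    avg, spread = drop_profile(sample)
    print(f"average drop {avg:.2f}, spread {spread:.2f}")
```

Comparing these two numbers for the same player before and after 2016 would be one crude way to ask the question, with the caveat that they depend entirely on which bot version produced the winrates.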

There are other problems, too. If a bot says that your move drops the winning percentage by 10%... What does it really mean about what you've learned? Sometimes part of learning is playing worse first, before you can play better. You can learn why the move you're experimenting with is bad, for example.

Probably still the most accurate way to track progress is to measure how often you win against a given level of opponent over time, though that also has its problems. You may get better at winning against 5d player A, but not get better at winning against 5d player B...

Tough stuff... ¯\_(ツ)_/¯

_________________
it's be happy, not achieve happiness

 Post subject: Re: Elf OpenGo paper released
Post #30 Posted: Fri Feb 15, 2019 12:31 pm 
Lives in gote

Posts: 422
Liked others: 181
Was liked: 187
ez4u wrote:
This is hilarious if you look carefully at the graph. The big dip does not coincide with WWII. It is the New Fuseki Era that caused the "harm". :)


Brilliant!

But I'm also curious what happened in 1980, where there is a spike in "bad moves", "very bad moves" *, etc. even into the third set of 60 moves into the game. Komi change? Or did ELF just dislike Chinese players? Some of these may be artifacts of the kinds of games that were available to collect in GoGoD at the time. I'd be interested in John's view on that phenomenon.

* This definition is tricky. First, a human probability is not the same as an AI one. Second, attempts to do this crudely in chess unfairly punish more tactical players, because go strength is not just about making fewer mistakes but also about provoking your opponent to make bigger ones. The only thing that's really a mistake is going from a winning position to a losing one, but that naturally happens when some strong players take the game out into a chaotic street melee as they are wont to do. Third, as Bill Spight has pointed out, the networks are trained to win, not to evaluate.

But it's fun to have the data, so thanks to the ELF OpenGo team for sharing!

 Post subject: Re: Elf OpenGo paper released
Post #31 Posted: Fri Feb 15, 2019 12:43 pm 
Judan

Posts: 6013
Location: Cambridge, UK
Liked others: 342
Was liked: 3236
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
The Elf win % drop and other metrics explorer is interesting, but there are a lot of caveats. For example, here is Elfv2's winrate from a recent tournament game of mine (4d EGF) vs a 1d EGF (who used to be 4d BGA). He made me think with some tough fighting, but according to Elf I made only 1 significant mistake of over 10% winrate drop (and Elf gives quite big swings), and once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.
Attachment: Simons Wall elfv2.PNG


For comparison, here's the Elf winrate for Iyama Yuta 9p vs Yamashita 9p's recent Kisei game. Loads of mistakes from both all over the place (about 10 drops of over 10% each). I wouldn't claim this means I played better than them in my game: I had a more mismatched game against an opponent who didn't challenge me so much, whereas they were facing more difficult positions in which to find the best move than I was, and consequently doing worse at it. Also, I expect pro games will tend to be more evenly matched than a 4d vs 1d, but the phenomenon of one player going to 99% fairly quickly (which for Elf might just be a 5 point lead), leaving no space for winrate variations, will still happen.
Attachment: Iyama Yamashita elfv2.PNG


And to avoid the "Japanese players are weak" criticism, here's Shin Jinseo vs Gu Zihao: not as many mistakes as Iyama, but still quite a few.
Attachment: Gu vs Shin elfv2.PNG


This post by Uberdude was liked by: sorin
 Post subject: Re: Elf OpenGo paper released
Post #32 Posted: Fri Feb 15, 2019 12:47 pm 
Judan

Posts: 6013
Location: Cambridge, UK
Liked others: 342
Was liked: 3236
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
bernds wrote:
Uberdude wrote:
- Containing this link to a 3 GB gzip of Elf's analysis of 100k pro games from GoGoD

... and sadly, since SGF isn't sufficiently standardized, the annotations are only two numbers in the comments.

Trying to open these SGF in Lizzie makes it hang!

 Post subject: Re: Elf OpenGo paper released
Post #33 Posted: Fri Feb 15, 2019 1:07 pm 
Gosei

Posts: 1397
Liked others: 687
Was liked: 455
Rank: AGA 3k KGS 1k Fox 1d
GD Posts: 61
KGS: dfan
Calvin Clark wrote:
This definition is tricky. First, a human probability is not the same as an AI one. Second, attempts to do this crudely in chess unfairly punish more tactical players, because go strength is not just about making fewer mistakes but also about provoking your opponent to make bigger ones.
Indeed, when researchers have evaluated historical chess players by having computers rate their moves, Capablanca comes out better than expected (not that he was a slouch in the first place), because his simple style meant that he had fewer opportunities to make mistakes, compared to, say, someone like Kasparov who played in a maximally dynamic style.

 Post subject: Re: Elf OpenGo paper released
Post #34 Posted: Fri Feb 15, 2019 1:34 pm 
Honinbo

Posts: 8706
Liked others: 2556
Was liked: 2990
Uberdude wrote:
once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.


That's why the log of the odds ratio is a more informative measure. :)

 Post subject: Re: Elf OpenGo paper released
Post #35 Posted: Fri Feb 15, 2019 1:52 pm 
Judan

Posts: 6013
Location: Cambridge, UK
Liked others: 342
Was liked: 3236
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Bill Spight wrote:
Uberdude wrote:
once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.


That's why the log of the odds ratio is a more informative measure. :)


Even then I think the quality and precision of the bot's suggestions is reduced: if it says one move is 99.675 and another is 99.784, can you really believe those sig figs? It just wants a safe win (0.5 points as seen from move 100 could be safe) and can play slack moves, whereas a human might want to keep pressing the advantage for a comfier margin. A better approach would be to add in some dynamic komi to get the winrate back near 50% and then analyse from that modified board state. Unfortunately the Elf weights converted to LZ (at least v0 / v1) don't play nicely with the dynamic komi modified version of the LZ engine.

 Post subject: Re: Elf OpenGo paper released
Post #36 Posted: Fri Feb 15, 2019 3:59 pm 
Judan

Posts: 6013
Location: Cambridge, UK
Liked others: 342
Was liked: 3236
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
What is the story behind this graph? It shows the averaged biggest mistake (in win % drop) for all players over time. There probably wasn't much data back in 1700, and then we have them making bigger mistakes to around 1780, then getting better down to a trough of mistakes around 1860 to 1895. Is this seen as a golden age of Japanese go? You've got the last few years of Shusaku at the start and Shuei at the end, though I presume he individually was a small part of the corpus. Checking the stats for just him, he averaged around 24%, quite a bit lower than the trough at 27%. Then we have mistakes getting bigger again at the turn of the century: is this the collapse of the Go houses? As ez4u mentioned, the peak in mistakes in the 1930s is the Shin Fuseki era BEFORE WW2. And in modern times the reduction in biggest mistakes seems quite nicely correlated with reducing time limits ;-)

Attachment: Biggest mistake al players over time.PNG


This post by Uberdude was liked by: sorin
 Post subject: Re: Elf OpenGo paper released
Post #37 Posted: Sat Feb 16, 2019 2:59 am 
Oza

Posts: 2344
Liked others: 15
Was liked: 3400
I'm not sure the above graph tells us much. My own background in statistics is not much more than reading books like Freakonomics on long-haul flights, but I have hugely more knowledge of how the database was constructed and what it contains. Both of those things tell me to treat the graph with a great amount of caution.

Among the factors that potentially affect the results are these. This is not a complete list but does run roughly chronologically.

1. The very early games include a large number of Chinese games under ancient rules. Apart from the fact that these usually start with 4 stones on the board, which restricts the style of play somewhat, and are without komi, they also have group tax, which may be a distorting element. But the single biggest distortion is that the dates of these games are usually unknown (and even the dates of the players can be quite unknown). I therefore catalogued the games under the date, or estimated date, of the publications in which I found the games. This means, for example, that there are very many games labelled 1700. That's not when they were played.

2. Old games in general are likewise affected by being with no komi and often with handicaps. These handicaps include not just stones but the series type of handicaps (e.g. taking Black in 2 games out of 3). Since a series handicap was defined by current grades only initially, but could then change between those two same players (not to mention that grades were set largely on the basis of politics) and not others, there must have been many cases where the wrong handicap was used. In general, too, no komi is not just a problem with training bots but I expect it also encourages White to make wilder moves, and thus bigger potential mistakes.

3. At times such as the late Edo/Meiji period in Japan, there were fewer games because of less sponsorship and other external factors. But also there may be gaps in the record. For example, I have not got round to doing the complete games of Shuwa yet.

4. At any period with the older players, the corpus is likely to focus on the star players, via their collected games. This means many games from very early in their careers. Nowadays the proportion of games by weaker players is likely to be much less because there are just so many games by strong players to collect instead.

5. In the 1930s, as has been observed, there was a spike that can be considered to coincide with Shin Fuseki. Intuitively, I suppose we would expect many mistakes then as players started experimenting. But there may be a further factor. That period has been of special interest to me and so over the past 2-3 years I have been adding lots of games from this era. These are generally by weaker players (more mistake prone? More experimental?) and so in this period we get both more data and also data covering a much wider range of players (e.g. the Oteai B Section) than in other historical eras. There is also a trend towards the use of komi in this era, but weird ones such as 2.5 points.

6. As regards the war period, there is actually very little data. Apart from disruption from bombs, and players being sent off to fight, paper was scarce and publication of games was minimal (many now known were reconstructed later from personal records of the players).

7. I don't think time limits have much to do with anything in this graph. For a start, as one example, in the days of 13 hours each, there were distortions at both ends of the scale. On the one hand many players would use the extra time to take a shower, pop out to the shops or have a snooze upstairs. At the other end of the scale, where a player did use all his time, it became apparent that this carried significant health risks, presumably making mistakes likelier, and so time limits were reduced significantly without any external pressure from sponsors or public.

8. Since I get to choose what goes in the database and I don't like games at Mickey Mouse time limits, I have at times tended to ignore these (at any era).

9. If we look at very recent times, several factors leap out at me for initial consideration. One is that the proportion of Chinese and Korean players represented has increased very significantly. Whether you accept Cho Hun-hyeon's view that this has coincided with a horde of programmatically trained clones is up to you, but I think what is beyond question is that the level of play has not just increased but differences in both strength and styles between players have become smaller, so that there are far fewer mismatches (with their potential for bigger mistakes?) than in the past.

10. I suspect there is also a flattening effect in modern times due to increases in komi. And of course Elf is trained on current komi, so is perhaps more likely, on average, to find big mistakes in games which are not at 7.5 komi? (Or even to report as mistakes moves in games at other komis which were not really wrong?)


This post by John Fairbairn was liked by 6 people: ez4u, smartgo, sorin, Uberdude, Waylon, wineandgolover
 Post subject: Re: Elf OpenGo paper released
Post #38 Posted: Sat Feb 16, 2019 8:41 am 
Honinbo

Posts: 8706
Liked others: 2556
Was liked: 2990
Uberdude wrote:
Bill Spight wrote:
Uberdude wrote:
once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.


That's why the log of the odds ratio is a more informative measure. :)


Even then I think the quality and precision of the bot's suggestions is reduced: if it says one move is 99.675 and another is 99.784, can you really believe those sig figs?


Well, I don't really believe that win rates are win rates, anyway. ;) I have an open mind about that, except for Leela 11. Playing around with Deep Leela, it seems to me that the win rate estimates for the player who is ahead are underestimates, as I expected. As for the stronger bots, I can't say.

But if we take the log odds ratios we get 5.73 and 6.14, respectively, for a difference of 0.41. By comparison the log odds ratio for 60% is 0.41 and the log odds ratio for 50% is, OC, 0. So the play with a win rate of 99.675% instead of 99.784% could be just as bad a mistake as a play with a win rate of 50% instead of 60%. Quien sabe?
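The arithmetic above is easy to check. A small sketch, using natural logs and a helper name (`log_odds`) chosen here purely for illustration:

```python
import math


def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p) for a win rate p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    high_a = log_odds(0.99675)
    high_b = log_odds(0.99784)
    # Gap between the two near-certain win rates...
    print(f"{high_a:.2f} {high_b:.2f} gap {high_b - high_a:.2f}")
    # ...next to the gap between 50% and 60%.
    print(f"{log_odds(0.5):.2f} {log_odds(0.6):.2f} gap {log_odds(0.6) - log_odds(0.5):.2f}")
```

On this scale the two gaps come out about the same size, even though one is roughly a 0.1 point change in win rate and the other is a 10 point change.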

But, as you say, surely the errors are larger as we approach 100% or 0%.

Edit: Also, the use of changes in win rate estimates between moves instead of comparison of win rate estimates for the same possible moves introduces the complication that the estimates should approach 0 or 1 as the game continues. That's probably a small factor early on, but in the endgame it could be significant.



This post by Bill Spight was liked by: sorin
 Post subject: Re: Elf OpenGo paper released
Post #39 Posted: Sat Feb 16, 2019 9:20 am 
Gosei

Posts: 1397
Liked others: 687
Was liked: 455
Rank: AGA 3k KGS 1k Fox 1d
GD Posts: 61
KGS: dfan
Bill Spight wrote:
That's why the log of the odds ratio is a more informative measure.

By the way, the log of the odds ratio is what these networks actually produce under the hood. As people have noted here, you don't want to have to expend lots of energy making your network produce values bounded between 0 and 1 that precisely hit targets like 0.98 or 0.99. So instead you have the network produce an unbounded value, let's call it x, and then run x through the sigmoid function 1/(1 + e^(-x)) to produce a probability p between 0 and 1. Solving for x, you end up with x = log (p/(1-p)), which is the log odds.
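A quick numerical check of that inversion, as a sketch only: the names `sigmoid` and `logit` are just the usual textbook labels, and nothing here is taken from the Elf or LZ code.

```python
import math


def sigmoid(x: float) -> float:
    """Squash an unbounded value x into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of sigmoid: the log odds of a probability p."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    for x in (-3.0, 0.0, 2.5, 5.0):
        p = sigmoid(x)
        # logit(sigmoid(x)) should give x back, up to rounding error.
        print(f"x={x:+.1f}  p={p:.5f}  logit(p)={logit(p):+.5f}")
```

Equal steps in x near the top of the range move p by less and less, which is the squeezing of win rates towards 99% discussed earlier in the thread.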


This post by dfan was liked by: sorin
 Post subject: Re: Elf OpenGo paper released
Post #40 Posted: Sat Feb 16, 2019 9:50 am 
Honinbo

Posts: 8706
Liked others: 2556
Was liked: 2990
dfan wrote:
Bill Spight wrote:
That's why the log of the odds ratio is a more informative measure.

By the way, the log of the odds ratio is what these networks actually produce under the hood.


Great minds think alike. :mrgreen:
