Life In 19x19
http://lifein19x19.com/

Elf OpenGo paper released
http://lifein19x19.com/viewtopic.php?f=18&t=16441
Page 2 of 3

Author:  Uberdude [ Wed Feb 13, 2019 4:19 pm ]
Post subject:  Re: Elf OpenGo paper released

The Elfv2 weights converted for LZ at least don't want to run out the working ladder in pro game 4 (Elfv1 does) after a few thousand playouts, though there was a brief flash of blue there in Lizzie, so if you are unlucky with your choice of low playouts (and 1600 is in that region) maybe it will. Some other interesting titbits:
- Elfv2 is back in line with other bots in thinking white is better on the empty board; Elfv1 was unusual in thinking black was better.
- In the parallel 4-4, outside approach opening, Elfv2 no longer thinks the keima answer is a bad -7% like v1 did (with then a 3-3 invasion of the other white 4-4).

Author:  John Fairbairn [ Thu Feb 14, 2019 12:50 pm ]
Post subject:  Re: Elf OpenGo paper released

Quote:
Containing this link to a 3 GB gzip of Elf's analysis of 100k pro games from GoGoD! (Hope JF is ok with this)


It was all dealt with legitimately and above board, using the Spring 2018 edition, after I was approached by Facebook. I left that as the last stand-alone edition of the database so that it would remain in synch with Facebook's version, which does not include all of the metadata. It took rather longer than I expected (about 6 months) for the Facebook project to complete, but I will leave the Spring 2018 edition up for some time to come for those who want to acquire a matching and fully metadata-ed edition. SmartGo, as mentioned, has the true latest public (but not stand-alone) version, which is about 3% bigger already. My own version is bigger still, with some new Dosaku and Doetsu games just found!

BTW In one of my conversations with a programmer at Facebook, he said that no komi would certainly cause problems about the reliability of evaluations but he felt that for the early part of the game it was not likely to make much difference. I won't say which programmer it was in case he wants to change his mind about that.

Author:  Bill Spight [ Thu Feb 14, 2019 1:49 pm ]
Post subject:  Re: Elf OpenGo paper released

John Fairbairn wrote:
BTW In one of my conversations with a programmer at Facebook, he said that no komi would certainly cause problems about the reliability of evaluations but he felt that for the early part of the game it was not likely to make much difference. I won't say which programmer it was in case he wants to change his mind about that.


Well, human players changed their minds about :w3: with komi versus without. :)

Author:  bernds [ Thu Feb 14, 2019 4:57 pm ]
Post subject:  Re: Elf OpenGo paper released

Uberdude wrote:
- Containing this link to a 3 GB gzip of Elf's analysis of 100k pro games from GoGoD

... and sadly, since SGF isn't sufficiently standardized, the annotations are only two numbers in the comments.

If you are on Linux, you can convert a file with the following command into something q5go can understand and produce a winrate graph for:
Code:
# Convert Elf's two-number comment annotations into QLZV[] properties and insert an FG[] node so q5go draws a winrate graph:
sed 's,C\[\([.0-9]\+\)$,QLZV\[\1:,' inputfile.sgf | tr -d '\n' | sed -e 's,US\[GoGoD,FG[257:]US[GoGoD,' -e 's,QLZV.\([0-9.]\+\):\([0-9.]\+\)],\nQLZV[\2:\1],g' > outputfile.sgf


This isn't perfect; ideally you'd want to mark the variations as figures, but I don't really see a way to do that from the command line. I have some local changes to automatically mark figures and diagrams, but that isn't quite ready to be pushed yet...

Author:  ez4u [ Thu Feb 14, 2019 8:14 pm ]
Post subject:  Re: Elf OpenGo paper released

The analysis tool is interesting. However, the readme file contains the following statement.
"... Importantly, you can see humanity's improvement in the game in 2016, when Go AIs came onto the scene and taught humans to play at a higher level. Also notice the harm that the large historical event of WWII did to the game..." [emphasis added]
This is hilarious if you look carefully at the graph. The big dip does not coincide with WWII. It is the New Fuseki Era that caused the "harm". :)

Author:  hyperpape [ Fri Feb 15, 2019 7:37 am ]
Post subject:  Re: Elf OpenGo paper released

This suggests an interesting, albeit vague, question: is there a way to assess whether humans learned something in 2016, filtering out the "easy moves"?

What I mean by "easy moves" is that if a move appears in a fuseki that was played by AlphaGo/LZ/ELF, and I copy it, I have "played better", but who cares? Once we're out of my opening book, I may or may not continue to make the moves the AI will approve of. I think it only makes sense to say I've learned if my moves are better in cases where I'm not just copying.[0]

Filtering the "opening book" would be an easy task, but it's probably not adequate. There are local patterns that can also be copied, and fusekis that differ only minimally from one that is in an opening book. What we are really after is the use of those patterns in cases that require judgment about what the patterns accomplish.
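For what it's worth, a minimal sketch of that filter (the toy "book" of move-sequence prefixes is made up; a real version would hash board positions to catch transpositions):
Code:
# Ignore moves that still match a known opening line; only the rest would be
# scored against the AI. The book must contain every prefix of each line.
book = {("pd",), ("pd", "dp"), ("pd", "dp", "qq")}

def off_book_moves(moves):
    """Yield (index, move) for moves played after the game leaves the book."""
    for i, move in enumerate(moves):
        if tuple(moves[: i + 1]) not in book:
            yield i, move

print(list(off_book_moves(["pd", "dp", "qq", "dd", "fq"])))
# -> [(3, 'dd'), (4, 'fq')]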

By the way, when I say "just copying", I mean that to be a pretty low bar. I don't mean to say professionals must have an elaborate theory for why a new move works. Just that there has to be that level of judgment--even if the player is saying "where would the AI play?", that has to be a question, rather than coming straight from memory.

Anyway, I think the answer is probably yes. From commentaries, I get the feeling that professionals have changed more than just rote copying of the AI moves. However, I wonder if there's a way to measure it.

[0] Well, if I'm a professional--if I'm me, we know the answer is that I won't.

Author:  Kirby [ Fri Feb 15, 2019 11:34 am ]
Post subject:  Re: Elf OpenGo paper released

hyperpape wrote:
This suggests an interesting, albeit vague, question: is there a way to assess whether humans learned something in 2016, filtering out the "easy moves"?


The best I can think of is to measure how similar a given player's decision making is to that of a particular version of a bot, e.g. by measuring the average and variation of the changes in expected winning percentage across that player's moves. This is a heuristic rather than a definite answer, since a future version of a given bot may end up with a different idea of what's good and bad.
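A minimal sketch of that statistic (plain Python; it assumes the winrate numbers have already been extracted from the annotated SGFs, all from the mover's point of view):
Code:
from statistics import mean, stdev

def winrate_drop_stats(move_evals):
    """move_evals: list of (winrate_before, winrate_after) pairs in [0, 1]
    for one player's moves. Returns mean and spread of the drops."""
    drops = [before - after for before, after in move_evals]
    return mean(drops), stdev(drops)

# Toy data: a player who mostly matches the bot, with one 6% slip.
avg, spread = winrate_drop_stats([(0.55, 0.54), (0.54, 0.48), (0.60, 0.59)])
print(f"mean drop {avg:.3f}, stdev {spread:.3f}")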

There are other problems, too. If a bot says that your move drops the winning percentage by 10%... What does it really mean about what you've learned? Sometimes part of learning is playing worse first, before you can play better. You can learn why the move you're experimenting with is bad, for example.

Probably the most accurate way to track progress is still to measure how often you win against a given level of opponent over time, though that also has its problems. You may get better at winning against 5d player A, but not get better at winning against 5d player B...

Tough stuff... ¯\_(ツ)_/¯

Author:  Calvin Clark [ Fri Feb 15, 2019 12:31 pm ]
Post subject:  Re: Elf OpenGo paper released

ez4u wrote:
This is hilarious if you look carefully at the graph. The big dip does not coincide with WWII. It is the New Fuseki Era that caused the "harm". :)


Brilliant!

But I'm also curious what happened in 1980, where there is a spike in "bad moves", "very bad moves"*, etc., even in the third set of 60 moves of the game. A komi change? Or did ELF just dislike Chinese players? Some of these may be artifacts of the kinds of games that were available to collect in GoGoD at the time. I'd be interested in John's view on that phenomenon.

* This definition is tricky. First, a human probability is not the same as an AI one. Second, attempts to do this crudely in chess unfairly punish more tactical players, because go strength is not just about making fewer mistakes but also about provoking your opponent to make bigger ones. The only thing that's really a mistake is going from a winning position to losing one, but that naturally happens when some strong players take the game out into a chaotic street melee as they are wont to do. Third, as Bill Spight has pointed out, the networks are trained to win, not to evaluate.

But it's fun to have the data, so thanks to the ELF OpenGo team for sharing!

Author:  Uberdude [ Fri Feb 15, 2019 12:43 pm ]
Post subject:  Re: Elf OpenGo paper released

The Elf win % drop and other metrics explorer is interesting, but there are a lot of caveats. For example, here is Elfv2's winrate graph for a recent tournament game of mine (4d EGF) vs a 1d EGF (who used to be 4d BGA). He made me think with some tough fighting, but according to Elf I made only one significant mistake of over a 10% winrate drop (and Elf gives quite big swings); once I had a big lead there was no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me would take me away from a 99% win.
Attachment:
Simons Wall elfv2.PNG


For comparison, here's the Elf winrate graph of Iyama Yuta 9p vs Yamashita 9p's recent Kisei game. Loads of mistakes from both all over the place (about ten of >10% each). I wouldn't claim this means I played better than them in my game: mine was a more mismatched game against an opponent who didn't challenge me as much, so they were facing more difficult positions in which to find the best move than I was, and consequently doing worse at it. Also, I expect pro games will tend to be more evenly matched than a 4d vs 1d, but still the phenomenon of one player going to 99% fairly quickly (which for Elf might just be a 5 point lead), leaving no room for winrate variations, will happen.
Attachment:
Iyama Yamashita elfv2.PNG


And to avoid the "Japanese players are weak" criticism, here's Shin Jinseo vs Gu Zihao: not as many mistakes as the Iyama game, but still quite a few.
Attachment:
Gu vs Shin elfv2.PNG

Author:  Uberdude [ Fri Feb 15, 2019 12:47 pm ]
Post subject:  Re: Elf OpenGo paper released

bernds wrote:
Uberdude wrote:
- Containing this link to a 3 GB gzip of Elf's analysis of 100k pro games from GoGoD

... and sadly, since SGF isn't sufficiently standardized, the annotations are only two numbers in the comments.

Trying to open these SGF files in Lizzie makes it hang!

Author:  dfan [ Fri Feb 15, 2019 1:07 pm ]
Post subject:  Re: Elf OpenGo paper released

Calvin Clark wrote:
This definition is tricky. First, a human probability is not the same as an AI one. Second, attempts to do this crudely in chess unfairly punish more tactical players, because go strength is not just about making fewer mistakes but also about provoking your opponent to make bigger ones.
Indeed, when researchers have evaluated historical chess players by having computers rate their moves, Capablanca comes out better than expected (not that he was a slouch in the first place), because his simple style meant he had fewer opportunities to make mistakes, compared to, say, someone like Kasparov, who played in a maximally dynamic style.

Author:  Bill Spight [ Fri Feb 15, 2019 1:34 pm ]
Post subject:  Re: Elf OpenGo paper released

Uberdude wrote:
once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.


That's why the log of the odds ratio is a more informative measure. :)

Author:  Uberdude [ Fri Feb 15, 2019 1:52 pm ]
Post subject:  Re: Elf OpenGo paper released

Bill Spight wrote:
Uberdude wrote:
once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.


That's why the log of the odds ratio is a more informative measure. :)


Even then I think the quality and precision of the bot's suggestions is reduced: if it says one move is 99.675 and another is 99.784, can you really believe those sig figs? It just wants a safe win (0.5 points as seen from move 100 could be safe) and can play slack moves, whereas a human might want to keep pressing the advantage for a comfier margin. A better approach would be to add in some dynamic komi to get the winrate back near 50% and then analyse from that modified board state. Unfortunately the Elf weights converted to LZ (at least v0/v1) don't play nicely with the dynamic-komi modified version of the LZ engine.

Author:  Uberdude [ Fri Feb 15, 2019 3:59 pm ]
Post subject:  Re: Elf OpenGo paper released

What is the story behind this graph? It shows the average biggest mistake (in win % drop) for all players over time. There probably wasn't much data back in 1700; then we have them making bigger mistakes to around 1780, then getting better down to a trough of mistakes around 1860 to 1895. Is this seen as a golden age of Japanese go? You've got the last few years of Shusaku at the start and Shuei at the end, though I presume he individually was a small part of the corpus. Checking the stats for just him, he averaged around 24%, quite a bit lower than the trough at 27%. Then we have mistakes getting bigger again at the turn of the century; is this the collapse of the Go houses? As ez4u mentioned, the peak in mistakes in the 1930s is the Shin Fuseki era BEFORE WW2. And in modern times the reduction in biggest mistakes seems quite nicely correlated with reducing time limits ;-)

Attachment:
Biggest mistake al players over time.PNG

Author:  John Fairbairn [ Sat Feb 16, 2019 2:59 am ]
Post subject:  Re: Elf OpenGo paper released

I'm not sure the above graph tells us much. My own background in statistics is not much more than reading books like Freakonomics on long-haul flights, but I have hugely more knowledge of how the database was constructed and what it contains. Both of those things tell me to treat the graph with a great amount of caution.

Among the factors that potentially affect the results are these. This is not a complete list but does run roughly chronologically.

1. The very early games include a large number of Chinese games under ancient rules. Apart from the fact that these usually start with 4 stones on the board, which restricts the style of play somewhat, and are without komi, they also have group tax, which may be a distorting element. But the single biggest distortion is that the dates of these games are usually unknown (and even the dates of the players can be quite unknown). I therefore catalogued the games under the date, or estimated date, of the publications in which I found the games. This means, for example, that there are very many games labelled 1700. That's not when they were played.

2. Old games in general are likewise affected by being with no komi and often with handicaps. These handicaps include not just stones but the series type of handicaps (e.g. taking Black in 2 games out of 3). Since a series handicap was defined by current grades only initially, but could then change between those same two players (not to mention that grades were set largely on the basis of politics) and not others, there must have been many cases where the wrong handicap was used. In general, too, no komi is not just a problem for training bots; I expect it also encourages White to make wilder moves, and thus bigger potential mistakes.

3. At times such as the late Edo/Meiji period in Japan, there were fewer games because of less sponsorship and other external factors. But also there may be gaps in the record. For example, I have not got round to doing the complete games of Shuwa yet.

4. At any period with the older players, the corpus is likely to focus on the star players, via their collected games. This means many games from very early in their careers. Nowadays the proportion of games by weaker players is likely to be much less because there are just so many games by strong players to collect instead.

5. In the 1930s, as has been observed, there was a spike that can be considered to coincide with Shin Fuseki. Intuitively, I suppose we would expect many mistakes then as players started experimenting. But there may be a further factor. That period has been of special interest to me, and so over the past 2-3 years I have been adding lots of games from this era. These are generally by weaker players (more mistake-prone? More experimental?) and so in this period we get both more data and also data covering a much wider range of players (e.g. the Oteai B Section) than in other historical eras. There is also a trend towards the use of komi in this era, but weird ones such as 2.5 points.

6. As regards the war period, there is actually very little data. Apart from disruption from bombs, and players being sent off to fight, paper was scarce and publication of games was minimal (many now known were reconstructed later from personal records of the players).

7. I don't think time limits have much to do with anything in this graph. For a start, as one example, in the days of 13 hours each, there were distortions at both ends of the scale. On the one hand many players would use the extra time to take a shower, pop out to the shops or have a snooze upstairs. At the other end of the scale, where a player did use all his time, it became apparent that this carried significant health risks, presumably making mistakes likelier, and so time limits were reduced significantly without any external pressure from sponsors or public.

8. Since I get to choose what goes in the database and I don't like games at Mickey Mouse time limits, I have at times tended to ignore these (at any era).

9. If we look at very recent times, several factors leap out at me for initial consideration. One is that the proportion of Chinese and Korean players represented has increased very significantly. Whether you accept Cho Hun-hyeon's view that this has coincided with a horde of programmatically trained clones is up to you, but I think it is beyond question that the level of play has not just increased but that differences in both strength and style between players have become smaller, so that there are far fewer mismatches (with their potential for bigger mistakes?) than in the past.

10. I suspect there is also a flattening effect in modern times due to increases in komi. And of course Elf is trained on current komi, so is perhaps more likely, on average, to find big mistakes in games which are not at 7.5 komi? (Or even to report as mistakes moves in games at other komis which were not really wrong?)

Author:  Bill Spight [ Sat Feb 16, 2019 8:41 am ]
Post subject:  Re: Elf OpenGo paper released

Uberdude wrote:
Bill Spight wrote:
Uberdude wrote:
once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.


That's why the log of the odds ratio is a more informative measure. :)


Even then I think the quality and precision of the bot's suggestions is reduced: if it says one move is 99.675 and another is 99.784, can you really believe those sig figs?


Well, I don't really believe that win rates are win rates, anyway. ;) I have an open mind about that, except for Leela 11. Playing around with Deep Leela, it seems to me that the win rate estimates for the player who is ahead are underestimates, as I expected. As for the stronger bots, I can't say.

But if we take the log odds ratios we get 5.73 and 6.14, respectively, for a difference of 0.41. By comparison the log odds ratio for 60% is 0.41 and the log odds ratio for 50% is, OC, 0. So the play with a win rate of 99.675% instead of 99.784% could be just as bad a mistake as a play with a win rate of 50% instead of 60%. Quien sabe?
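For anyone who wants to check those figures, they come straight from ln(p/(1-p)):
Code:
from math import log

def log_odds(p):
    return log(p / (1 - p))

for p in (0.99784, 0.99675, 0.60, 0.50):
    print(f"{p:.5f} -> {log_odds(p):+.2f}")
# 0.99784 -> +6.14, 0.99675 -> +5.73 (a 0.41 difference),
# 0.60 -> +0.41, 0.50 -> +0.00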

But, as you say, surely the errors are larger as we approach 100% or 0%.

Edit: Also, the use of changes in win rate estimates between moves instead of comparison of win rate estimates for the same possible moves introduces the complication that the estimates should approach 0 or 1 as the game continues. That's probably a small factor early on, but in the endgame it could be significant.

Author:  dfan [ Sat Feb 16, 2019 9:20 am ]
Post subject:  Re: Elf OpenGo paper released

Bill Spight wrote:
That's why the log of the odds ratio is a more informative measure.

By the way, the log of the odds ratio is what these networks actually produce under the hood. As people have noted here, you don't want to have to expend lots of energy making your network produce values bounded between 0 and 1 that precisely hit targets like 0.98 or 0.99. So instead you have the network produce an unbounded value, let's call it x, and then run x through the sigmoid function 1/(1 + e^(-x)) to produce a probability p between 0 and 1. Solving for x, you end up with x = log (p/(1-p)), which is the log odds.
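As a quick illustration of that round trip (plain Python, nothing network-specific):
Code:
from math import exp, log

def sigmoid(x):
    """Map an unbounded network output x to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-x))

def logit(p):
    """Inverse of sigmoid: recover the log odds from a probability."""
    return log(p / (1 - p))

x = 4.6
p = sigmoid(x)        # about 0.990
print(p, logit(p))    # logit(p) recovers x (up to rounding)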

Author:  Bill Spight [ Sat Feb 16, 2019 9:50 am ]
Post subject:  Re: Elf OpenGo paper released

dfan wrote:
Bill Spight wrote:
That's why the log of the odds ratio is a more informative measure.

By the way, the log of the odds ratio is what these networks actually produce under the hood.


Great minds think alike. :mrgreen:

Author:  smartgo [ Tue Feb 19, 2019 9:07 am ]
Post subject:  Re: Elf OpenGo paper released

If you want to look at ELF’s analysis of pro games but don’t want to download gigs of data, you can now download the annotated SGF for a specific game:
https://smartgo.com/gogod.html

Author:  And [ Wed Feb 20, 2019 10:13 am ]
Post subject:  Re: Elf OpenGo paper released

The ELF OpenGo Windows binary has changed: https://dl.fbaipublicfiles.com/elfopeng ... ngo_v2.zip
I did not find a changelog, but the size has decreased to 1 GB and it now comes with version 43 of Sabaki. In the previous version only elf_cpu_full worked on my computer, but compared to elf_v2 + LZ it plays much weaker. The new version does not work at all. Has anyone tried the ELF OpenGo Windows binary?
