 Post subject: AlphaZero paper published in journal Science
Post #1 Posted: Thu Dec 06, 2018 2:01 pm 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
About a year after the pre-print appeared on arXiv (see the AlphaZero L19 thread, not to be confused with the AlphaGo Zero L19 thread), the AlphaZero paper has finally passed peer review and is in the journal Science:
http://science.sciencemag.org/content/362/6419/1140
pdf: http://science.sciencemag.org/content/s ... 0.full.pdf

The focus seems to be on chess and shogi. There's a new match vs Stockfish, hopefully a better test than the last one. Chess media report: https://www.chess.com/news/view/updated ... game-match

DeepMind article and video:

The supplementary materials include some shogi games too, which is something that community was missing:
http://science.sciencemag.org/content/s ... ver-SM.pdf


This post by Uberdude was liked by 3 people: Charlie, Elom, Waylon
Post #2 Posted: Fri Dec 07, 2018 2:16 am 
Lives in sente

Posts: 727
Liked others: 44
Was liked: 218
GD Posts: 10
So does this wrap up AlphaZero for good now? Hardly. As Demis Hassabis was so ready to point out recently, a new AlphaZero has been developed that is stronger than the one referenced in the paper. Be ready for new announcements!

Post #3 Posted: Fri Dec 07, 2018 3:37 am 
Gosei

Posts: 1494
Liked others: 111
Was liked: 315
For a few moments I was convinced that their chess board was the wrong way around. Then I decided, no, they've just had a very active game.

_________________
North Lecale

Post #4 Posted: Fri Dec 07, 2018 8:33 am 
Dies with sente

Posts: 111
Liked others: 9
Was liked: 23
Hmm

Looking at the graphs shows that komi is too large!

AlphaZero wins 68.9% of games as White against AlphaGo Zero and 53.7% as Black...

Post #5 Posted: Fri Dec 07, 2018 9:04 am 
Beginner

Posts: 16
Liked others: 11
Was liked: 2
Rank: DDK
KGS: 11 kyu
IGS: 14 kyu
OGS: 11 kyu
Universal go server handle: jonsa
mumps wrote:
Hmm

Looking at the graphs shows that komi is too large!

AlphaZero wins 68.9% of games as White against AlphaGo Zero and 53.7% as Black...


Yeah, I was also thinking something along those lines. An "unusual" discrepancy.

Post #6 Posted: Fri Dec 07, 2018 9:09 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
mumps wrote:
Hmm

Looking at the graphs shows that komi is too large!

AlphaZero wins 68.9% of games as White against AlphaGo Zero and 53.7% as Black...


Well yes, all the bots (apart from Elf v1) and quite a few top pros too (even before AI) think 7.5 komi gives White a slight advantage (53% according to AG Teach). But that's not exactly the same as saying it's too much, as in something else would be better, because maybe reducing it to 6.5 would give Black more of an advantage (e.g. 55%).

For the Go version of AlphaZero it's not immediately obvious, but after careful reading of the paper (see below) I'm pretty sure it's 'only' the fully-trained 20-block AlphaGo Zero which AlphaZero beat, winning 61% overall (and they report it beating the weaker AlphaGo Lee version before that, but still taking longer than vs Stockfish/Elmo). So by not having any AlphaZero Go games we aren't missing out on games from some new bot even stronger than the ones we have already, though it would be nice to see another instance of a strong bot learning from scratch, to see if it ended up playing a similar style to AlphaGo Zero, LeelaZero, Elf OpenGo etc.

Science paper wrote:
We trained separate instances of AlphaZero for chess, shogi, and Go. Training proceeded for 700,000 steps (in mini-batches of 4096 training positions).
In chess, AlphaZero first outperformed Stockfish after just 4 hours (300,000 steps); in shogi, AlphaZero first outperformed Elmo after 2 hours (110,000 steps); and in Go, AlphaZero first outperformed AlphaGo Lee (9) after 30 hours (74,000 steps).

So to beat AlphaGo Lee, which is pretty weak by Go bot standards these days, it still took longer to train than the chess and shogi versions (a training step for Go was obviously slower, presumably because it's a bigger board). Then:
Quote:
The Go match was played against the previously published version of AlphaGo Zero [also trained for 700,000 steps (footnote 25 = AlphaGo Zero was ultimately trained for 3.1 million steps over 40 days.)]. <snip> In Go, AlphaZero defeated AlphaGo Zero, winning 61% of games.

From the AlphaGo Zero paper, the 20-block version was trained for a total of 700k steps, aka mini-batches (of 2048 positions, cf. AlphaZero's 4096), over a total of 4.9 million self-play games. They then made the 40-block version, which was trained from scratch over 3.1 million mini-batches (of 2048 positions again) with 29 million games of self-play. (LeelaZero is currently at 40 blocks after 11 million self-play games, played across increasing numbers of blocks, with bootstrapping of increasing network sizes.) So my reading of this is that A0 beat the fully-trained 20-block version (which is stronger than AG Lee but weaker than AG Master), but not the 40-block version. AG0 20-block is around 4350 Elo on their graphs, so beating it by only 61% means I think A0 is weaker than AG Master (4858) and AG0 40b (5185).
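
To put the step counts above on a common footing, here is a small arithmetic sketch that multiplies training steps by mini-batch size. The function name is made up for illustration, and the result counts training samples drawn rather than distinct positions, since (as I understand it) positions are sampled from recent self-play games and can repeat.

```python
def positions_sampled(steps: int, batch_size: int) -> int:
    """Total training samples drawn: one mini-batch per training step."""
    return steps * batch_size


# Figures quoted in this post (steps, mini-batch size).
runs = {
    "AlphaZero (Go)": (700_000, 4096),
    "AlphaGo Zero 20-block": (700_000, 2048),
    "AlphaGo Zero 40-block": (3_100_000, 2048),
}

for name, (steps, batch) in runs.items():
    print(f"{name}: {positions_sampled(steps, batch):,} samples")
```

So for the same 700k steps, AlphaZero drew twice as many training samples as the 20-block AlphaGo Zero, simply because of the larger mini-batch.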

Science figure 2 caption wrote:
Tournament evaluation of AlphaZero in chess, shogi, and Go in matches against, respectively, Stockfish, Elmo, and the previously published version of AlphaGo Zero (AG0) that was trained for 3 days


Using the DeepMind Elo scale which is an extension of goratings.org we have:
Code:
Player                         Elo      Matches
Fan Hui                       ~3000
AlphaGo Fan                    3144    Beat Fan Hui 5-0 
Lee Sedol / top human         ~3600
AlphaGo Lee                    3739    Beat Lee Sedol 4-1   
AlphaGo Zero 20b               4350    Beat AG Lee 100-0
AlphaZero                     ~4500    Beat AG0 20b 61% (over 1000 games?)
AlphaGo Master                 4858    Beat top pros online 60-0
AlphaGo Zero 40b               5185    Beat AG Master 89-11       
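
As a rough cross-check on the AlphaZero row, a win rate can be converted to an Elo gap with the standard logistic Elo formula. This is only a sketch: it assumes DeepMind's scale follows the usual 400-point logistic model, and the function names are mine, not from the paper.

```python
import math


def elo_diff_from_score(p: float) -> float:
    """Elo gap implied by an expected score p (standard logistic model)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


def expected_score(diff: float) -> float:
    """Expected score for the player who is `diff` Elo points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


gap = elo_diff_from_score(0.61)
print(round(gap))         # 78
print(round(4350 + gap))  # 4428
```

Under that assumption a 61% score is worth only about 78 Elo, which would put AlphaZero nearer 4430 than 4500. Either way it sits well below AG Master and AG0 40b.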

Post #7 Posted: Fri Dec 07, 2018 10:35 am 
Oza

Posts: 3647
Liked others: 20
Was liked: 4626
Top human 3600 to top AI 5185 seems like an enormous gap.

What would you say that means in handicap terms?

If we say the range from Fan Hui 2d at 3000 to Yi Se-tol (obviously more than 9d) at 3600 is close to 3 stones (maybe too generous but I'd find it hard to believe it's not more than 2 stones), we get 1 pro dan = 200 Elo. So the latest AI should give the top human about 9 stones???? Even halving the figures to give a handicap of 4.5 stones seems a stretch, but I wouldn't rule that out.

Do the top bots still play so as to win by half a point rather than by as much as possible? If so, can that behaviour be easily modified so that the bot will try to maximise the score? That would give us a way to compare humans more directly (i.e. by playing only even human-AI games, telling the bot the komi is 7.5 and telling the human the real komi is 40 points or whatever).

Post #8 Posted: Fri Dec 07, 2018 10:59 am 
Gosei

Posts: 1590
Liked others: 886
Was liked: 527
Rank: AGA 3k Fox 3d
GD Posts: 61
KGS: dfan
John Fairbairn wrote:
Do the top bots still play so as to win by half a point rather than by as much as possible?

They play so as to maximize the probability that they will win by at least half a point.

Quote:
If so, can that behaviour be easily modified so that the bot will try to maximise the score?

People are still working on it. One problem is that at some point you have to make a tradeoff and say, for example, "I am willing for my chance of winning to go down from 98% to 97% in return for winning by 10.5 points instead of 0.5". Due to the nature of the playing system, there's no good way to say "I have a 100% chance of winning, and now I want to maximize my score while retaining that 100% chance", although of course that statement is logically meaningful.
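
To make the tradeoff concrete, here is a toy sketch of one way it could be expressed: score each candidate move as win rate plus a small weight times expected margin. Everything here (the `Candidate` type, the `points_weight` knob, the linear form) is a hypothetical illustration, not how any existing bot does it.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    winrate: float  # estimated probability of winning
    margin: float   # estimated winning margin in points


def pick(candidates: list[Candidate], points_weight: float) -> Candidate:
    """Choose the move maximizing winrate + points_weight * margin.

    points_weight = 0 is the usual pure win-rate behaviour; larger values
    trade away winning chances for a bigger expected margin.
    """
    return max(candidates, key=lambda c: c.winrate + points_weight * c.margin)


moves = [
    Candidate("safe", winrate=0.98, margin=0.5),
    Candidate("greedy", winrate=0.97, margin=10.5),
]

print(pick(moves, 0.0).name)    # safe
print(pick(moves, 0.002).name)  # greedy
```

With these numbers the break-even weight is 0.001 per point: giving up 1% of win rate for 10 extra points. Picking that weight is exactly the awkward judgment call described above.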

Post #9 Posted: Fri Dec 07, 2018 11:40 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
I wouldn't try to convert those Elo differences to handicap, it's like converting apples to volts. To take the example of LeelaZero vs Haylee a while ago (a bit weaker than Fan Hui I suppose), it absolutely demolished her on even and 2 stones, in a manner that if a human (e.g. Lee Sedol) did that I'd expect her to lose on 3 stones too, but she won easily on 3 with LZ going silly.


This post by Uberdude was liked by 2 people: Bill Spight, Charlie
Post #10 Posted: Fri Dec 07, 2018 12:08 pm 
Gosei

Posts: 1753
Liked others: 177
Was liked: 491
Note that the Elo rating does not vary linearly with handicap stones. Elo ratings are calculated in terms of winrate. God's Elo rating is infinite (well, not exactly, but extremely high), but God cannot give 359 stones to a human.


Last edited by jlt on Wed Dec 12, 2018 7:40 am, edited 1 time in total.
Post #11 Posted: Fri Dec 07, 2018 2:33 pm 
Lives in gote

Posts: 553
Liked others: 61
Was liked: 250
Rank: AGA 5 dan
I can think of one fairly simple way to gauge the strength of a computer program, relative to a human, expressed in meaningful units. Start playing an even game. The computer evaluates its winning chances after every move as usual. If and when the computer calculates that passing will still result in a likely win, the computer passes. At the end of the game, the computer probably wins by a small margin. The strength difference is the number of passes issued along the way. This scheme has the desirable feature that the computer is always playing the game it was trained to play, with no need to alter komi or introduce handicap stones.
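
The loop I have in mind can be sketched roughly as follows. This is a toy model only: a real test would ask an actual engine for its win rate after a pass, whereas here `toy_winrate`, the point "lead", and the per-turn pass costs are all made-up stand-ins.

```python
import math


def toy_winrate(lead: float) -> float:
    """Hypothetical stand-in for the engine's win-rate estimate.

    `lead` is the engine's expected final margin in points; the logistic
    shape and the 4-point scale are arbitrary choices for this toy.
    """
    return 1.0 / (1.0 + math.exp(-lead / 4.0))


def count_passes(initial_lead: float, pass_costs: list[float],
                 threshold: float = 0.6) -> int:
    """Count the turns on which the engine can afford to pass.

    pass_costs[i] is the assumed number of points a pass would give up on
    the engine's i-th turn. The engine passes only if its estimated win
    rate after passing is still at least `threshold`.
    """
    lead = initial_lead
    passes = 0
    for cost in pass_costs:
        if toy_winrate(lead - cost) >= threshold:
            lead -= cost
            passes += 1
    return passes


# Made-up numbers: a 30-point edge, and passes that get cheaper over time.
costs = [14, 12, 10, 8, 6, 4, 2, 1]
print(count_passes(30, costs))  # 3 with these numbers
```

The reported strength difference is just the returned pass count. The threshold for "likely win" is a free parameter that would need to be fixed in advance.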


This post by mitsun was liked by 2 people: Bill Spight, Waylon
Post #12 Posted: Fri Dec 07, 2018 4:01 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Interesting idea. :D

One possible problem is that, as the temperature drops, the odds that a pass by the computer will not affect who wins increase, so that the computer will probably pass more often in the endgame than in the opening. It is passes in the opening that approximate handicap stones. The number of passes under this scheme is likely not only to be greater than the number of handicap stones, it is likely to be more variable. Still, an interesting idea. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

Post #13 Posted: Fri Dec 07, 2018 6:58 pm 
Oza

Posts: 2401
Location: Tokyo, Japan
Liked others: 2338
Was liked: 1332
Rank: Jp 6 dan
KGS: ez4u
dfan wrote:
..."I am willing for my chance of winning to go down from 98% to 97% in return for winning by 10.5 points instead of 0.5". Due to the nature of the playing system, there's no good way to say "I have a 100% chance of winning, and now I want to maximize my score while retaining that 100% chance", although of course that statement is logically meaningful.

The statements may be logically meaningful but they are trivial. Isn't the real challenge to make sense of a statement like, "I have a 51% chance of winning by 0.5 points by playing X and a 49% chance of winning by 1.5 points by playing Y. I want to maximize my score; which should I choose?"

_________________
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21

Post #14 Posted: Sat Dec 08, 2018 4:15 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
ez4u wrote:
dfan wrote:
..."I am willing for my chance of winning to go down from 98% to 97% in return for winning by 10.5 points instead of 0.5". Due to the nature of the playing system, there's no good way to say "I have a 100% chance of winning, and now I want to maximize my score while retaining that 100% chance", although of course that statement is logically meaningful.

The statements may be logically meaningful but they are trivial. Isn't the real challenge to make sense of a statement like, "I have a 51% chance of winning by 0.5 points by playing X and a 49% chance of winning by 1.5 points by playing Y. I want to maximize my score; which should I choose?"


The thing is, amateur dans play the late endgame almost perfectly; but even pros do not play the late endgame perfectly. Under those circumstances, if it's a close call in the late endgame between going for a ½ pt. win versus going for a 1½ pt. win, the extra point gives a margin of safety. At least for humans.

But most, if not all, modern top bots do not assume nearly perfect play when they calculate winrates. And they do not estimate the margin of safety by expected scores, but by percentages.* As far as I can tell, the endgame, particularly the late endgame, is one of the places where humans play better than bots; life and death, semeai, and ladders being others. In all of these places, local reading can give the right global results. Bots excel at global reading, humans still excel at local reading.

* Edit: That's not right, is it? Modern top bots do not actually estimate the margin of safety, do they?



Last edited by Bill Spight on Sat Dec 08, 2018 4:33 pm, edited 1 time in total.
Post #15 Posted: Sat Dec 08, 2018 5:17 am 
Oza

Posts: 2401
Location: Tokyo, Japan
Liked others: 2338
Was liked: 1332
Rank: Jp 6 dan
KGS: ez4u
Bill Spight wrote:
ez4u wrote:
dfan wrote:
..."I am willing for my chance of winning to go down from 98% to 97% in return for winning by 10.5 points instead of 0.5". Due to the nature of the playing system, there's no good way to say "I have a 100% chance of winning, and now I want to maximize my score while retaining that 100% chance", although of course that statement is logically meaningful.

The statements may be logically meaningful but they are trivial. Isn't the real challenge to make sense of a statement like, "I have a 51% chance of winning by 0.5 points by playing X and a 49% chance of winning by 1.5 points by playing Y. I want to maximize my score; which should I choose?"


The thing is, amateur dans play the late endgame almost perfectly; but even pros do not play the late endgame perfectly. Under those circumstances, if it's a close call in the late endgame between going for a ½ pt. win versus going for a 1½ pt. win, the extra point gives a margin of safety. At least for humans.

But most, if not all, modern top bots do not assume nearly perfect play when they calculate winrates. And they do not estimate the margin of safety by expected scores, but by percentages. As far as I can tell, the endgame, particularly the late endgame, is one of the places where humans play better than bots; life and death, semeai, and ladders being others. In all of these places, local reading can give the right global results. Bots excel at global reading, humans still excel at local reading.

If the discussion is about switching from a winrate strategy to a maximum point strategy, then the starting point is the fuseki not the late endgame.


Post #16 Posted: Sat Dec 08, 2018 8:54 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
dfan wrote:
..."I am willing for my chance of winning to go down from 98% to 97% in return for winning by 10.5 points instead of 0.5". Due to the nature of the playing system, there's no good way to say "I have a 100% chance of winning, and now I want to maximize my score while retaining that 100% chance", although of course that statement is logically meaningful.
ez4u wrote:
Bill Spight wrote:
ez4u wrote:
The statements may be logically meaningful but they are trivial. Isn't the real challenge to make sense of a statement like, "I have a 51% chance of winning by 0.5 points by playing X and a 49% chance of winning by 1.5 points by playing Y. I want to maximize my score; which should I choose?"


The thing is, amateur dans play the late endgame almost perfectly; but even pros do not play the late endgame perfectly. Under those circumstances, if it's a close call in the late endgame between going for a ½ pt. win versus going for a 1½ pt. win, the extra point gives a margin of safety. At least for humans.

But most, if not all, modern top bots do not assume nearly perfect play when they calculate winrates. And they do not estimate the margin of safety by expected scores, but by percentages. As far as I can tell, the endgame, particularly the late endgame, is one of the places where humans play better than bots; life and death, semeai, and ladders being others. In all of these places, local reading can give the right global results. Bots excel at global reading, humans still excel at local reading.

If the discussion is about switching from a winrate strategy to a maximum point strategy, then the starting point is the fuseki not the late endgame.


To take your example, in general, in the opening the difference between estimated winrates of 51% and 49% is more indicative of the chances of winning than the difference between estimated results of 1½ pts. and ½ pt. But in the late endgame I think that the difference between estimated results (by current human pros) of 1½ pts. and ½ pt. is more indicative of the chances of winning than the difference between estimated winrates (by current top bots) of 51% and 49%. The reason lies in the reduction of the uncertainty in estimated point scores as the game goes on. Currently the uncertainty of estimated point scores is so great in the opening that no pros that I know of even attempt to estimate them. (The traditional approach is to estimate locally secure territory and to use that as one factor to consider.)


Post #17 Posted: Sat Dec 08, 2018 1:19 pm 
Dies in gote

Posts: 22
Location: Niger, West Africa
Liked others: 13
Was liked: 4
Rank: KGS 8 kyu
KGS: Seberle 8k
DGS: Seberle 7k
OGS: Seberle
Online playing schedule: KGS or OGS around 04:00-06:00 UTC most days
I just finished reading the AlphaZero paper, which was fascinating. I have a couple of questions, if anyone happens to know more.

On page 2, they explain that each move is selected "either proportionally (for exploration) or greedily (for exploitation) with respect to the visit counts at the root state." What does that mean in layman's terms?

It's interesting that they abandoned symmetry because chess and shogi don't have symmetric boards. I wonder if AlphaZero has any idiosyncrasies, such as preferring a certain joseki in one corner, but a different variation in another corner. Did anyone read the supplemental material? Do they mention anything like this?

Post #18 Posted: Sat Dec 08, 2018 3:55 pm 
Gosei

Posts: 1590
Liked others: 886
Was liked: 527
Rank: AGA 3k Fox 3d
GD Posts: 61
KGS: dfan
seberle wrote:
On page 2, they explain that each move is selected "either proportionally (for exploration) or greedily (for exploitation) with respect to the visit counts at the root state." What does that mean in layman's terms?

Say that when deciding on its next move, it has considered 500 variations starting with move A, 300 variations starting with move B, and 200 starting with move C. (In general, it tries to look more at moves that look more promising, for obvious reasons.)

In the proportional case (this is "temperature = 1", if you see it elsewhere), it would pick move A with 50% probability, move B with 30% probability, and move C with 20% probability, proportionally to their visit counts. This emphasizes exploration, and is done early in self-play games to generate a varied data set and make sure it tries lots of ideas and doesn't get stuck in its learning.

In the greedy case (this is "temperature = 0"), it would pick move A all of the time. This emphasizes exploitation, and is what you do in competition when you want to play your best.
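
Here is a minimal sketch of both cases in code, using the same visit counts. It assumes the usual formulation in which each move's probability is proportional to its visit count raised to the power 1/temperature; the function names are mine, not from the paper.

```python
import random


def move_probabilities(visits: dict[str, int],
                       temperature: float) -> dict[str, float]:
    """Turn root visit counts into move-selection probabilities."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy: all probability on the most-visited move.
        best = max(visits, key=visits.get)
        return {m: (1.0 if m == best else 0.0) for m in visits}
    # Proportional to N(move) ** (1 / temperature).
    weights = {m: n ** (1.0 / temperature) for m, n in visits.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def select_move(visits: dict[str, int], temperature: float,
                rng=random) -> str:
    """Sample one move according to the probabilities above."""
    probs = move_probabilities(visits, temperature)
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]


visits = {"A": 500, "B": 300, "C": 200}
print(move_probabilities(visits, 1.0))  # {'A': 0.5, 'B': 0.3, 'C': 0.2}
print(move_probabilities(visits, 0.0))  # {'A': 1.0, 'B': 0.0, 'C': 0.0}
```

Temperatures between 0 and 1 sharpen the distribution toward the most-visited move without making the choice fully deterministic.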


This post by dfan was liked by 3 people: seberle, sorin, Waylon
Post #19 Posted: Sat Dec 08, 2018 10:33 pm 
Dies in gote

Posts: 22
Location: Niger, West Africa
Liked others: 13
Was liked: 4
Rank: KGS 8 kyu
KGS: Seberle 8k
DGS: Seberle 7k
OGS: Seberle
Online playing schedule: KGS or OGS around 04:00-06:00 UTC most days
Uberdude wrote:
I wouldn't try to convert those Elo differences to handicap, it's like converting apples to volts. To take the example of LeelaZero vs Haylee a while ago (a bit weaker than Fan Hui I suppose), it absolutely demolished her on even and 2 stones, in a manner that if a human (e.g. Lee Sedol) did that I'd expect her to lose on 3 stones too, but she won easily on 3 with LZ going silly.


Two questions for anybody:

So what do people say is the proper handicap between top pros and perfect play? I remember before AlphaGo I read that some pros thought that the top players would need no more than a 4 stone handicap against "God". Is that still what some think?

If Elo can't be converted to handicap at these high ranks, how do you determine handicap from Elo? Or can you? At what rank does the rule "100 Elo points = 1 rank" begin to break down?

Post #20 Posted: Sun Dec 09, 2018 4:42 am 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
seberle wrote:
So what do people say is the proper handicap between top pros and perfect play? I remember before AlphaGo I read that some pros thought that the top players would need no more than a 4 stone handicap against "God". Is that still what some think?
There may be a stone or two of uncertainty here, but it seems obvious that >3 and <9 stones are necessary. It is just hard to imagine a top pro losing at 9 stones: the board is simply not big enough and the game not long enough.

Quote:
If Elo can't be converted to handicap at these high ranks, how do you determine handicap from Elo? Or can you? At what rank does the rule "100 Elo points = 1 rank" begin to break down?
You may look at W's avg winrate at each strength level to get an idea. Since we can guess that fair komi is 7, the significance of the half point (slightly more with imperfect play) advantage also hints at the significance of one handicap stone at that level. It is worth a few % at amateur levels, a few % more at top pro levels, even more at top bot levels, and 100% at perfect level.
