All times are UTC - 8 hours [ DST ]




 Post subject: MuZero beats AlphaZero
Post #1 Posted: Wed Nov 20, 2019 11:00 pm 
Lives in gote

Posts: 379
Liked others: 363
Was liked: 195
DeepMind published a paper about MuZero, a new approach to learning, which they evaluated on several board games and Atari video games: https://arxiv.org/pdf/1911.08265.pdf

From what I understand from a quick browse of the paper, the innovative part compared to AlphaZero type of approach is that MuZero doesn't "know" the rules in advance, therefore is a more general learning algorithm, which can be used in more open-ended domains.

They tested it against AlphaZero at go and MuZero won. This is an exact quotation:

"In Go, MuZero slightly exceeded the performance of AlphaZero, despite using less computation per node in the search tree (16 residual blocks per evaluation in MuZero compared to 20 blocks in AlphaZero)"

Very interesting news, I hope they will publish some game records too!

_________________
Sorin - 361points.com


This post by sorin was liked by 3 people: Bonobo, EdLee, Waylon
 Post subject: Re: MuZero beats AlphaZero
Post #2 Posted: Wed Nov 20, 2019 11:18 pm 
Honinbo

Posts: 9966
Liked others: 3248
Was liked: 3255
sorin wrote:
DeepMind published a paper about MuZero, a new approach to learning, which they evaluated on several board games and Atari video games: https://arxiv.org/pdf/1911.08265.pdf

From what I understand from a quick browse of the paper, the innovative part compared to AlphaZero type of approach is that MuZero doesn't "know" the rules in advance, therefore is a more general learning algorithm, which can be used in more open-ended domains.


Actually, learning the rules is not innovative.

Quote:
Very interesting news, I hope they will publish some game records too!


Very interesting, indeed. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Banana Republic. It's not just a store anymore.

Everything with love. Stay safe.

 Post subject:
Post #3 Posted: Thu Nov 21, 2019 12:37 am 
Honinbo

Posts: 8792
Location: Santa Barbara, CA
Liked others: 336
Was liked: 2057
GD Posts: 312
Hi sorin, thanks.

Nice to see the classic Atari games.
Mr. Aja Huang (who relayed the moves in the AlphaGo-LSD match) is not listed in this paper.

Too bad the "casual" readers of these papers would have no idea of the etymology of Atari and its connection to Go. :scratch: (Unless they accidentally wikipedia it up.)

 Post subject: Re: MuZero beats AlphaZero
Post #4 Posted: Thu Nov 21, 2019 1:00 am 
Judan

Posts: 6495
Location: Cambridge, UK
Liked others: 388
Was liked: 3570
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
I don't think we ever got game records of AlphaZero for Go, did we? Also, AlphaZero was only stronger than the 20 block version of AlphaGo Zero (which was between AG Lee and AG Master), not the 40 block version, see https://lifein19x19.com/viewtopic.php?p=239589#p239589. So these games would be interesting to see from a "what style does this new bot from an independent training run of self discovery of rules have" perspective, but the bot will likely be weaker than AG0 40b.


This post by Uberdude was liked by: Bill Spight
 Post subject: Re: MuZero beats AlphaZero
Post #5 Posted: Thu Nov 21, 2019 2:34 am 
Judan

Posts: 6495
Location: Cambridge, UK
Liked others: 388
Was liked: 3570
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Now the real challenge for MuZero is can it play Mao?


This post by Uberdude was liked by: MikeKyle
 Post subject: Re: MuZero beats AlphaZero
Post #6 Posted: Thu Nov 21, 2019 8:30 am 
Honinbo

Posts: 9077
Liked others: 1529
Was liked: 1559
KGS: Kirby
Tygem: 커비라고해
Next step: AI to decide to play go when it doesn’t know the rules, and also doesn’t know it can use a board or stones.

_________________
be immersed


This post by Kirby was liked by: Bill Spight
 Post subject: Re: MuZero beats AlphaZero
Post #7 Posted: Thu Nov 21, 2019 8:40 am 
Dies in gote

Posts: 43
Liked others: 2
Was liked: 11
Yes, we should eagerly anticipate the day that the AI learns Go out of sheer interest


This post by Yakago was liked by: Bonobo
 Post subject: Re: MuZero beats AlphaZero
Post #8 Posted: Thu Nov 21, 2019 8:44 am 
Gosei

Posts: 1542
Location: Earth
Liked others: 578
Was liked: 250
Mu Zero, can you tell us more about Go?

I don't care. I just win.

 Post subject: Re: MuZero beats AlphaZero
Post #9 Posted: Thu Nov 21, 2019 9:44 am 
Honinbo

Posts: 9966
Liked others: 3248
Was liked: 3255
Gomoto wrote:
Mu Zero, can you tell us more about Go?


Mu.



This post by Bill Spight was liked by 3 people: Bonobo, Gomoto, TelegraphGo
 Post subject: Re: MuZero beats AlphaZero
Post #10 Posted: Thu Nov 21, 2019 4:51 pm 
Lives with ko

Posts: 190
Liked others: 38
Was liked: 27
Rank: EGF 2k
KGS: MKyle
Uberdude wrote:
Now the real challenge for MuZero is can it play Mao?


I played Mao in college and genuinely thought it was just made up by a small group of bored Yorkshiremen.
I guess it's your point, but Mao is kind of the only game MuZero seems to play.

 Post subject: Re: MuZero beats AlphaZero
Post #11 Posted: Fri Nov 22, 2019 10:20 am 
Lives in sente

Posts: 727
Liked others: 44
Was liked: 218
GD Posts: 10
Surprised not to see Aja Huang in this, but he appears in the AlphaStar paper.

Anyway, I just love the name, Zero is nothing, and Mu is also nothing in Japanese and Korean (Wu in Chinese), something like that.

I wonder whether they will also manage to play StarCraft at AlphaStar level in their next project. The AI's name could be MuZeroNova: Nova is 'new' in Latin and also a 'star explosion' in astronomical terms. Though I might consider adding another 'nothing' to the name if the AI manages to win even without being tasked to win or given a winning reward.

 Post subject: Re: MuZero beats AlphaZero
Post #12 Posted: Fri Nov 22, 2019 10:45 am 
Lives in sente

Posts: 928
Liked others: 80
Was liked: 331
For the next name of a DeepMind product, I suggest EpsilonZero (vacuum permittivity).


This post by jlt was liked by: Bill Spight
 Post subject:
Post #13 Posted: Fri Nov 22, 2019 12:11 pm 
Honinbo

Posts: 8792
Location: Santa Barbara, CA
Liked others: 336
Was liked: 2057
GD Posts: 312
μ

 Post subject: Re: MuZero beats AlphaZero
Post #14 Posted: Fri Nov 22, 2019 2:38 pm 
Honinbo

Posts: 9966
Liked others: 3248
Was liked: 3255

 Post subject: Re: MuZero beats AlphaZero
Post #15 Posted: Fri Nov 22, 2019 2:41 pm 
Lives in gote

Posts: 379
Liked others: 363
Was liked: 195
Bill Spight wrote:
sorin wrote:
DeepMind published a paper about MuZero, a new approach to learning, which they evaluated on several board games and Atari video games: https://arxiv.org/pdf/1911.08265.pdf

From what I understand from a quick browse of the paper, the innovative part compared to AlphaZero type of approach is that MuZero doesn't "know" the rules in advance, therefore is a more general learning algorithm, which can be used in more open-ended domains.


Actually, learning the rules is not innovative.



Right. And this is not about "learning the rules", but learning to act in an environment where there are no clear rules.

They used it for go as well just as proof-of-concept I guess, but go (or board games in general) is not the main target for this family of algorithms. Nevertheless, I think it's very cool, I am mostly interested about the learning trajectory for go, whether it ended up learning in a different way, or did it converge to AlphaZero style, etc.



This post by sorin was liked by: Bill Spight
 Post subject: Re: MuZero beats AlphaZero
Post #16 Posted: Fri Nov 22, 2019 9:17 pm 
Lives in sente

Posts: 727
Liked others: 44
Was liked: 218
GD Posts: 10
Since DeepMind is not gonna provide exact Elo values anyway, I'll do this for fun. I tried to read the Elo values off the graphs, assuming the graphs are drawn to an accurate scale.

We'll start with the exact numbers the paper mentions (from the AlphaGo Zero paper):
AlphaGo Fan 3,144
AlphaGo Lee 3,739
AlphaGo Master 4,858
AlphaGo Zero (40 blocks / 40 days) 5,185

Now the estimated numbers:
AlphaGo Zero (20 blocks / 3 days) 4,884 (from the AlphaZero paper)
AlphaZero (20 blocks / 13 days) 4,987 (from the MuZero paper), 4,980 (from the AlphaZero paper). The numbers are very similar across these two papers, so I think the graphs are drawn to an accurate scale.
MuZero (16 blocks / 12 hours?) 5,161 (from the MuZero paper)

Though there is a very BIG caveat: the match conditions are different. In the MuZero paper the condition is 800 simulations per move, and another graph shows that MuZero outperforms AlphaZero from 0.1 seconds to 20 seconds per move, while at 20 to 50 seconds per move AlphaZero outperforms MuZero. We don't know what will happen at even longer thinking times.
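
For anyone who wants to turn these rating gaps into something more tangible, here is a minimal sketch that converts two Elo ratings into an expected score. It assumes the standard logistic Elo formula with a 400-point scale, which may or may not match exactly how the papers' graphs were calibrated, and the function name expected_score is made up for illustration. Since the ratings were read off graphs and come from different match conditions, treat the result as a rough indication only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B.

    Assumption: standard logistic Elo model with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings estimated from the MuZero paper's graph in the post above.
muzero = 5161
alphazero = 4987

print(f"MuZero vs AlphaZero expected score: {expected_score(muzero, alphazero):.3f}")
```

Under that assumption, a gap of about 174 points works out to an expected score of a bit over 0.7 for the higher-rated player.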


This post by pookpooi was liked by: sorin
 Post subject: Re: MuZero beats AlphaZero
Post #17 Posted: Fri Nov 22, 2019 9:57 pm 
Lives in gote

Posts: 531
Liked others: 87
Was liked: 566
Rank: maybe 2d
Actually we know what will happen at longer thinking times - it's almost guaranteed that AlphaZero continues to pull further ahead of MuZero.

The reason that AlphaZero pulls ahead at longer thinking times is because the accuracy of MuZero's representation of the board degrades the more times it passes through the dynamics function, so as it thinks more and more moves ahead, its "mental picture" of the future board state becomes worse and worse until it degrades into garbage. (This is a general phenomenon that afflicts all known RNN-style architectures that attempt to model any kind of state dynamics.)

The paper itself remarks on the fact that quite amazingly, the degradation only really noticeably starts at least a whole order of magnitude beyond what was used in self-play training. But for deep searches, as it currently is, it can't compete with AlphaZero, which has an actual software implementation of a Go board to make the moves on and therefore perfect future board perception.

(As others have mentioned, it's very clear from features of the design like this one that Go wasn't really the target problem being solved here, they're focused on more general tasks where you can't simply implement the rules of the game in your model).
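
The compounding effect described above can be illustrated with a toy sketch. This is not MuZero's network or anything from the paper: the "true" environment here is just a 2D rotation, and the "learned" dynamics function is the same rotation with a small, hypothetical per-step bias (MODEL_BIAS). All names are made up for illustration. The point is only that a model which is nearly right for one step drifts further from the real state the more times it is applied to its own output.

```python
import math

TRUE_ANGLE = 0.3    # rotation applied by the real environment each step
MODEL_BIAS = 0.01   # hypothetical small error in the learned dynamics


def rotate(state, angle):
    """Rotate a 2D point around the origin by the given angle (radians)."""
    x, y = state
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def true_step(state):
    """Exact simulator: what a program with the rules built in would use."""
    return rotate(state, TRUE_ANGLE)


def learned_step(state):
    """Imperfect learned dynamics: slightly wrong on every application."""
    return rotate(state, TRUE_ANGLE + MODEL_BIAS)


def rollout_error(depth, start=(1.0, 0.0)):
    """Distance between the real and the imagined state after `depth` steps."""
    real = imagined = start
    for _ in range(depth):
        real = true_step(real)
        imagined = learned_step(imagined)  # fed its own previous output
    return math.dist(real, imagined)


for depth in (5, 50, 200):
    print(depth, round(rollout_error(depth), 4))
```

In this toy setup the error after 5 steps is tiny, while after 200 steps the imagined state has little to do with the real one. A search that steps an exact simulator never accumulates this kind of drift.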


This post by lightvector was liked by 4 people: Bill Spight, dfan, ez4u, hyperpape
 Post subject: Re: MuZero beats AlphaZero
Post #18 Posted: Fri Nov 22, 2019 10:39 pm 
Honinbo

Posts: 9966
Liked others: 3248
Was liked: 3255
Thanks, lightvector. :)

One thing that keeps coming to my mind is Richard Feynman's caution about extrapolation. Of course, everybody knows that you can't trust extrapolation, but Feynman pointed out that you can't trust extreme data points, either. They are not validated by further exploration. See the horizon effect.

That's why, when I see long variations produced by analytical programs, I cringe. The Elf commentaries sometimes produce long variations, as well, but they cut them off when the number of visits or playouts drops below 1500. You can't trust moves that have not been explored at least that much.
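
A cutoff like that is easy to apply to any engine output that reports visit counts per move. Here is a minimal sketch, assuming a variation is given as a list of (move, visits) pairs. The data format, the function name trusted_prefix, and the sample numbers are all made up for illustration and are not taken from the Elf commentaries' actual files.

```python
from itertools import takewhile

MIN_VISITS = 1500  # threshold mentioned above for the Elf commentaries


def trusted_prefix(variation, min_visits=MIN_VISITS):
    """Keep moves from the start of the variation until the first move
    whose visit count falls below min_visits; drop everything after it."""
    return list(takewhile(lambda move: move[1] >= min_visits, variation))


# Made-up example: visit counts typically shrink deeper into the line.
line = [("Q16", 12000), ("D4", 6400), ("R4", 2100), ("C16", 900), ("F3", 1600)]
print(trusted_prefix(line))
# [('Q16', 12000), ('D4', 6400), ('R4', 2100)]
```

Note that the line is cut at the first under-explored move, even if a later move happens to have more visits, since everything after that point depends on a move that was not explored enough.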

