MuZero beats AlphaZero
sorin
- Lives in gote
- Posts: 389
- Joined: Wed Apr 21, 2010 9:14 pm
- Has thanked: 418 times
- Been thanked: 198 times
MuZero beats AlphaZero
DeepMind published a paper about MuZero, a new approach to learning, which they evaluated on several board games and Atari video games: https://arxiv.org/pdf/1911.08265.pdf
From what I understand from a quick browse of the paper, the innovative part compared to the AlphaZero type of approach is that MuZero doesn't "know" the rules in advance, and is therefore a more general learning algorithm, which can be used in more open-ended domains.
They tested it against AlphaZero in go and MuZero won; this is an exact quotation:
"In Go, MuZero slightly exceeded the performance of AlphaZero, despite using less computation per node in the search tree (16 residual blocks per evaluation in MuZero compared to 20 blocks in AlphaZero)"
Very interesting news, I hope they will publish some game records too!
Sorin - 361points.com
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: MuZero beats AlphaZero
sorin wrote: DeepMind published a paper about MuZero, a new approach to learning, which they evaluated on several board games and Atari video games: https://arxiv.org/pdf/1911.08265.pdf
From what I understand from a quick browse of the paper, the innovative part compared to the AlphaZero type of approach is that MuZero doesn't "know" the rules in advance, and is therefore a more general learning algorithm, which can be used in more open-ended domains.
Actually, learning the rules is not innovative.
sorin wrote: Very interesting news, I hope they will publish some game records too!
Very interesting, indeed.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
EdLee
- Honinbo
- Posts: 8859
- Joined: Sat Apr 24, 2010 6:49 pm
- GD Posts: 312
- Location: Santa Barbara, CA
- Has thanked: 349 times
- Been thanked: 2070 times
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: MuZero beats AlphaZero
I don't think we ever got game records of AlphaZero for Go, did we? Also, AlphaZero was only stronger than the 20-block version of AlphaGo Zero (which was between AG Lee and AG Master), not the 40-block version; see viewtopic.php?p=239589#p239589. So these games would be interesting to see from a "what style does this new bot, from an independent training run with self-discovery of the rules, have" perspective, but they will likely be weaker than AG0 40b.
Kirby
- Honinbo
- Posts: 9553
- Joined: Wed Feb 24, 2010 6:04 pm
- GD Posts: 0
- KGS: Kirby
- Tygem: 커비라고해
- Has thanked: 1583 times
- Been thanked: 1707 times
Re: MuZero beats AlphaZero
Next step: an AI that decides to play go when it doesn't know the rules, and doesn't even know it can use a board or stones.
be immersed
Bill Spight
Re: MuZero beats AlphaZero
Gomoto wrote: Mu Zero, can you tell us more about Go?
Mu.
MikeKyle
- Lives with ko
- Posts: 205
- Joined: Wed Jul 26, 2017 2:27 am
- Rank: EGF 2k
- GD Posts: 0
- KGS: MKyle
- Has thanked: 49 times
- Been thanked: 36 times
Re: MuZero beats AlphaZero
Uberdude wrote: Now the real challenge for MuZero is: can it play Mao?
I played Mao in college and genuinely thought it was just made up by a small group of bored Yorkshiremen.
I guess it's your point, but Mao is kind of the only game MuZero seems to play.
pookpooi
- Lives in sente
- Posts: 727
- Joined: Sat Aug 21, 2010 12:26 pm
- GD Posts: 10
- Has thanked: 44 times
- Been thanked: 218 times
Re: MuZero beats AlphaZero
Surprised not to see Aja Huang on this one, but he appears in the AlphaStar paper.
Anyway, I just love the name: Zero is nothing, and Mu also means nothing in Japanese and Korean (Wu in Chinese), something like that.
If they also manage to play StarCraft at AlphaStar level in their next project, the AI could be named MuZeroNova: nova means "new" in Latin and is also the astronomical term for a star explosion. Though I might consider adding another "nothing" to the name if the AI manages to win without even being given the task of winning or a reward for winning.
jlt
- Gosei
- Posts: 1786
- Joined: Wed Dec 14, 2016 3:59 am
- GD Posts: 0
- Has thanked: 185 times
- Been thanked: 495 times
Re: MuZero beats AlphaZero
For the next name of a DeepMind product, I suggest EpsilonZero (vacuum permittivity).
Bill Spight
Re: MuZero beats AlphaZero
無
sorin
Re: MuZero beats AlphaZero
Bill Spight wrote:
sorin wrote: DeepMind published a paper about MuZero, a new approach to learning, which they evaluated on several board games and Atari video games: https://arxiv.org/pdf/1911.08265.pdf
From what I understand from a quick browse of the paper, the innovative part compared to the AlphaZero type of approach is that MuZero doesn't "know" the rules in advance, and is therefore a more general learning algorithm, which can be used in more open-ended domains.
Actually, learning the rules is not innovative.
Right. And this is not about "learning the rules", but about learning to act in an environment where there are no clear rules.
They used it for go as well, just as a proof of concept I guess, but go (or board games in general) is not the main target for this family of algorithms. Nevertheless, I think it's very cool. I am mostly interested in the learning trajectory for go: whether it ended up learning in a different way, or converged to an AlphaZero style, etc.