MuZero beats AlphaZero

sorin
Lives in gote
Posts: 389
Joined: Wed Apr 21, 2010 9:14 pm
Has thanked: 418 times
Been thanked: 198 times

Post by sorin »

DeepMind published a paper about MuZero, a new approach to learning, which they evaluated on several board games and Atari video games: https://arxiv.org/pdf/1911.08265.pdf

From what I understand from a quick browse of the paper, the innovative part compared to the AlphaZero type of approach is that MuZero doesn't "know" the rules in advance, and is therefore a more general learning algorithm that can be used in more open-ended domains.

They tested it against AlphaZero for go, and MuZero won. This is an exact quotation:

"In Go, MuZero slightly exceeded the performance of AlphaZero, despite using less computation per node in the search tree (16 residual blocks per evaluation in MuZero compared to 20 blocks in AlphaZero)"

Very interesting news, I hope they will publish some game records too!
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: MuZero beats AlphaZero

Post by Bill Spight »

sorin wrote:DeepMind published a paper about MuZero, a new approach to learning, which they evaluated on several board games and Atari video games: https://arxiv.org/pdf/1911.08265.pdf

From what I understand from a quick browse of the paper, the innovative part compared to the AlphaZero type of approach is that MuZero doesn't "know" the rules in advance, and is therefore a more general learning algorithm that can be used in more open-ended domains.
Actually, learning the rules is not innovative.
sorin wrote:Very interesting news, I hope they will publish some game records too!
Very interesting, indeed. :)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
EdLee
Honinbo
Posts: 8859
Joined: Sat Apr 24, 2010 6:49 pm
GD Posts: 312
Location: Santa Barbara, CA
Has thanked: 349 times
Been thanked: 2070 times

Post by EdLee »

Hi sorin, thanks.

Nice to see the classic Atari games.
Mr. Aja Huang (relayer in the AlphaGo vs. Lee Sedol match) is not listed in this paper.

Too bad the "casual" readers of these papers would have no idea of the etymology of Atari and its connection to Go. :scratch: (Unless they accidentally wikipedia it up.)
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: MuZero beats AlphaZero

Post by Uberdude »

I don't think we ever got game records of AlphaZero for Go, did we? Also, AlphaZero was only stronger than the 20 block version of AlphaGo Zero (which was between AG Lee and AG Master), not the 40 block version; see viewtopic.php?p=239589#p239589. So these games would be interesting to see from a "what style does this new bot from an independent training run of self discovery of rules have" perspective, but it will likely be weaker than AG0 40b.
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: MuZero beats AlphaZero

Post by Uberdude »

Now the real challenge for MuZero is can it play Mao?
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: MuZero beats AlphaZero

Post by Kirby »

Next step: an AI that decides to play go when it doesn’t know the rules, and also doesn’t know it can use a board or stones.
be immersed
Yakago
Dies in gote
Posts: 53
Joined: Tue Jan 16, 2018 10:39 am
GD Posts: 0
Has thanked: 2 times
Been thanked: 12 times

Re: MuZero beats AlphaZero

Post by Yakago »

Yes, we should eagerly anticipate the day that the AI learns Go out of sheer interest.
Gomoto
Gosei
Posts: 1733
Joined: Sun Nov 06, 2016 6:56 am
GD Posts: 0
Location: Earth
Has thanked: 621 times
Been thanked: 310 times

Re: MuZero beats AlphaZero

Post by Gomoto »

Mu Zero, can you tell us more about Go?

I don't care. I just win.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: MuZero beats AlphaZero

Post by Bill Spight »

Gomoto wrote:Mu Zero, can you tell us more about Go?
Mu.
MikeKyle
Lives with ko
Posts: 205
Joined: Wed Jul 26, 2017 2:27 am
Rank: EGF 2k
GD Posts: 0
KGS: MKyle
Has thanked: 49 times
Been thanked: 36 times

Re: MuZero beats AlphaZero

Post by MikeKyle »

Uberdude wrote:Now the real challenge for MuZero is can it play Mao?
I played Mao in college and genuinely thought it was just made up by a small group of bored Yorkshiremen.
I guess that's your point, but Mao is kind of the only game MuZero seems to play.
pookpooi
Lives in sente
Posts: 727
Joined: Sat Aug 21, 2010 12:26 pm
GD Posts: 10
Has thanked: 44 times
Been thanked: 218 times

Re: MuZero beats AlphaZero

Post by pookpooi »

Surprised not to see Aja Huang in this, but he appears in the AlphaStar paper.

Anyway, I just love the name, Zero is nothing, and Mu is also nothing in Japanese and Korean (Wu in Chinese), something like that.

If they also manage to play StarCraft at AlphaStar level in their next project, the AI's name could be MuZeroNova: Nova means 'new' in Latin and is also the astronomical term for a star explosion. Though I might consider adding another 'nothing' to the name if the AI manages to win even without being tasked to win or given a winning reward.
jlt
Gosei
Posts: 1786
Joined: Wed Dec 14, 2016 3:59 am
GD Posts: 0
Has thanked: 185 times
Been thanked: 495 times

Re: MuZero beats AlphaZero

Post by jlt »

For the next name of a DeepMind product, I suggest EpsilonZero (vacuum permittivity).
EdLee
Honinbo
Posts: 8859
Joined: Sat Apr 24, 2010 6:49 pm
GD Posts: 312
Location: Santa Barbara, CA
Has thanked: 349 times
Been thanked: 2070 times

Post by EdLee »

μ
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: MuZero beats AlphaZero

Post by Bill Spight »

sorin
Lives in gote
Posts: 389
Joined: Wed Apr 21, 2010 9:14 pm
Has thanked: 418 times
Been thanked: 198 times

Re: MuZero beats AlphaZero

Post by sorin »

Bill Spight wrote:
sorin wrote:DeepMind published a paper about MuZero, a new approach to learning, which they evaluated on several board games and Atari video games: https://arxiv.org/pdf/1911.08265.pdf

From what I understand from a quick browse of the paper, the innovative part compared to the AlphaZero type of approach is that MuZero doesn't "know" the rules in advance, and is therefore a more general learning algorithm that can be used in more open-ended domains.
Actually, learning the rules is not innovative.
Right. And this is not about "learning the rules", but learning to act in an environment where there are no clear rules.

They used it for go as well just as a proof of concept, I guess, but go (or board games in general) is not the main target for this family of algorithms. Nevertheless, I think it's very cool. I am mostly interested in the learning trajectory for go: whether it ended up learning in a different way, or converged to AlphaZero style, etc.
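
To make the "acting without being given the rules" idea a bit more concrete, here is a rough toy sketch in Python. The paper describes three learned functions: a representation function (observation to hidden state), a dynamics function (hidden state and action to predicted reward and next hidden state), and a prediction function (hidden state to policy and value). Everything else below is an assumption for illustration only: the function bodies are fixed toy arithmetic rather than trained networks, the names are made up, and the planner is a plain exhaustive lookahead instead of the Monte Carlo tree search the paper uses. The point is just that planning never calls a real game simulator.

```python
import itertools
from typing import List, Sequence, Tuple

HiddenState = Tuple[float, ...]

def representation(observation: Sequence[float]) -> HiddenState:
    """h: observation -> hidden state (toy stand-in for a learned network)."""
    return tuple(observation)

def dynamics(state: HiddenState, action: int) -> Tuple[float, HiddenState]:
    """g: (hidden state, action) -> (predicted reward, next hidden state).

    Toy rule: the action is added to every component, and the predicted
    reward is the sum of the new state. A real model would learn this.
    """
    next_state = tuple(x + action for x in state)
    return float(sum(next_state)), next_state

def prediction(state: HiddenState, num_actions: int) -> Tuple[List[float], float]:
    """f: hidden state -> (policy, value). Toy: uniform policy, value = sum."""
    policy = [1.0 / num_actions] * num_actions
    return policy, float(sum(state))

def rollout(observation: Sequence[float], actions: Sequence[int],
            discount: float = 1.0, num_actions: int = 2) -> float:
    """Score an action sequence using only the model, never the real rules."""
    state = representation(observation)
    total, scale = 0.0, 1.0
    for action in actions:
        reward, state = dynamics(state, action)
        total += scale * reward
        scale *= discount
    # Bootstrap from the predicted value of the final hidden state.
    _, value = prediction(state, num_actions)
    return total + scale * value

def plan(observation: Sequence[float], num_actions: int = 2, depth: int = 2) -> int:
    """Exhaustive lookahead (a simplification; the paper uses MCTS)."""
    candidates = itertools.product(range(num_actions), repeat=depth)
    best = max(candidates, key=lambda seq: rollout(observation, seq, num_actions=num_actions))
    return best[0]

if __name__ == "__main__":
    print(plan([0.0, 1.0]))
```

In the real system the three functions are neural networks trained jointly so that the predicted rewards, values and policies match what was observed and what the search produced; the toy arithmetic here only stands in for that.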