Different Quality of Play between Current AI and AlphaGo 0

RobertJasiek · Post by **RobertJasiek** » Fri Sep 27, 2019 2:37 am

Some current AI programs are said to be 2 stones stronger than top pro level; however, they are much weaker than AlphaGo Zero. For example, Leela Zero (via AI Sensei) is described as having occasional problems with ladders and sekis, although not every weak or lazy user notices them. I have witnessed another occasional weakness: wrong life-and-death (LD) status assessment. Therefore, never simply trust AI. When will current AI reach AlphaGo Zero level on typical computers?

Uberdude · Post by **Uberdude** » Fri Sep 27, 2019 4:09 am

To be fair, we don't know how strong AG0 is on typical computers, only on high-end hardware. The AlphaGo teaching tool (or rather Master, not Zero) uses 10 million playouts. Maybe it too made embarrassing ladder mistakes at 3000 playouts, as LZ does, and DeepMind wanted to avoid that bad publicity. Does anyone know how many playouts Master was getting in the online series? I seem to recall that it ran on a single machine with a TPU or two, in under 30 seconds per move.

Mike Novack · Post by **Mike Novack** » Fri Sep 27, 2019 6:12 am

Well, perhaps we should distinguish three, not just two, levels of computers.

a) Typical computers
b) High-end gaming machines (probably a couple of thousand to buy and maybe 300 W to run)
c) High-end hardware, the equivalent of a mini supercomputer, maybe even water-cooled

jann · Post by **jann** » Fri Sep 27, 2019 8:07 am

There were some claims on the LZ Discord that most of today's bots, including LZ and Minigo, have already reached AG0 level (raw nets, I guess; hw oc makes a difference).

This question has some practical consequences since LZ for example is struggling to gain further strength at the moment, and it is unclear if this is already the limitation of 40b networks (which AG0 used as well), or a consequence of some other problem.

Bill Spight · Post by **Bill Spight** » Fri Sep 27, 2019 8:35 am

jann wrote:There were some claims on the LZ Discord that most of today's bots, including LZ and Minigo, have already reached AG0 level (raw nets, I guess; hw oc makes a difference).

hw = hardware?

jann wrote:This question has some practical consequences since LZ for example is struggling to gain further strength at the moment, and it is unclear if this is already the limitation of 40b networks (which AG0 used as well), or a consequence of some other problem.

There is no guarantee that the fitness landscape for go is unimodal. Perhaps LZ has climbed the wrong hill. OC, in infinite time it will find its way to another hill.

jlt · Post by **jlt** » Fri Sep 27, 2019 9:25 am

LeelaZero has 16 million self-play games, while AlphaZero has 29 million, so LeelaZero may be a bit weaker than AlphaZero, but it is probably not far behind.

How many playouts are necessary to avoid problems with ladders, life and death, semeais, etc.?

Bill Spight · Post by **Bill Spight** » Fri Sep 27, 2019 9:45 am

jlt wrote:LeelaZero has 16 million self-play games, while AlphaZero has 29 million, so LeelaZero may be a bit weaker than AlphaZero, but it is probably not far behind.

There may be path dependency.

jlt wrote:How many playouts are necessary to avoid problems with ladders, life and death, semeais, etc.?

Well, what do you mean by a problem? Certainly, if human amateurs can do better with such things in some positions, that may be considered a problem. John Tromp has shown that ladders can pose very difficult problems, and even some "simple" ladder problems are well over 100 moves deep. Similarly, Berlekamp and Wolfe composed endgame problems that are easy for humans who know the secret, but are almost 100 moves deep, and the candidate plays at each turn are so similar that it is unclear how far MCTS differs from brute force in such positions.

In the Elf commentaries, candidate plays with fewer than 1500 playouts were not reported, except for the move actually played in the game. If Elf's top choice had few competitors, it typically got more than 100k playouts. OC, that was not enough to avoid occasional reading errors where the human choice was demonstrably better.

jlt · Post by **jlt** » Fri Sep 27, 2019 9:53 am

Bill Spight wrote:
How many playouts are necessary to avoid problems with ladders, life and death, semeais, etc.?
Well, what do you mean by a problem?

Let me ask a more precise question. Are there examples of human games (pro or amateur) in which a recent version of LeelaZero with 100000 playouts chooses a wrong move because it misreads a ladder, a life-and-death problem, or a semeai?

If the answer is yes, then by which number should I replace 100000 so that the answer becomes "no"?

Gomoto · Post by **Gomoto** » Fri Sep 27, 2019 10:04 am

RobertJasiek wrote:Therefore, never simply trust AI. When will current AI reach AlphaGo Zero level on typical computers?

I trust AI, even if it makes some mistakes from time to time. I even trust my fellow human beings, even if they make some mistakes from time to time. The idea of accepting only flawless results is inherently dangerous, although many people do not realize this, despite the obvious examples from history and personal experience.

It is better to be aware that mistakes will happen and react to them in proper ways.

(I acknowledge that the quote says never SIMPLY trust AI.)

jann · Post by **jann** » Fri Sep 27, 2019 10:22 am

Bill Spight wrote:hw = hardware?

Yes.

This question has some practical consequences since LZ for example is struggling to gain further strength at the moment, and it is unclear if this is already the limitation of 40b networks (which AG0 used as well), or a consequence of some other problem.
There is no guarantee that the fitness landscape for go is unimodal. Perhaps LZ has climbed the wrong hill.

In my experience, random differences in end results between training runs with identical parameters are usually small. It is more common for seemingly minor differences in setup to have an unexpectedly large effect on the final net. (In that case the question above remains: what did LZ do wrong?)

jlt wrote:LeelaZero has 16 million self-play games, while AlphaZero has 29 million, so LeelaZero may be a bit weaker than AlphaZero, but it is probably not far behind.

It is not easy to compare LZ nets to AG0 nets, since the published games not only use high playouts but are also hand-selected (which may explain some brilliancies). But I doubt LZ is near AG0 at the moment. Comparing self-play numbers is also unreliable, since there are (or were) significant differences between the two training setups, so a factor of 2 in the amount of self-play can easily go either way.

The question about ladders is even more difficult. There were some recent examples where LZ (at normal, lowish playout counts) misplayed ladders with only a few disturbing stones in the vicinity. At low and even medium playout counts, the search must rely to an extent on the net's guess about the ladder, since the playouts may not follow the ladder line for the player the net thinks the ladder is bad for (so the search won't see its real outcome until very high playout counts).

Bill Spight · Post by **Bill Spight** » Fri Sep 27, 2019 10:55 am

jann wrote:
jlt wrote:LeelaZero has 16 million self-play games, while AlphaZero has 29 million, so LeelaZero may be a bit weaker than AlphaZero, but it is probably not far behind.
It is not easy to compare LZ nets to AG0 nets since the published games not only use high playouts, but are also hand-selected (which may explain some brilliancies).

Yes, I lost faith in the DeepMind team when they only published selected games, which made them useless for scientific purposes. Good PR, I suppose. You got the same kind of thing in chess. They knew better.

At least the Master vs. human games were not selected, and the Elf commentaries are comprehensive. (Different team.)

Kirby · Post by **Kirby** » Fri Sep 27, 2019 6:12 pm

Bill Spight wrote:At least the Master vs. human games were not selected

Except that they only chose to reveal that Master was associated with AlphaGo *after* it showed an impressive result.

Bill Spight · Post by **Bill Spight** » Fri Sep 27, 2019 6:42 pm

Kirby wrote:
Bill Spight wrote:At least the Master vs. human games were not selected
Except that they only chose to reveal that Master was associated with AlphaGo *after* it showed an impressive result

That's true.

But the games were still representative. No selection bias.

hydrogenpi7 · Post by **hydrogenpi7** » Sat Sep 28, 2019 6:49 pm

jann wrote:There were some claims on the LZ Discord that most of today's bots, including LZ and Minigo, have already reached AG0 level (raw nets, I guess; hw oc makes a difference).

This question has some practical consequences since LZ for example is struggling to gain further strength at the moment, and it is unclear if this is already the limitation of 40b networks (which AG0 used as well), or a consequence of some other problem.

So does that mean that the latest LZ, running on something like a Huawei Atlas 900 cluster, could beat AlphaGo Zero?

Would Huawei and Google be willing to do a match?

hydrogenpi7 · Post by **hydrogenpi7** » Sat Sep 28, 2019 6:51 pm

Bill Spight wrote:
jann wrote:
jlt wrote:LeelaZero has 16 million self-play games, while AlphaZero has 29 million, so LeelaZero may be a bit weaker than AlphaZero, but it is probably not far behind.
It is not easy to compare LZ nets to AG0 nets since the published games not only use high playouts, but are also hand-selected (which may explain some brilliancies).
Yes, I lost faith in the DeepMind team when they only published selected games, which made them useless for scientific purposes. Good PR, I suppose. You got the same kind of thing in chess. They knew better.

At least the Master vs. human games were not selected, and the Elf commentaries are comprehensive. (Different team.)

Also, for what it is worth: had they played Lee with the Fan version of AlphaGo, there is a good chance Lee would have won the match; had they played Ke Jie with the Lee version of AlphaGo, there is a good chance Ke Jie would have won some games, etc. I think they were conservative in that they wanted to ensure wins at each step up the AI strength ladder. In retrospect, it seems clear that the Fan version of AlphaGo was very weak and not "superhuman", and current LZ is certainly already past even the Lee version of AlphaGo.

Life In 19x19