Pros making progress against AI

Higher level discussions, analysis of professional games, etc., go here.
John Fairbairn
Oza
Posts: 3724
Joined: Wed Apr 21, 2010 3:09 am
Has thanked: 20 times
Been thanked: 4672 times

Pros making progress against AI

Post by John Fairbairn »

There are signs that the young pros in the Japanese national training squad are beginning to get to grips with how to play against AI.

Starting at the end of June, they have so far played 102 games against DeepZen (the contract runs till the end of December) at decent time limits of 10 x 1 minute. Despite being trampled underfoot at the beginning, they have now got the score up to a faintly respectable 18-84. Though the wins are still heavily outnumbered by losses, most of the wins have come in the latter part of this first month.

The individual scores are as follows, in no special order:

Iyama Yuta 0-2
Ichiriki Ryo 1-6
Fujisawa Rina 1-5
Matsuura Yuta 3-13
Murakawa Daisuke 0-1
Onishi Ryuhei 3-13
Mukai Chiaki 0-1
Seki Kotaro 0-6
Nishi Takenobu 1-3
Shibano Toramaru 2-9
Hirose Yuichi 2-6
Fujita Akihiko 0-2
Nishioka Masao 0-3
Hirata Tomoya 0-1
Yu Zhengqi 1-1
Otake Yu 1-0
Moro Arisa 0-1
Xie Yimin 0-1
Motoki Katsuya 0-1
Hoshiai Shiho 0-1
Ida Atsushi 1-1
Ueno Asami 1-0
Yahata Naoki 0-2
Inaba Karin 0-4
Nyu Eiko 0-2

If they are indeed learning the necessary skills, I have seen nothing to indicate what they might be or that the knowledge is being pooled. But FWIW almost all of DeepZen's losses occurred by resignation in the move range 123-264. Only three losses have gone to a count: in two it lost by 0.5 as Black in games of over 300 moves, and in the other it lost by a whopping 51.5 after losing a big group quite early on (it seemed to think there was enough aji to play on). This may suggest that fighting pays off more for humans than endgame attrition, but whether superior fuseki by the human also played a part in those cases is hard to tell.

As to the komi (6.5 here), there is again a hint that the current komis slightly favour White: DeepZen scored 41-7 (85%) as White but only 43-11 (80%) as Black.
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: Pros making progress against AI

Post by moha »

I'm afraid these numbers are too low to draw conclusions from. The variance of a few dozen samples is high, though OC it is also possible that humans are adjusting and improving the results. In the longer term there may be enough samples, but by then the whole question will have become moot :) (AI will improve faster and further).
Bohdan
Dies in gote
Posts: 36
Joined: Sat Jun 25, 2011 6:32 am
Rank: Europe 5 dan
GD Posts: 0
KGS: Flashgoe
Been thanked: 6 times

Re: Pros making progress against AI

Post by Bohdan »

There is a limit to any improvement. The question will be whether pros can find the keys to that best version of AI. And I guess they will.
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: Pros making progress against AI

Post by Kirby »

Bohdan wrote:There is a limit to any improvement. The question will be whether pros can find the keys to that best version of AI. And I guess they will.
Are you suggesting that limitations on improvement only apply to AIs? If humans are also limited in terms of improvement, what gives you confidence that pros will find some sort of keys to surpassing computers?
be immersed
John Fairbairn
Oza
Posts: 3724
Joined: Wed Apr 21, 2010 3:09 am
Has thanked: 20 times
Been thanked: 4672 times

Re: Pros making progress against AI

Post by John Fairbairn »

moha wrote:I'm afraid these numbers are too low to draw conclusions
I'm the last person to pontificate about numbers, but I don't think that is true. If it is, we can't draw conclusions from AG's 60-0 feat, either.

As I understand it, qualitative analyses typically require a smaller sample size than quantitative analyses. Hence we see sample sizes typically around 30, sometimes as low as 20, in medical and social science research. What is being discussed here is qualitative, too, surely.
pookpooi
Lives in sente
Posts: 727
Joined: Sat Aug 21, 2010 12:26 pm
GD Posts: 10
Has thanked: 44 times
Been thanked: 218 times

Re: Pros making progress against AI

Post by pookpooi »

Isn't it a well-known phenomenon that if you play a specific bot many times, you will get stronger against it? Many KGS accounts have overestimated ranks because of their victories over bots.

What's interesting here is the balance between Black and White under Japanese rules. As under Chinese rules, it is almost as if all pros admit that the komi significantly favours White. Though I think the people who have the power to set the komi must have all the information in their hands, and they think 6.5 under Japanese rules and 7.5 under Chinese rules are already better balanced than any other numbers (without introducing ties).
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: Pros making progress against AI

Post by moha »

John Fairbairn wrote:
I'm afraid these numbers are too low to draw conclusions
I'm the last person to pontificate about numbers, but I don't think that is true. If it is, we can't draw conclusions from AG's 60-0 feat, either.
I think the probability of 60-0 by coincidence is very low, while the probability of, say, 2-28 changing to 5-25 for another set by coincidence is significantly higher. I meant it is safer to draw conclusions where the outcome/difference exceeds variance, but OC I understand the urge to speculate even from the smallest sample sizes :).
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: Pros making progress against AI

Post by Kirby »

John Fairbairn wrote:
I'm afraid these numbers are too low to draw conclusions
I'm the last person to pontificate about numbers, but I don't think that is true. If it is, we can't draw conclusions from AG's 60-0 feat, either.

As I understand it, qualitative analyses typically require a smaller sample size than quantitative analyses. Hence see sample sizes typically around 30, sometimes as low as 20, in medical and social science research. What is being discussed here is qualitative, too, surely.
Appropriate sample size depends on two factors: confidence level and margin of error. Stats don't really "prove" anything, but point to something within some level of confidence. I don't have time right now, but maybe tonight I can estimate the level of confidence you can have with conclusions drawn from this size of sample.
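For what it's worth, here is a quick back-of-envelope sketch of that kind of estimate (my own arithmetic, not Kirby's promised calculation): a normal-approximation 95% confidence interval for the overall 18-84 record quoted at the top of the thread.

```python
from math import sqrt

# Overall record quoted in the thread: 18 wins in 102 games.
wins, games = 18, 102
p_hat = wins / games                      # observed human win rate

# Normal-approximation (Wald) 95% confidence interval.
se = sqrt(p_hat * (1 - p_hat) / games)
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"win rate {p_hat:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

The interval runs roughly from 10% to 25%, i.e. a sample of this size pins the true win rate down only very loosely.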
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Pros making progress against AI

Post by Bill Spight »

John Fairbairn wrote:There are signs that the young pros in the Japanese national training squad are beginning to get to grips with how to play against AI.

Starting at end June, they have so far played 102 games against DeepZen (the contract runs till end December) at decent time limits of 10 x 1 minute. Despite being trampled under foot at the beginning, they have now got the score up to a faintly respectable 18-84. Though the wins are still heavily outnumbered by losses, most of the wins have come in the latter part of this first month.
pookpooi wrote:Isn't it a well-known phenomenon that if you play a specific bot many times, you will get stronger against it.
IMHO, pookpooi has a point. Although bots that rely upon randomness (such as Monte Carlo rollouts) are less vulnerable to that phenomenon than deterministic bots, it may well be that the Japanese pros are learning how to play against Deep Zen specifically. Whether they are learning how to play against, say, AlphaGo or other humans is another question.

moha wrote:I'm afraid these numbers are too low to draw conclusions
John Fairbairn wrote: I'm the last person to pontificate about numbers, but I don't think that is true. If it is, we can't draw conclusions from AG's 60-0 feat, either.
Well, you didn't give any numbers to support your hypothesis. For instance, take those pros who played more than one game against Deep Zen and divide their games into a first half and second half. (If they played an odd number of games, put the middle game into the first half.) Then we have before and after comparisons. Sum the wins and losses in each half over all the players involved. If they have been learning, there should be a significantly higher win rate in the overall second half than in the overall first half.
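That before/after split can be sketched as follows. The game records here are hypothetical placeholders, since the per-game sequences are not given in the thread; 'W'/'L' are in chronological order per player.

```python
# Split each player's games into halves, putting the middle game of an
# odd-length run into the first half, then sum wins/losses per half.
records = {
    "Player A": "LLWLW",   # hypothetical chronological results
    "Player B": "LLLW",
}

first = {"W": 0, "L": 0}
second = {"W": 0, "L": 0}
for games in records.values():
    mid = (len(games) + 1) // 2          # middle game goes to the first half
    for result in games[:mid]:
        first[result] += 1
    for result in games[mid:]:
        second[result] += 1

print("first half:", first, "second half:", second)
```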
John Fairbairn wrote:As I understand it, qualitative analyses typically require a smaller sample size than quantitative analyses. Hence we see sample sizes typically around 30, sometimes as low as 20, in medical and social science research. What is being discussed here is qualitative, too, surely.
You happen to have hit one of my strong points. A prime example of qualitative analysis is the case study, which has an N of 1. Often studies include both qualitative analysis and quantitative analysis. What is being discussed here is not qualitative. For something qualitative, take pro game commentaries. For something quantitative, take my observation about the low incidence of pincers in AlphaGo self play games. :) I have come up with a suggestion about why, but it has not and cannot be tested, unless AlphaGo releases more self play game records. I could test whether, say, Go Seigen or Lee Chang Ho made fewer pincers than other players, and what type of pincers they played, or compare the incidence of pincers in top level games in the 19th century vs. the 20th century. Etc., etc. The relative value of pincers would be difficult to prove.

Sorry for the digression. Whether pros are learning how to play against Deep Zen is a quantitative one. We should have plenty of data by the end of the year. :) What they are learning is a qualitative question. Once we know that they are learning, we can ask them what they have learned.

As for low sample sizes in the social sciences, that is one reason for the replicability crisis in psychology. Feynman pointed that out long ago, but who listens?
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
aeb
Dies with sente
Posts: 101
Joined: Wed Dec 04, 2013 7:08 pm
GD Posts: 0
Has thanked: 5 times
Been thanked: 36 times

Re: Pros making progress against AI

Post by aeb »

John Fairbairn wrote:There are signs that the young pros in the Japanese national training squad are beginning to get to grips with how to play against AI.
They have so far played 102 games ... score 18-84.
Mukai Chiaki 0-1 .. whopping 51.5
DeepZen scored 41-7 as White but only 43-11 as Black.
Yes, there is progress: 6% won in the first 32 games, 25% in the next 32, 20% in the last 54.
(I had 118 games instead of 102, score 21-97. For the games see here.)
I gave some statistics, but the numbers are still too small to conclude much.
aeb
Dies with sente
Posts: 101
Joined: Wed Dec 04, 2013 7:08 pm
GD Posts: 0
Has thanked: 5 times
Been thanked: 36 times

Re: Pros making progress against AI

Post by aeb »

John Fairbairn wrote:Only three losses have gone to a count and in two it lost by 0.5 as Black with over 300 moves, and in the other it lost by a whopping 51.5 after losing a big group quite early on (it seemed to think there was enough aji to play on).
Shiraishi discusses what probably happened: DeepZenGo thought it had won, misreading the status of the seki in the bottom left hand corner.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Pros making progress against AI

Post by Bill Spight »

Anyway, here is a table with players who have played more than one game, first half and last half rates, and whether their win rate improved (+), declined (-), or stayed the same (=) between halves.

Code: Select all


Player               First half  Second half   Total  Improvement?
                     Wins/Losses Wins/Losses    W/L

Onishi Ryuhei           1/10        3/7         4/17     +
Mutsuura Yuta           1/9         2/8         3/17     +
Shibano Toramaru        1/5         1/4         2/9      =
Hirose Yuichi           1/3         1/3         2/6      =
Ichiriki Ryo            0/4         1/2         1/6      +
Seki Kotaro             0/4         0/3         0/7      =
Fujisawa Rina           0/3         1/2         1/5      +
Nishi Takenobu          1/2         0/2         1/4      =
Ida Atsushi             2/0         1/1         3/1      -
Inaba Karin             0/2         0/2         0/4      =
Iyama Yuta              0/2         0/1         0/3      =
Nishioka Masao          0/2         0/1         0/3      =
Nyu Eiko                0/2         0/1         0/3      =
Otake Yu                1/0         0/1         1/1      -
Yu Zhengqi              0/1         1/0         1/1      +
Fujita Akihiko          0/1         0/1         0/2      =
Yahata Naoki            0/1         0/1         0/2      =

Total                   8/51       11/40       19/91	

Improved  players:  5     
Declined  players:  2
No change players: 10
Note: Nishi Takenobu was included in the no change group because his only win was the middle game.

No convincing indication of learning yet.
MikeKyle
Lives with ko
Posts: 205
Joined: Wed Jul 26, 2017 2:27 am
Rank: EGF 2k
GD Posts: 0
KGS: MKyle
Has thanked: 49 times
Been thanked: 36 times

Re: Pros making progress against AI

Post by MikeKyle »

Sorry to prolong the digression
I'm afraid these numbers are too low to draw conclusions.
Agreed.
In my line of work we'd say

"
p1 = prob win in first half
p2 = prob win in second half
H0:There is no difference, ie p1=p2
H1:There is an increase, ie p2>p1
binomial trials etc etc..
"

Treating the data handily tabulated by Bill Spight as binomial trials like this gives a p-value of about 0.247, far too high to reject H0 at any sensible significance level.

i.e. the proportion of wins did increase in the data, but (partly because the movement is small) we don't have enough evidence to say that the true underlying rate at which humans win has really changed.
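As a cross-check, here is a one-sided Fisher exact test on the pooled table. This is not necessarily the same binomial calculation as above, so the number comes out somewhat different from 0.247, but the conclusion is the same.

```python
from math import comb

# Bill Spight's pooled totals: first half 8 wins / 51 losses,
# second half 11 wins / 40 losses.
w1, l1 = 8, 51
w2, l2 = 11, 40
n1 = w1 + l1            # first-half games
N = n1 + w2 + l2        # all games
K = w1 + w2             # all wins

# One-sided Fisher exact test: probability of seeing w1 or fewer
# first-half wins if the human win rate were really the same in both
# halves (hypergeometric lower tail).
p = sum(comb(K, k) * comb(N - K, n1 - k) for k in range(w1 + 1)) / comb(N, n1)
print(f"one-sided p = {p:.3f}")
```

The p-value is well above conventional significance thresholds, matching the conclusion that the apparent improvement could easily be chance.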

Digressions aside I'd be interested to know if the humans reported that their strategy had changed throughout the games.
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: Pros making progress against AI

Post by Uberdude »

Doesn't look like the progress, if there ever was any, was sustained: Zen apparently won the last 100 games in a row (quite possibly a stronger version). I've not seen any human wins on Wbaduk for a while.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Pros making progress against AI

Post by Bill Spight »

Uberdude wrote:Doesn't look like the progress, if there ever was any, was sustained: Zen apparently won the last 100 games in a row (quite possibly a stronger version). I've not seen any human wins on Wbaduk for a while.
Emphasis mine.

Sorry, folks, but let me reiterate my point that the most effective learning tasks are, in general, those with a success rate of around 50%. IMO the pros would do better to take handicaps from Zen. Or large komi.