Uberdude wrote:
I have a few questions for the chess experts here:
- Looking at the chess openings: is AlphaZero playing the long standard opening-book lines, or has it found a way to diverge early without playing bad moves? My impression was that in chess, human knowledge of the opening is closer to perfect play than in go, and the openings are sharper, so there is less scope for novelty in unexplored areas without playing suboptimal moves.
I've just browsed through the games (https://lichess.org/study/EOddRjJ8) and I have to say that I'm quite impressed. It hasn't really innovated in the opening, but what can clearly be seen is that it has no qualms about positional sacrifices, even large ones. This is something the alphabeta bean counters tend not to do just like that. The reason is that piece values are fixed, and positional bonuses and penalties rarely add up to enough to compensate for a piece. A pawn or an exchange (i.e. rook for bishop or knight), sometimes, maybe - but a full piece? Doesn't happen.
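To illustrate the point about fixed piece values, here is a minimal sketch of a classical hand-crafted evaluation in Python. The piece values are the textbook centipawn numbers; the positional bonus terms and their sizes are purely hypothetical, not taken from any real engine - the point is only that even a generous pile of positional pluses falls well short of a full minor piece.

```python
# Sketch of a classical hand-crafted evaluation: fixed material values
# plus a handful of positional bonus terms, all in centipawns.
# Bonus names and sizes below are invented for illustration.

PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}

def evaluate(material, positional_bonuses):
    """material: piece letter -> (white count, black count);
    positional_bonuses: centipawn terms, positive = good for White."""
    score = sum(PIECE_VALUES[p] * (w - b) for p, (w, b) in material.items())
    return score + sum(positional_bonuses)

# White is a full knight down (320 cp) but has generous positional
# pluses: open file +25, king safety +40, outpost +30, mobility +35.
material = {"P": (8, 8), "N": (1, 2), "B": (2, 2), "R": (2, 2), "Q": (1, 1)}
bonuses = [25, 40, 30, 35]            # totals 130 cp - far short of 320
print(evaluate(material, bonuses))    # -190: still clearly "losing"
```

Even with every positional term maxed out, the sum stays nowhere near the 300+ centipawns of a minor piece, so a sacrifice like AlphaZero's never looks attractive to such an evaluation.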
Alpha apparently likes to play a gambit in the Queen's Indian Defense that has been around for a long time, and which became very popular in the wake of the early games of Garry Kasparov, who used it as his main weapon in the early 1980s, before he became world champion.
So what does Alpha make of it? We'll have a look at game 10 (see above link).
White's gambit move is the 7th, pushing the pawn to d5. Alpha's first major deviation from established theory comes on move 12, with move 14 being the first completely new move. So what happens then? Only five moves later, Alpha sacrifices its knight on h6 for no immediate material compensation - it's just that Black's rook and knight are still at home, the Black king looks vulnerable on h6, and every White piece will be efficiently developed two or three moves later.
I will have to check what the latest Komodo or Houdini think of this sacrifice, but I'm confident they are not going to like it much. Moreover, in the moves following the sacrifice, White just continues to develop calmly. It would take a lot of confidence and positional judgement for a human grandmaster to play like this, but it is conceivable - human-style play, for sure. Long-lasting positional compensation isn't something the bean counters like very much, though.
A similar positional piece sacrifice can be seen in game 9, on White's 30th move. The point is White's very nice follow-up on move 32, after which White ends up a piece down, but Black is tied up nicely and its extra piece, the bishop on b7, doesn't do any relevant work. Alpha converts its advantage without further fireworks 20 moves later. This sacrifice would probably also be quite a leap for a traditional engine (alphabeta search plus a hand-crafted evaluation function).
Uberdude wrote:
- Is the play of Stockfish near its peak strength, i.e. if it gets more time or resources, does it get significantly better (has anyone tried at home?) and avoid the moves that let AZ beat it? I wonder if perhaps neural network bots are better at blitz than tree search bots (in training you essentially transfer the skill from tree search into one huge function which is quick to compute).
Edit: Now that I've read the paper, the 100-game match was not 1 second per move like the Elo evaluation in the graph, but 1 minute per move with 64 threads and 1 GB hash, which sounds better - but I'm still not clear how far from peak strength and diminishing returns that is (and it could be a lot smaller than the 4 TPUs AZ got). Looking at the kibitzing on the TCEC match, many chess players are dismissive of the conditions, saying the specs for the Stockfish engine are unfair/small.
I've been out of the trade for a while, but I'd say the specs do not seem too shabby. Also, 70,000k evaluations per second vs. 80k - well, how much additional hardware do they actually suggest throwing at Alpha?

Of course, the reason is that multithreaded alphabeta search does not scale up so well, and the search has to be deep enough to compensate for the relative dumbness of the evaluation function. So they would probably rather have had something like tournament time controls (2h for 40 moves and so on) instead of a fixed time per move. Then Stockfish could have used its time management: make forced moves immediately, and spend the saved-up time when problems appear (a sudden "fail-low" because the search discovered a deep resource - in these cases engines often use additional time to search deeper and fix the variation).
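The time-management idea can be sketched roughly as follows. This is a hedged toy model, not Stockfish's actual logic - the function name, the safety buffer, and the fail-low extension factor are all made up for illustration; real engines use far more elaborate heuristics.

```python
# Toy sketch of engine time management under tournament controls
# (e.g. 2h for 40 moves): allocate a base slice per move, play forced
# moves instantly, and extend the budget after a fail-low.
# All constants here are illustrative, not from any real engine.

def allocate_time(remaining_ms, moves_to_go, legal_moves, fail_low):
    """Return the time budget (ms) for the current move."""
    if legal_moves == 1:
        return 0                                  # forced move: answer immediately
    base = remaining_ms // (moves_to_go + 2)      # +2 keeps a safety buffer
    if fail_low:
        # A fail-low suggests the intended line is refuted: spend extra
        # time to search deeper, capped at a quarter of the clock.
        base = min(base * 3, remaining_ms // 4)
    return base

# 2h for 40 moves, 35 legal moves, no crisis:
print(allocate_time(2 * 60 * 60 * 1000, 40, 35, False))  # 171428 ms
# Same position, but the search just failed low:
print(allocate_time(2 * 60 * 60 * 1000, 40, 35, True))   # 514284 ms
```

Under a fixed time per move, none of this matters: the engine cannot bank time on forced moves or splurge it on a fail-low, which is exactly the objection to the match conditions.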