djhbrown wrote:

either way, it would be interesting to see just how much consensus there is between different bots of similar strength - but it can't be expected to eliminate blind spots.

I am not so sure about that. It might depend VERY much on why they did or did not decide on different moves. And perhaps the "jury" is being called in at the wrong time.

Suppose engine A works by generating a set of candidate moves and then applying MCTS analysis to select the best of that set. Suppose engine B does the same thing. Since the two engines use different methods to create their candidate sets, those sets might differ.

Now suppose things proceed the way first described: each bot analyzes ITS OWN candidate set using MCTS. Because the sets are different, the bots could come up with different choices for best. That looks to me like a "jury" tied 1:1. If one of the "best"s is better than the other, we have only a 50% chance of selecting it.

But suppose we brought the "jury" in earlier, before applying MCTS. Instead of voting on final choices, we form the union of the two candidate sets and apply MCTS to that. Assuming MCTS reliably picks the best move from whatever set it is given, we do end up picking the best move.
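To make the difference concrete, here is a rough Python sketch of the two arrangements. Everything in it is made up for illustration: the move names, the scores, and the pick_best function, which is only a stand-in for the MCTS stage and assumes the search always finds the best move in the set it is handed.

```python
from typing import Callable, Set, Tuple

Move = str

def pick_best(candidates: Set[Move], evaluate: Callable[[Move], float]) -> Move:
    """Stand-in for the MCTS stage (an idealization): return the
    highest-scoring candidate. Sorting first keeps ties deterministic."""
    return max(sorted(candidates), key=evaluate)

def late_jury(cands_a: Set[Move], cands_b: Set[Move],
              evaluate: Callable[[Move], float]) -> Tuple[Move, Move]:
    """Each engine searches only its own set; the jury sees the two winners."""
    return pick_best(cands_a, evaluate), pick_best(cands_b, evaluate)

def early_jury(cands_a: Set[Move], cands_b: Set[Move],
               evaluate: Callable[[Move], float]) -> Move:
    """Pool the candidates first, then run one search over the union."""
    return pick_best(cands_a | cands_b, evaluate)

# Hypothetical position with invented scores (higher is better).
true_value = {"D4": 0.48, "Q16": 0.52, "C3": 0.55, "R14": 0.61}

def score(move: Move) -> float:
    return true_value[move]

cands_a = {"D4", "Q16", "C3"}    # engine A never considers R14
cands_b = {"D4", "Q16", "R14"}   # engine B never considers C3

print(late_jury(cands_a, cands_b, score))   # two different "best" moves: a 1:1 tie
print(early_jury(cands_a, cands_b, score))  # one move chosen from the pooled set
```

In this toy position the late jury is left holding C3 and R14 with nothing to break the tie, while the early jury goes straight to R14.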

Assume that whenever the two bots have different candidate sets, it is 50/50 which of them has the better move in the part the other lacks (see note below). With the original proposal, a fair jury would then give us a 50% chance of ending up with that move. But with the "jury" brought in earlier, we would find it. To me that seems to be finding/eliminating SOME "blind spots".
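For anyone who wants to check the arithmetic, here is a small simulation of exactly those assumptions and nothing more. The simulate function and its parameters are my own invention; it assumes the sets always differ, that a tied jury amounts to a coin flip, and that the search over the union is perfect.

```python
import random
from typing import Tuple

def simulate(trials: int = 100_000, seed: int = 0) -> Tuple[float, float]:
    """Estimate how often each arrangement ends up with the better move,
    under the idealized assumptions stated above."""
    rng = random.Random(seed)
    late_hits = 0
    early_hits = 0
    for _ in range(trials):
        holder = rng.choice("AB")    # which engine's set contains the better move
        verdict = rng.choice("AB")   # a jury tied 1:1 is effectively a coin flip
        if holder == verdict:
            late_hits += 1
        # The union always contains the better move, and the idealized
        # search is assumed to find it every time.
        early_hits += 1
    return late_hits / trials, early_hits / trials

late_rate, early_rate = simulate()
print(f"late jury:  {late_rate:.3f}")
print(f"early jury: {early_rate:.3f}")
```

The late-jury rate should hover around one half and the early-jury rate is 1 by construction. A real MCTS search is not perfect, so in practice the gap would be smaller than this.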

NOTE: If, whenever the candidate sets differ, it is always the same engine whose set contains the better move, simply use that engine. The other would add NOTHING to the process.