
Friday, April 8, 2016

Part 4: Artificial Intelligence versus Lee Sedol: Expectations and Predictions (The historic match of deep learning AlphaGo vs. Lee Sedol)

AI vs. Human: Expectations and Predictions of the Match

Among the strongest Go players in the world, almost none really believe that the deep-learning program AlphaGo will be able to defeat Lee Sedol in the Google DeepMind challenge match. However, many of them are eager to play against AlphaGo themselves. Their expectation is based on AlphaGo's games against Fan Hui, the apparent mistakes AlphaGo made from time to time, and the fact that Fan Hui made a number of mistakes that Lee Sedol is very unlikely to make. A few examples of what top 9p players have said in advance about the outcome of the match:

Changho Lee (9p): “I heard about the match between Lee Sedol and AlphaGo. I am surprised that an AI program could challenge a human pro in an even game. I believe DeepMind challenged Lee Sedol because they thought AlphaGo has a chance of winning. It will be an interesting match, but I think Lee Sedol will win this time”.

Ke Jie (9p): “I used to think AI can never beat humans, at least it won't happen within 10 years. But this is unbelievable ... I think Lee Sedol will win the match in March”.

Dongyoon Kang (9p): “I went through the game record of the match between AlphaGo and Fan Hui, and AlphaGo plays really well. It makes huge mistakes where standard procedures are necessary. I do not understand how computers can make such mistakes. I think AlphaGo will ultimately lose after winning and losing some of the five games. However, people say that Lee Sedol earned USD 1 million prize money for free, but I do not agree. I would have feared the result”.

Gu Li (9p): “Without any doubt, this has been an astonishing development. I believe it will defeat humans in the future”.

Shi Yue (9p): “It will probably be a good opportunity for us when the program reaches the level of a top player and is accessible by the general public. I would definitely play with it every day using different tactics in order to understand more about Go”. 

Changhyuk Yoo (9p, head coach of the Korean Baduk Team): “If AlphaGo's current level is similar to the one it showed during the match with Fan Hui, Lee Sedol will easily beat AlphaGo. However, we are not sure how much progress AlphaGo has made during the six months after the Fan Hui match. I originally expected that it would take a long time to see AI catch up with humans in Go, but I was surprised to see AlphaGo win the match against Fan Hui”.


The big unknown in all expectations worldwide about the outcome of the match is the strength of the current version of AlphaGo relative to the version that defeated Fan Hui. Has DeepMind been able to structurally improve AlphaGo's play in the meantime so that it approaches the caliber of a 9p? If not, predicting the outcome of the match seems to be a no-brainer.

Based on the match with Fan Hui, several weaknesses in the way AlphaGo plays have been suggested (e.g. by Younggil An, 8p): a lack of understanding of the concept of sente, no insight into the principle of aji, misjudging complicated moves with delayed consequences later in the game, problems with large and complicated ko fights, and a clear absence of 'creativity' from following common patterns over and over again. AlphaGo mimics the play of professionals and then usually follows standard patterns that may not be optimal in specific positions that demand precise, proficient, and perceptive deviations.


Myungwan Kim (9p) also commented on AlphaGo's apparent lack of whole-board awareness: "while professional players are more creative and will vary their play more based on subtle differences in other parts of the board, AlphaGo makes '5-dan mistakes' while struggling with the whole-board interconnectedness". This may be due to the specific structure of the program's underlying models: the convolutional neural networks AlphaGo is based on are typically local in nature and therefore may not build a coherent whole-board understanding. We therefore do not yet know how AlphaGo will do when the fighting gets more extended and complex, or when the board is more fluid and multiple local positions are not yet resolved.

According to AI games programmer Rémi Coulom, AlphaGo cannot propagate information over a distance of more than 13 points because of its underlying neural network architecture (1 layer of 5x5 convolution followed by 11 layers of 3x3 convolution). So when there is a fight on one side of the board, AlphaGo is unlikely to assess and interpret local positions on the other side of the board correctly. The program might therefore get into serious trouble in positions with important, non-local fields of tension (for instance, during multiple fights occurring simultaneously all over the board).
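
The 13-point figure can be checked with simple receptive-field arithmetic. The sketch below is only an illustration, assuming stride-1, undilated convolutions with the layer sizes quoted above; the function name `receptive_field_radius` is hypothetical, and the calculation says nothing about how the tree search on top of the network might compensate.

```python
def receptive_field_radius(kernel_sizes):
    """Largest distance (in board points) from which an input point can
    influence a single output point, for a stack of stride-1 convolutions
    with odd kernel sizes. Each k x k layer adds (k - 1) // 2 points."""
    return sum((k - 1) // 2 for k in kernel_sizes)


# Layer sizes as quoted in the text (an assumption, not verified here):
# one 5x5 convolution followed by eleven 3x3 convolutions.
kernels = [5] + [3] * 11

radius = receptive_field_radius(kernels)  # 2 + 11 * 1 = 13
width = 2 * radius + 1                    # full window width: 27

# On a 19x19 board, two points can be up to 18 lines apart along one axis.
BOARD_SIZE = 19
max_axis_distance = BOARD_SIZE - 1

print(radius, width, max_axis_distance > radius)
```

Under these assumptions the radius is 13 points, while opposite edges of the board are 18 lines apart, so a stone on one edge lies outside the window that feeds the network's output for the opposite edge.
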

DeepMind research scientist Thore Graepel explains the program's rather autonomous nature: “Although we have programmed this machine to play, we have no idea what moves it will come up with. Its moves are an emergent phenomenon from the training. We just create the data sets and the training algorithms. But the moves AlphaGo then comes up with are out of our hands, and much better than we, as Go players, could come up with”.

DeepMind chief executive Demis Hassabis adds: “AlphaGo played itself, different versions of itself, millions and millions of times and each time got incrementally slightly better as it learned from its mistakes”.  About two weeks before the match, Aja Huang (6d) commented: “We are still preparing hard for the match, AlphaGo is getting stronger and stronger”. Learning and improving from its own matchplay experience means deep learning AlphaGo is now even stronger than when it beat European champion Fan Hui last year. Asked whether there is a ceiling to AlphaGo's learning abilities, Hassabis answered: “If there is one, we haven’t found it yet”.

Hence an important question is in which areas the DeepMind team has improved AlphaGo further. What smart and effective upgrades of AlphaGo over the past five months can we expect? We can roughly distinguish four classes of improvements (see also: 'Let it Go' event, Leo Dorst, UvA): data, algorithms and software, training and learning, and hardware and computing time:

Data

-selection of stronger professional games (not only games of players ≥ 6d from KGS, but also collections of professional games). It has been shown that small improvements in the accuracy of reproducing professional moves can immediately yield big leaps in playing strength.
-extension of AlphaGo with comprehensive joseki-, shape- and/or complex pattern libraries
-comprehensive analysis and training on Lee Sedol games specifically, perhaps targeting weaker elements in the way Lee Sedol plays (if these exist at all)

Algorithms and Software

-AlphaGo has learned from mistakes it made during the match against Fan Hui
-improvement of AlphaGo's algorithms for e.g. move selection and board evaluation
-prevention / circumvention of specific problematic situations (for instance: complex ko situations)
-improvement and/or extension of the feature filters in order to represent positions better and in more detail; these determine whether a (subpart of a) position in a game against Lee Sedol is recognized with sufficient precision and classified accurately by AlphaGo
-improvement of the balance between on one side the neural networks for move selection and board evaluation and on the other side precise computation using Monte Carlo Tree Search
-extension of the number of network layers in order to recognize new, more specific go-features 
-incorporate new ideas and concepts to increase the performance of AlphaGo's play

Training and Learning

-fine-tuning and extension of AlphaGo's neural network training sessions
-extension of the number of studied go-positions (>60 million) and/or games played (e.g. against itself, ≥ 1.3 million) to increase the accuracy of selecting and playing winning moves by AlphaGo
-improvements in learning the value of go-moves, for instance by more detailed and more accurate backpropagation of the final game result during the training sessions

Hardware and Computing Time

-extension of the number of conventional (> 1202 CPUs) and graphical processors (>176 GPUs) the distributed version of AlphaGo can use simultaneously
-increasing the thinking time / computation time (this was 1 hour per player during the Fan Hui match and will apparently now be 2 hours, which will be particularly beneficial for AlphaGo, especially towards the endgame)


Which of these improvements will be applied, and will they compensate for the weaknesses in the play of the earlier version of AlphaGo? It is hard to tell what another five months of 24/7 complementary neural network training can do for AlphaGo's way of playing. According to William Sanzenin (quora.com): “I’d expect the March version of AlphaGo to be significantly stronger than the one that played last October. Its reading is going to be even better. Its assessment of the global position will be improved”.

If the professionals’ observations about AlphaGo’s play do reflect limitations inherent to its system structure and/or approach, this will as a matter of course be fatal against Lee Sedol. It is also possible that the program simply hasn't studied enough top professional game positions. The generally accepted idea here is that deep learning models are only as good as the data you feed them (see also: AlphaGo under a Magnifying Glass). So professional games as learning material will undoubtedly increase AlphaGo's playing strength substantially.

Concerning the important question of what hardware the distributed version of AlphaGo will use during the match against Lee Sedol, Demis Hassabis tweeted: “We are using roughly same amount of compute power as in Fan Hui match: distributing search over further machines has diminishing returns”. That would mean AlphaGo will be using 1,202 CPUs and 176 GPUs. However, various sources, including the New Economist, state that “the actual version of AlphaGo will be using 1,920 CPUs and 280 GPUs which is broadly similar to the computer power used in the match with Fan Hui”.

Apart from the extended training of AlphaGo by playing against itself, the possibly extended hardware, and the increased thinking time, nobody actually knows precisely which detailed improvements in AlphaGo's play have been achieved over the past five months.

Another issue is whether AlphaGo can adapt in real time (during and/or after a game) to the way Lee Sedol plays. Hassabis: “This is about teaching and learning. One game is not enough data to learn from, for a machine, and training takes an awful lot of time: training a new version of AlphaGo takes about 4 to 6 weeks”.

On March 8th, the evening before the match, nearly all polls worldwide show similar forecasts: about 75 to 85% of all voters are convinced Lee Sedol will win the match against AlphaGo. This is also in line with the predictions, made in advance of the match, in a prize contest among Dutch Go players concerning the expected outcome (106 participants, including the strongest amateur players in Europe, organised in cooperation with chess and go shop 'het Paard' and an ICT company).


To summarize: there are many reasons to believe that the current version of AlphaGo has climbed at least a few dan grades. The expected playing strength of AlphaGo is ≥ 8p, on the basis of the expected improvements and the fact that AlphaGo's playing strength was already ~5p at the time of the Fan Hui match.

In advance of the match, Lee Sedol is very self-confident about winning the match against AlphaGo. When invited for the match Lee Sedol reacted: “This is the first time a computer has challenged a human pro to an even game and I am privileged to be the one to play it. Regardless of the result, it will be a meaningful event in baduk history. I heard Google DeepMind's AI is surprisingly strong and getting stronger, but I am confident that I can win, at least this time”. 

In an interview with Yonhap News, Lee Sedol said he is confident of beating AlphaGo by a score of 5-0, or at least 4-1, and that he actually accepted the challenge after only 5 minutes of thinking. He also stated: "Of course, there would have been many updates of AlphaGo in the last four or five months, but that isn’t enough time to challenge me". And a couple of weeks before the match, in an interview with Sohn Suk-hee, Lee Sedol stated: "even if I beat AlphaGo by 4-1, the Google DeepMind team has the right to claim its de facto victory and to celebrate my defeat, or even that of mankind".



