*Written by Joseph Pious
We all play games against computers, and depending on the difficulty, we win or lose. Games have long served as a valuable research domain for artificial intelligence, providing structured representations and clear reward mechanisms, especially in board games. Initially, the focus was on replicating human competence in complex games, with successes achieved in popular tabletop games like Go, Backgammon, and Chess. However, the exploration of electronic games introduced new challenges and shifted the research focus. How fair and competitive these human-versus-AI contests are determines how much they can tell us about whether AI has reached human-level intelligence.
Since the inception of video games, players have been testing their skills against computer opponents. Difficulty levels were typically pre-programmed, leaving only the most skilled players capable of defeating the system. However, in 2000, Perfect Dark for the Nintendo 64 introduced the concept of Simulant AI, allowing players to set up customizable matches against computer opponents of varying difficulty levels. (1) Fast-forward to the present: online gaming has gained popularity, particularly in role-playing and battle royale games, and matches are no longer contested by human players alone. This is leading to a new era of social interaction in which human vs. machine and eventually machine vs. machine competitions emerge.
Classic board games like Chess, Go, and Backgammon share characteristics that make them ideal for developing computer game players. Self-play offers agents an adaptive challenge, since the opponent improves alongside them. Moreover, these games have predetermined outcomes (win, tie, or loss), predictable behavior, and no hidden information. This simplifies the learning process compared to real-world scenarios with uncertainty, probabilistic cycles, and ambiguous outcomes. However, it is important to ask whether competitions between AI and humans are fair and equal.
Understanding AI Benchmarks in Gaming
The key factor that sets apart current advancements in computer gaming is the extensive utilization of deep neural networks. Put simply, deep neural networks have the ability to closely approximate any continuous function. Moreover, with sufficient training data, these networks can effectively generalize their learning to new information. Therefore, we can view neural networks as powerful tools for approximating functions in high-dimensional spaces within the context of reinforcement learning.
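As an illustration of this function-approximation view, the sketch below trains a tiny one-hidden-layer network, in plain Python with no libraries, to approximate a continuous function from sampled points. The target sin(3x) is an arbitrary example chosen for this sketch; the same mechanism, scaled up enormously, is what lets deep networks approximate value and policy functions in high-dimensional spaces.

```python
import math
import random

random.seed(0)

# A one-hidden-layer tanh network: with enough hidden units, such a
# network can approximate any continuous function on a bounded domain.
H = 16
w1 = [random.uniform(-1, 1) for _ in range(H)]  # input -> hidden weights
b1 = [random.uniform(-1, 1) for _ in range(H)]  # hidden biases
w2 = [random.uniform(-1, 1) for _ in range(H)]  # hidden -> output weights
b2 = 0.0                                        # output bias

def predict(x):
    hidden = [math.tanh(w1[i] * x + b1[i]) for i in range(H)]
    return sum(w2[i] * hidden[i] for i in range(H)) + b2

def train_step(x, y, lr=0.05):
    """One stochastic-gradient step on the squared prediction error."""
    global b2
    hidden = [math.tanh(w1[i] * x + b1[i]) for i in range(H)]
    err = sum(w2[i] * hidden[i] for i in range(H)) + b2 - y
    for i in range(H):
        grad_h = err * w2[i] * (1 - hidden[i] ** 2)  # backprop through tanh
        w2[i] -= lr * err * hidden[i]
        w1[i] -= lr * grad_h * x
        b1[i] -= lr * grad_h
    b2 -= lr * err

# Fit the target function f(x) = sin(3x) on [0, 1] from random samples.
for _ in range(30000):
    x = random.uniform(0, 1)
    train_step(x, math.sin(3 * x))
```

After training, `predict` tracks the target closely on the sampled interval, which is all "function approximation" means here.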
TD-Gammon is a Backgammon-playing program developed at IBM using temporal-difference learning, which trains a neural network through self-play to minimize the difference between its predicted outcomes and actual outcomes across sequential game states. (2) Between 1991 and 1992, TD-Gammon competed against top players and performed on par with the former world champion.
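The core of temporal-difference learning can be shown on a toy problem. The sketch below applies the tabular TD(0) update to a simple random-walk game (a standard textbook example, not TD-Gammon's actual setup, which used a neural network over Backgammon positions): each state's predicted outcome is nudged toward the prediction at the next state, exactly the "difference between predicted outcomes across sequential game states" described above.

```python
import random

random.seed(1)

# Random walk over states 1..5, terminating at 0 (loss, reward 0)
# or 6 (win, reward 1). V[s] is the predicted probability of winning.
V = {s: 0.5 for s in range(1, 6)}
alpha = 0.1  # learning rate

for episode in range(5000):
    s = 3  # start in the middle
    while 1 <= s <= 5:
        s_next = s + random.choice([-1, 1])
        if s_next == 6:
            target = 1.0          # terminal win
        elif s_next == 0:
            target = 0.0          # terminal loss
        else:
            target = V[s_next]    # bootstrap from the next prediction
        # TD(0) update: move the prediction toward the target.
        V[s] += alpha * (target - V[s])
        s = s_next
```

The learned values settle near the true win probabilities (s/6 for state s), despite the agent never being told those probabilities directly.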
Deep Blue, another IBM program, was designed to play chess. It combined specialized hardware with software built around tree search, enhanced by pruning techniques and evaluation functions. Deep Blue gained widespread attention in 1997 when it defeated reigning world champion Garry Kasparov. The success of Deep Blue sparked debates about the fairness of the victory, considering factors like human emotions, tiredness, preparation time, and the difference between human reasoning and brute-force calculation. (3)
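A minimal sketch of minimax search with alpha-beta pruning, the kind of enhanced tree search Deep Blue built on (its real system added hand-tuned evaluation functions, search extensions, and custom chess hardware; the tree here is a toy hand-built example):

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning over a nested-list game tree.

    Leaves are numbers (static evaluations); internal nodes are lists
    of child subtrees.
    """
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:  # prune: the minimizer will avoid this line
                break
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, True, alpha, beta))
            beta = min(beta, best)
            if beta <= alpha:  # prune: the maximizer will avoid this line
                break
        return best

# A two-ply tree: the maximizing player picks a branch, the minimizer replies.
tree = [[3, 5], [2, 9], [0, 7]]
```

Pruning lets the search skip branches that provably cannot change the result, which is what made deep lookahead tractable at all, though, as the fairness discussion below notes, it is still brute-force calculation rather than human-style reasoning.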
In 2016, AlphaGo, created by Google DeepMind researchers, became the first AI agent to defeat a human Go champion. Go posed significant challenges due to its huge search space and the difficulty of accurately predicting winners. AlphaGo combined Monte Carlo Tree Search with convolutional neural networks trained from human games and self-play. Subsequent versions, such as AlphaGo Zero, trained entirely through self-play and surpassed earlier iterations.
AlphaGo employs multiple neural networks together with deep tree search: policy networks approximate the policy function, mapping board states to actions, while a value network predicts the game outcome from each state. The policy network exists in three forms: a supervised learning policy network that imitates experienced human players, a reinforcement learning policy network initially cloned from the supervised network, and a fast rollout policy network used to expand the tree search. The value network estimates the probability of victory for each game state. (4)
AlphaGo uses the policy network to narrow the moves it looks ahead to, while the value network measures the worth of forward-looking branches; the lightweight rollout policy is paired with the value network to fill out the search tree. While backward induction is commonly used for game analysis, the vast number of possible moves in Go makes exhaustive tree search infeasible. AlphaGo instead integrates its deep convolutional neural networks into a Monte Carlo tree search, considering only a fraction of possible move sequences and assigning scores based on hypothetical outcomes, deviating from Nash equilibrium theory. (5)
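The Monte Carlo idea of scoring moves by sampling hypothetical outcomes can be sketched on tic-tac-toe. This is plain rollout evaluation, far simpler than AlphaGo's full search, which additionally builds a tree and guides sampling with policy and value networks, but it shows how sampled continuations replace exhaustive enumeration.

```python
import random

random.seed(2)

# Index triples that form a winning line on a 3x3 board.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def rollout(board, player):
    """Play uniformly random moves to the end; return 'X', 'O', or None."""
    board = board[:]
    while True:
        w = winner(board)
        if w or "." not in board:
            return w
        move = random.choice([i for i, v in enumerate(board) if v == "."])
        board[move] = player
        player = "O" if player == "X" else "X"

def best_move(board, player, n_rollouts=200):
    """Score each legal move by its win rate over random continuations."""
    opponent = "O" if player == "X" else "X"
    scores = {}
    for move in [i for i, v in enumerate(board) if v == "."]:
        wins = 0
        for _ in range(n_rollouts):
            b = board[:]
            b[move] = player
            wins += rollout(b, opponent) == player
        scores[move] = wins / n_rollouts
    return max(scores, key=scores.get)

# X to move with X at squares 0 and 1: square 2 wins immediately,
# and the rollout scores should single it out.
board = list("XX.OO....")
```

Because scores come from sampled play rather than a complete game-theoretic solution, the chosen move is a statistical estimate, which is the sense in which such search deviates from Nash equilibrium analysis.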
Compared to traditional board games, electronic games present additional challenges for AI researchers: continuous time scales and vast state and action spaces make brute-force search methods less effective than in Go or Chess. Interest in using video games as AI benchmarks grew significantly after AlphaGo's groundbreaking 2016 victory in Go, prompting researchers to seek out even harder challenges.
DeepMind introduced AlphaStar, an AI agent for StarCraft II, which initially triumphed over elite human players TLO and MaNa in private matches but then lost to MaNa in a live show match. Interestingly, the developers modified the agent between the private matches and the live game. Originally, the agent could view the entire map at once (excluding areas hidden by the "fog of war"), but it was later limited to observing regions by rotating the camera, emulating human players. This adjustment aimed to address potential unfairness concerns by simulating the input and output experiences of human players.
Recent research has also produced programs for Dota 2, such as OpenAI Five, which achieved significant advancements. However, there have been debates regarding the observation and communication methods employed by the OpenAI Five agent. Its interface includes high-level features that allow it to "see" detailed information about units, such as their remaining health and attack value, at any given moment; a human player would need to click on each unit individually to obtain such details. The agent can also specify high-level behaviours by choosing abilities, targets, offsets, and even time delays, whereas achieving the same results as a human would require a combination of key presses and imprecise mouse movements. These interface advantages produced moments in games where, despite the agents' theoretically human-equal response time of 200 ms, they performed critical actions such as interrupting spells or coordinating powerful abilities in ways that seemed impossible for humans. (6)
Fairness Dimensions in AI vs. Human Gaming
Fairness in AI-human gaming encompasses various dimensions that warrant analysis. While fairness is difficult to define comprehensively, a standard perspective in sports and gaming suggests that a match is unequal if one side is significantly favored by conditions extraneous to the game itself. At first glance, equitable conditions between a computer program and a human seem more attainable in tabletop games like Backgammon, Chess, and Go than in electronic games; notably, input and output fairness play a lesser role in tabletop games.
Regarding fairness dimensions, input fairness examines whether the AI and the human share the same input space, such as the pixels on a screen. In video gaming this becomes particularly relevant, as humans can only see limited information on the screen and must take additional actions, like scrolling or clicking on units, to gather more. An algorithm, by contrast, can receive a standardized list giving the position and status of every game object, rather than the kind of raw visual input a human retina receives, complete with blind spots and lower peripheral resolution. In electronic games, a typical recommendation for increasing parity is that both AI and humans play the game from the same pixels rather than from higher-level game features.
Output fairness considers whether the systems share the same output space, that is, the same means of communicating with the environment and other participants, as well as whether they have the same reaction time. Excessively quick response time is often cited as a reason for an agent's perceived artificial behavior. The action space can be represented in various forms, including high-level representations like OpenAI Five's [ability, target, offset] tuples, user-interface command simulations like screen movement and unit highlighting in StarCraft, or direct simulation of a virtual controller, as in the Arcade Learning Environment (ALE). (7)
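To make the contrast concrete, the sketch below encodes the same in-game intent two ways: as one high-level structured action of the kind described above, and as the sequence of UI events a human would need to produce the same effect. All names and values here are purely illustrative assumptions, not any real game's API.

```python
from dataclasses import dataclass

@dataclass
class HighLevelAction:
    """One structured command, in the style of an [ability, target, offset] tuple."""
    ability: str
    target_id: int
    offset: tuple  # (dx, dy) relative to the target

# An agent with a high-level interface emits a single precise action
# (ability name, unit id, and offset are hypothetical examples):
agent_action = HighLevelAction(ability="frost_nova", target_id=42, offset=(3, -1))

# A human expresses the same intent as several imprecise UI events:
human_actions = [
    ("mouse_move", (512, 318)),   # aim near the target (pixel coordinates)
    ("mouse_click", "left"),      # select the unit
    ("key_press", "q"),           # choose the ability
    ("mouse_move", (540, 300)),   # place the cast location
    ("mouse_click", "left"),      # confirm
]
```

The single structured command carries exact identifiers and offsets, while the human path is a longer chain of noisy physical inputs, which is precisely the output-fairness gap the debates above concern.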
Experience fairness examines whether both humans and computers spend an equal amount of time playing the game, while knowledge fairness evaluates whether agents have access to the same descriptive understanding of the game compiled by others.
Compute fairness pertains to the depth of game state exploration during a tree search. The use of a forward model to predict potential game states, as seen in Deep Blue's tree search and AlphaGo's rollouts, may be viewed as violating compute fairness. This is akin to providing a person with additional boards and pieces to replicate possible lines of play during a match, which is often prohibited. Taking this idea further, one might imagine a scenario where a fully automated game-playing machine is perceived as a simple enhancement of a human's cognitive ability, resulting in a "human versus AI" match where both moves are chosen by the same algorithm, one playing for itself and the other on behalf of the human. (8)
Psychological fairness considers whether agents exhibit a similar range of emotional states as humans, including anxiety, joy, and mental exhaustion. In tabletop games, fairness between human and artificial players is influenced by emotions such as exhaustion, panic, and anxiety. Incorporating these feelings into a computer simulation is challenging, and attempts to do so, such as artificially inducing noise in the algorithm's estimation during stressful situations, may undermine the goal of developing the best possible game-playing systems.
To conclude, electronic games pose unique challenges for AI researchers, and fairness spans input, output, experience, knowledge, compute, and psychological dimensions. Notably, input and output fairness matter far more in electronic games than in traditional board games.
The vast number of games and diverse architectures of game-playing agents make it impossible to determine human-level intelligence based on AI performance in any single game. A completely fair competition can only be achieved when an artificial system is practically indistinguishable from a human being. It is important to note that the absence of a fair analogy between AI and humans does not render achievements in game-playing AI meaningless. These accomplishments can still be valuable for advancing research and addressing complex real-world challenges that lie ahead.
*The author is a lawyer from India.
1. Shelly Palmer, AI and Esports, Shelly Palmer (Sept 23, 2018), https://www.shellypalmer.com/2018/09/ai-and-esports/
2. Gerald Tesauro, Temporal difference learning and TD-Gammon, Commun. ACM 38, 3 (1995), 58–68, https://www.csd.uwo.ca/~xling/cs346a/extra/tdgammon.pdf
3. Murray Campbell, A. Joseph Hoane Jr., and Feng-hsiung Hsu, Deep Blue, Artificial Intelligence 134, 1-2 (2002), 57–83. https://pdf.sciencedirectassets.com/271585/1-s2.0-S0004370200X00847/1-s2.0-S0004370201001291/main.pdf
4. David Silver et al., Mastering the game of Go without human knowledge, Nature 550, 7676 (2017), 354. https://www.nature.com/articles/nature24270
5. Gabe Stechshulte, Game Theory Concepts Within AlphaGo, Towards Data Science (May 9, 2020), https://towardsdatascience.com/game-theory-concepts-within-alphago-2443bbca36e0
Nash equilibrium describes a situation in which each player has chosen a strategy and no player can benefit by changing strategies while the other players keep theirs unchanged.
6. OpenAI, OpenAI Five, OpenAI Blog (June 25, 2018), https://blog.openai.com/openai-five/
7. Rodrigo Canaan et al., Leveling the Playing Field: Fairness in AI Versus Human Game Benchmarks, The Fourteenth International Conference on the Foundations of Digital Games (FDG '19), August 26–30, 2019, https://arxiv.org/pdf/1903.07008.pdf
8. Andy Clark and David Chalmers, The extended mind, Analysis 58, 1 (1998), 7–19. https://www.jstor.org/stable/3328150