Class GameStateValueBehaviour

All Implemented Interfaces:
Serializable, Cloneable, Behaviour

public class GameStateValueBehaviour extends IntelligentBehaviour
GameStateValueBehaviour is an implementation of a decent AI with the best-in-class performance among bots in the community.

The objective of the bot is to determine the best possible action from the list of actions it can choose from, given the current state of the game. In the AI community, this is called a "policy," and it is typically represented in the literature using the Greek letter Pi.

Policies take as inputs a list of actions, like "Friendly hero attacks opposing hero," and the current state of the game, in this case all the data available in GameContext, and returns an action as an output. requestAction(GameContext, Player, List) corresponds to this AI's policy; when implementing an idea, especially from the literature, start with requestAction(GameContext, Player, List).

In this AI, requestAction(GameContext, Player, List) tries each action available to take and scoring the outcome. The score, in this case, depends on the game state. The action that leads to the highest score is the action returned by requestAction(GameContext, Player, List). Since card games typically involve combos (sequences of actions), this AI will actually play out all available actions until the end of its turn, seeking which starting action will maximize its score at the **end** of its turn.

Therefore, the best way to describe this AI is: It is a "single turn horizon" AI. That is, it tries to pick actions to maximize a score by the end of the turn.

Playing around secrets is difficult without a long-term vision of the game, so enemy secrets are omitted from the simulation entirely. The bot's and opponennt's start turn effects are heuristically triggered at the end of the turn.

How does the AI do scoring? Clearly, it can't be as simple as, "The highest score is whatever reduces the opponent's health the most." Indeed, this class uses a complex model for a score, called a Heuristic, which is capable of looking at any factor in the game state to contribute to the score. The specific heuristic used by this class is the ThreatBasedHeuristic. Visiting that class, it's clear that things like holding onto hard-removal cards, or fully destroying minions, contribute greatly to the score, along with the heroes' and minions' attack and hitpoints.

In order to understand the various tradeoffs between actions, the way the Heuristic is calculated should somehow reflect the actual ability of these actions to lead to victory. Clearly, the score for an action should, in an ideal world, be as simple as, "Whatever maximizes my chance of winning." But for the time being, answering that question is computationally impossible in this card game. Instead, GameStateValueBehaviour makes the assumption that across many games, maximizing some score at the end of my turn maximizes my chance of winning on average. Hearthstone generally rewards great tactical play, so this is a surprisingly robust assumption. But it isn't necessarily true for many games or for all cards in this game. However, this assumption makes this AI fast, so it is preferred in this context.

The ThreatBasedHeuristic tries to maximize the chance of winning by somehow relating its scoring mechanism to the actual outcome of a match. The Cuckoo application in the cluster package is the system that tweaks the scoring function in order to choose tweaks that corresponded to greater wins in the game. This approach makes GameStateValueBehaviour the best delivered AI in the Hearthstone community.

See Also:
  • Field Details

    • DEFAULT_TARGET_CONTEXT_STACK_SIZE

      public static final int DEFAULT_TARGET_CONTEXT_STACK_SIZE
      See Also:
    • DEFAULT_MAXIMUM_DEPTH

      public static final int DEFAULT_MAXIMUM_DEPTH
      See Also:
    • DEFAULT_TIMEOUT

      public static final int DEFAULT_TIMEOUT
      See Also:
    • DEFAULT_LETHAL_TIMEOUT

      public static final int DEFAULT_LETHAL_TIMEOUT
      See Also:
    • heuristic

      protected Heuristic heuristic
    • featureVector

      protected FeatureVector featureVector
    • nameSuffix

      protected String nameSuffix
    • timeout

      protected long timeout
    • strictPlan

      protected Deque<GameAction> strictPlan
    • indexPlan

      protected Deque<Integer> indexPlan
    • maxDepth

      protected int maxDepth
    • minFreeMemory

      protected long minFreeMemory
    • disposeNodes

      protected boolean disposeNodes
    • parallel

      protected boolean parallel
    • forceGarbageCollection

      protected boolean forceGarbageCollection
    • throwOnInvalidPlan

      protected boolean throwOnInvalidPlan
    • pruneContextStack

      protected boolean pruneContextStack
    • throwsExceptions

      protected boolean throwsExceptions
    • expandDepthForLethal

      protected boolean expandDepthForLethal
    • triggerStartTurns

      protected boolean triggerStartTurns
    • pruneEarlyEndTurn

      protected boolean pruneEarlyEndTurn
    • lethalTimeout

      protected long lethalTimeout
    • targetContextStackSize

      protected int targetContextStackSize
    • requestActionStartTime

      protected long requestActionStartTime
  • Constructor Details

    • GameStateValueBehaviour

      public GameStateValueBehaviour()
    • GameStateValueBehaviour

      public GameStateValueBehaviour(FeatureVector featureVector, String nameSuffix)
  • Method Details

    • getClone

      protected GameContext getClone(GameContext original)
      Returns a clone of the game context, assuming the opponent is a GameStateValueBehaviour too.
      Parameters:
      original - The original game context to use.
      Returns:
      The clone.
    • clone

      public GameStateValueBehaviour clone()
      Description copied from interface: Behaviour
      Clones the behaviour, typically with its internal state.
      Specified by:
      clone in interface Behaviour
      Overrides:
      clone in class AbstractBehaviour
      Returns:
      A clone of the Behaviour instance.
    • getMaxDepth

      public int getMaxDepth()
      Indicates the maximum depth of breadth-first-searched nodes that should be expanded in order to find the highest scoring game state.

      Setting this depth higher exponentially increases the number of nodes that could get visited for evaluating potential game state scores.

      Setting this depth too low will make the bot miss lethal, especially if it has to use more than maxDepth cards or attack with more than maxDepth minions in order to kill the bot's opponent.

      The default value on the hosted version of Spellsource is 2. For a good compromise between performance and finding the most commmon lethals, choose 5.

      Returns:
      The currently configured maximium depth.
    • setMaxDepth

      public GameStateValueBehaviour setMaxDepth(int maxDepth)
    • throwsExceptions

      public boolean throwsExceptions()
      Indicates this game state value behaviour should throw exceptions when its underlying assumptions about the mechanics of the game are violated. For example, this will cause the GSVB to throw an exception if it is requested to evaluate discover actions directly.
      Returns:
      true if operating in debug mode.
    • setThrowsExceptions

      public GameStateValueBehaviour setThrowsExceptions(boolean throwsExceptions)
    • getName

      public String getName()
      Description copied from interface: Behaviour
      Gets a name for the behaviour. This should correspond to how the decisions are being made, e.g., a "Human Behaviour" or an "AI Behaviour.
      Returns:
      A String description of the behaviour.
    • getStrictPlan

      public Deque<GameAction> getStrictPlan()
      A strict plan is a cache of a computed path (sequence of actions) to a gamestate stored as the actions themselves.

      Whenever you call requestAction(GameContext, Player, List), the instance of GameStateValueBehaviour evaluates sequences of actions of length maximum getMaxDepth(), and scores the value of the last game state (i.e. the game state you arrive at after performing that sequence of actions). But the requestAction(GameContext, Player, List) method returns the first action in that sequence.

      Clearly, the sequence of best actions isn't going to change before and after you take the GameAction that was returned by the first call to requestAction(GameContext, Player, List). This Deque stores the sequence that was computed as a side effect of requestAction(GameContext, Player, List). With it, the next call to requestAction(GameContext, Player, List) doesn't have to recompute a whole sequence of actions every time; it can use whatever is left of the sequence of actions that led to the best scoring state.

      Since game states are reproducible, and this behaviour "cheats" (it knows what the random seed is), there should be an exact match between the Deque.peekFirst()'d GameAction in this plan and a game action returned by GameContext.getValidActions() until the plan has been exhausted (i.e. the plan is Collection.isEmpty() == true).

      Because the API of a GameContext does not guarantee that a GameAction has no references to the GameContext or its objects, this class also implements a getIndexPlan(), which uses integers to represent an index into GameContext.getValidActions().

      For example, this code will "follow the plan" that was computed as a side effect of running requestAction(GameContext, Player, List).

       
       while (!getStrictPlan().isEmpty()) {
         context.performGameAction(playerId, getStrictPlan().pollFirst());
       }
       
       
      Returns:
      A path of actions (state transitions) towards the highest scoring game state.
      See Also:
    • setStrictPlan

      public void setStrictPlan(Deque<GameAction> strictPlan)
    • mulligan

      public List<Card> mulligan(GameContext context, Player player, List<Card> cards)
      Mulligans for cards, preferring to create an on-curve starting hand.
      Specified by:
      mulligan in interface Behaviour
      Overrides:
      mulligan in class IntelligentBehaviour
      Parameters:
      context - The game context.
      player - The player who's mulliganing.
      cards - The cards in the player's first hand.
      Returns:
      A list of cards to discard.
    • requestAction

      @Nullable public @Nullable GameAction requestAction(@NotNull @NotNull GameContext context, @NotNull @NotNull Player player, @NotNull @NotNull List<GameAction> validActions)
      Requests an action from the GameStateValueBehaviour using a scoring function. This method uses a cache of what it has computed before if it is provided with setIndexPlan(Deque) or setStrictPlan(Deque).

      Suppose the board looked like this:

           Opponent: Warrior, 7 HP, no cards in hand, no minions on the board.
           This Player: Mage, 30 HP, a Fireball in the hand, 6 mana.
       

      Clearly, this player has lethal: Fireballing the opponent, followed by Fireblasting the opponent, will win the game. There are two sequences of actions in this case that win the game. How does this function wind up returning the correct actions twice in order to win the game?

      First, at the beginning of the turn, the request action function receives the above game state, followed by the possible actions:

       1. Fireball opponent.
       2. Fireball yourself.
       3. Fireblast opponent.
       4. Fireblast yourself.
       5. End the turn.
       

      Suppose we scored each action with a simple function: "1 point if the enemy hero is destroyed, otherwise 0."

       1. Fireball opponent = 0 points.
       2. Fireball yourself = 0 points.
       3. Fireblast opponent = 0 points.
       4. Fireblast yourself = 0 points.
       5. End the turn = 0 points.
       

      By just looking at the current actions, it's impossible to see that fireballing or fireblasting your opponent will lead to victory, even though the scoring function ought to work fine in this particular case.

      What if instead we chose an action based on the score of the state at the end of the SEQUENCE of actions that particular action can enable? If we expand all the possible actions given our choices, we get:

       1. Fireball opponent = 0 points.
        1. Fireblast opponent = 1 point.
          **1. End turn = 1 point.**
        2. Fireblast yourself = 0 points.
          1. End Turn = 0 points.
        3. End turn = 0 points.
       2. Fireball yourself = 0 points.
        1. Fireblast opponent = 0 points.
          1. End turn = 0 points.
        2. Fireblast yourself = 0 points.
          1. End Turn = 0 points.
        3. End turn = 0 points.
       3. Fireblast opponent = 0 points.
        1. Fireball opponent = 1 point.
          **1. End turn = 1 point.**
        2. Fireball yourself = 0 points.
          1. End Turn = 0 points.
        3. End turn = 0 points.
       4. Fireblast yourself = 0 points.
        1. Fireball opponent = 0 points.
          1. End turn = 0 points.
        2. Fireball yourself = 0 points.
          1. End Turn = 0 points.
        3. End turn = 0 points.
       5. End the turn = 0 points.
       

      When expanding all the possible actions, there are now two sequences of actions that end with 1 point.

      This function will return the FIRST action in the sequence that terminates with the highest score at the end of the turn. In this example, it will return either action 1 (Fireball opponent) or action 3 (Fireblast opponent).

      The scoring function is much more complicated, but in broad strokes it works the way as described above.

      Parameters:
      context - The game context where the choice is being made.
      player - The player who is making the choice.
      validActions - The valid actions the player has to choose from.
      Returns:
      The action that maximizes the score of the state of the game at the end of this player's turn.
      See Also:
    • savePlan

      @Nullable protected @Nullable GameAction savePlan(Optional<net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node> maxScore)
      Saves the plan and retrieves the game action from the max score
      Parameters:
      maxScore -
      Returns:
    • isInterrupted

      protected boolean isInterrupted()
      Checks if the bot has timed out or if the thread it is executing on is interrupted.
      Returns:
    • exceptionActionChoice

      protected GameAction exceptionActionChoice(Optional<net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node> maxScoreSoFar, @NotNull @NotNull List<GameAction> gameActions)
      Gets the best scoring action, the end turn action or the first action in the list.
      Parameters:
      maxScoreSoFar -
      gameActions -
      Returns:
    • pruneContextStack

      protected void pruneContextStack(Deque<net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node> contextStack, int playerId)
      Prunes the context stack to save memory. Removes terminal nodes that are not worth exploring heuristically.
      Parameters:
      contextStack -
      playerId -
    • evaluate

      protected void evaluate(Deque<net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node> contextStack, int playerId, net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node node, GameAction action, int depth)
      Evaluates the provided game state with the provided action, then appends a new game state with potential actions to the contextStack. This expands the game tree by one unit of depth.

      If rolling out the specified action leads to calls to GameLogic.requestAction(Player, List), like a discover or a battlecry request, this method will breadth-first-search those intermediate actions until it gets to a non-intermediate game state (i.e., one with all of the intermediate action requests answered).

      For example, consider the card: "Choose between two: 'Do nothing', and: 'Choose between do nothing and win the game.'" The action is to play this card. The following additional combinations of intermediate actions are created:

       1. Discover and cast do nothing.
       2. Discover and cast 'Choose between do nothing and win the game.'
         1. Discover and cast 'Do nothing.'
         2. Discover and cast 'Win the game.'
       
      Clearly, we want the bot to perform the following sequence of actions: Play this card, then make choices #2, #2, because that will win the game.

      In order to choose that path without emitting intermediate nodes onto the contextStack, this function queues these intermediate actions and restarts from the beginning, evaluating a particular sequence it queued. Eventually, there is a sequence of actions queued that includes "play this card, make choice #2, then make choice #2," and since that sequence terminates into a non-intermediate game state, that sequence and the resulting game state are queued as a node onto the contextStack.

      This optimization only applies to the particular architecture of Spellsource.

      Parameters:
      contextStack - The stack of contexts onto which this function should append rolled-out game states.
      playerId - The player ID of the player whose point of view we're computing this rollout.
      node - The node (i.e., game state) from which the specified action should be rolled out.
      action - The action to roll out.
      depth - The current depth of this rollout. This is the count of non-intermediate actions from the game state that requestAction(GameContext, Player, List) was called with.
    • postProcess

      protected void postProcess(int playerId, GameContext context)
      Post-processes a game state for scoring.

      Currently, this method triggers turn start effects on both sides of the battlefield.

      Parameters:
      playerId -
      context -
    • setIndexPlan

      public void setIndexPlan(Deque<Integer> indexPlan)
    • getIndexPlan

      public Deque<Integer> getIndexPlan()
      The index plan is a sequence of indices into GameContext.getValidActions() that the bot can perform to go towards a previously-computed highest-scoring game state. It is essentially a cache of a prior computation of the best possible getMaxDepth() number of actions.

      For example, this code will "follow the plan" that was computed as a side effect of running requestAction(GameContext, Player, List).

       
       while (!getIndexPlan().isEmpty()) {
         context.performGameAction(playerId, context.getValidActions().get(getIndexPlan().pollFirst());
       }
       
       
      Returns:
      Indices into GameContext.getValidActions()
      See Also:
    • getTimeout

      public long getTimeout()
      Just below the maximum amount of time in milliseconds the bot will spend per call to requestAction(GameContext, Player, List) to determine its sequence of actions. This time tends to be amortized over getMaxDepth()-length action sequences, because this behaviour caches the calculation of a complete sequence.
      Returns:
      The time to spend per request, in milliseconds
    • isDisposeNodes

      public boolean isDisposeNodes()
      Indicates whether or not nodes should be "disposed" (their game context references set to null.

      On some JVMs, this may help the garbage collector find finished game states faster.

      Returns:
    • setDisposeNodes

      public GameStateValueBehaviour setDisposeNodes(boolean disposeNodes)
    • isForceGarbageCollection

      public boolean isForceGarbageCollection()
      Indicates whether or not a garbage collection call via System.gc() should be made whenever a game context is done being processed.

      This will slow the bot down but may reduce overall heap usage.

      Returns:
    • setForceGarbageCollection

      public GameStateValueBehaviour setForceGarbageCollection(boolean forceGarbageCollection)
    • setTimeout

      public GameStateValueBehaviour setTimeout(long timeout)
    • isParallel

      public boolean isParallel()
      Indicates whether or not the bot should process / expand nodes in its game state tree expansion using multiple threads.

      When using GameContext.simulate(List, Supplier, Supplier, int, boolean, AtomicInteger), typically this should be set to false whenever useJavaParallel is true.

      Returns:
    • setParallel

      public GameStateValueBehaviour setParallel(boolean parallel)
    • isThrowOnInvalidPlan

      public boolean isThrowOnInvalidPlan()
      Throws an exception if an invalid plan is encountered.

      When errors occur during replaying existing plans, there might be a game reproducibility issue where the same exact seed and sequence of actions did not produce the same exact results; or, there is an issue cloning game states.

      Returns:
    • setThrowOnInvalidPlan

      public GameStateValueBehaviour setThrowOnInvalidPlan(boolean throwOnInvalidPlan)
    • getLethalTimeout

      public long getLethalTimeout()
      Indicates the amount of time available to this instance to find lethal, when lethal is probably on the board.
      Returns:
    • setLethalTimeout

      public GameStateValueBehaviour setLethalTimeout(long lethalTimeout)
    • isPruneContextStack

      public boolean isPruneContextStack()
      Indicates whether or not this class should make attempts to prune the "context stack," or game states left to expand, in order to save memory.

      The pruning strategies may change.

      Returns:
    • setPruneContextStack

      public GameStateValueBehaviour setPruneContextStack(boolean pruneContextStack)
    • getTargetContextStackSize

      public int getTargetContextStackSize()
      When pruning with isPruneContextStack(), sets the maximum size of the context stack (the number of game states left to expand).
      Returns:
    • setTargetContextStackSize

      public GameStateValueBehaviour setTargetContextStackSize(int targetContextStackSize)
    • getMinFreeMemory

      public long getMinFreeMemory()
      Gets the minimum observed free memory recorded during the execution of this instance.
      Returns:
    • getRequestActionStartTime

      protected long getRequestActionStartTime()
      Records a call to System.currentTimeMillis() at the start of a call to requestAction(GameContext, Player, List).
      Returns:
    • setRequestActionStartTime

      protected GameStateValueBehaviour setRequestActionStartTime(long requestActionStartTime)
    • isExpandDepthForLethal

      public boolean isExpandDepthForLethal()
      Indicates whether the getMaxDepth() setting should be temporarily expanded to the number of actors and playable cards + 1 (the end turn action) in order to help the system find a way to lethally destroy a target.
      Returns:
    • setExpandDepthForLethal

      public GameStateValueBehaviour setExpandDepthForLethal(boolean expandDepthForLethal)
    • isTriggerStartTurns

      public boolean isTriggerStartTurns()
      Indicates if start turn effects should be evaluated at the end of the bot's turn.
      Returns:
    • setTriggerStartTurns

      public GameStateValueBehaviour setTriggerStartTurns(boolean triggerStartTurns)
    • isPruneEarlyEndTurn

      public boolean isPruneEarlyEndTurn()
      Indicates if end turns should only be evaluated if they are the only action available
      Returns:
    • setPruneEarlyEndTurn

      public GameStateValueBehaviour setPruneEarlyEndTurn(boolean pruneEarlyEndTurn)
    • observesLethal

      public static boolean observesLethal(GameContext context, int playerId, Actor target)
      Determines whether a combination of physical attacks, weapons and direct damage spells can give the player lethal against its opponent.

      When this is true, there is a way to combine physical attacks and direct damage spells that results in a lethal, but not necessarily accounting for all side effects from attacking taunt minions that may heal their owners or otherwise.

      Should be used as a heuristic to temporarily increase the max depth and prune early end turn actions.

      Does not yet account for taunts or lifesteals.

      Parameters:
      context -
      playerId -
      target -
      Returns: