Class GameStateValueBehaviour
- All Implemented Interfaces:
Serializable
,Cloneable
,Behaviour
The objective of the bot is to determine the best possible action from the list of actions it can choose from, given the current state of the game. In the AI community, this is called a "policy," and it is typically represented in the literature using the Greek letter Pi.
Policies take as inputs a list of actions, like "Friendly hero attacks opposing hero," and the current state of the
game, in this case all the data available in GameContext
, and returns an action as an output. requestAction(GameContext, Player, List)
corresponds to this AI's policy; when implementing an idea, especially
from the literature, start with requestAction(GameContext, Player, List)
.
In this AI, requestAction(GameContext, Player, List)
tries each action available to take and scoring the
outcome. The score, in this case, depends on the game state. The action that leads to the highest score is the action
returned by requestAction(GameContext, Player, List)
. Since card games typically involve combos (sequences
of actions), this AI will actually play out all available actions until the end of its turn, seeking which starting
action will maximize its score at the **end** of its turn.
Therefore, the best way to describe this AI is: It is a "single turn horizon" AI. That is, it tries to pick actions to maximize a score by the end of the turn.
Playing around secrets is difficult without a long-term vision of the game, so enemy secrets are omitted from the simulation entirely. The bot's and opponennt's start turn effects are heuristically triggered at the end of the turn.
How does the AI do scoring? Clearly, it can't be as simple as, "The highest score is whatever reduces the opponent's
health the most." Indeed, this class uses a complex model for a score, called a Heuristic
, which is capable
of looking at any factor in the game state to contribute to the score. The specific heuristic used by this class is
the ThreatBasedHeuristic
. Visiting that class, it's clear that things like holding onto hard-removal cards,
or fully destroying minions, contribute greatly to the score, along with the heroes' and minions' attack and
hitpoints.
In order to understand the various tradeoffs between actions, the way the Heuristic
is calculated should
somehow reflect the actual ability of these actions to lead to victory. Clearly, the score for an action should, in
an ideal world, be as simple as, "Whatever maximizes my chance of winning." But for the time being, answering that
question is computationally impossible in this card game. Instead, GameStateValueBehaviour makes the assumption that
across many games, maximizing some score at the end of my turn maximizes my chance of winning on average. Hearthstone
generally rewards great tactical play, so this is a surprisingly robust assumption. But it isn't necessarily true for
many games or for all cards in this game. However, this assumption makes this AI fast, so it is preferred in this
context.
The ThreatBasedHeuristic
tries to maximize the chance of winning by somehow relating its scoring mechanism to
the actual outcome of a match. The Cuckoo application in the cluster package is the system that tweaks the
scoring function in order to choose tweaks that corresponded to greater wins in the game. This approach makes
GameStateValueBehaviour the best delivered AI in the Hearthstone community.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic class
This helper class stores a list of choices from an intermediate node expansion. -
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final int
static final int
static final int
static final int
protected boolean
protected boolean
protected FeatureVector
protected boolean
protected Heuristic
protected long
protected int
protected long
protected String
protected boolean
protected boolean
protected boolean
protected long
protected Deque<GameAction>
protected int
protected boolean
protected boolean
protected long
protected boolean
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionclone()
Clones the behaviour, typically with its internal state.protected void
evaluate
(Deque<net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node> contextStack, int playerId, net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node node, GameAction action, int depth) Evaluates the provided game state with the provided action, then appends a new game state with potential actions to thecontextStack
.protected GameAction
exceptionActionChoice
(Optional<net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node> maxScoreSoFar, @NotNull List<GameAction> gameActions) Gets the best scoring action, the end turn action or the first action in the list.protected GameContext
getClone
(GameContext original) Returns a clone of the game context, assuming the opponent is aGameStateValueBehaviour
too.The index plan is a sequence of indices intoGameContext.getValidActions()
that the bot can perform to go towards a previously-computed highest-scoring game state.long
Indicates the amount of time available to this instance to find lethal, when lethal is probably on the board.int
Indicates the maximum depth of breadth-first-searched nodes that should be expanded in order to find the highest scoring game state.long
Gets the minimum observed free memory recorded during the execution of this instance.getName()
Gets a name for the behaviour.protected long
Records a call toSystem.currentTimeMillis()
at the start of a call torequestAction(GameContext, Player, List)
.A strict plan is a cache of a computed path (sequence of actions) to a gamestate stored as the actions themselves.int
When pruning withisPruneContextStack()
, sets the maximum size of the context stack (the number of game states left to expand).long
Just below the maximum amount of time in milliseconds the bot will spend per call torequestAction(GameContext, Player, List)
to determine its sequence of actions.boolean
Indicates whether or not nodes should be "disposed" (their game context references set tonull
.boolean
Indicates whether thegetMaxDepth()
setting should be temporarily expanded to the number of actors and playable cards + 1 (the end turn action) in order to help the system find a way to lethally destroy a target.boolean
Indicates whether or not a garbage collection call viaSystem.gc()
should be made whenever a game context is done being processed.protected boolean
Checks if the bot has timed out or if the thread it is executing on is interrupted.boolean
Indicates whether or not the bot should process / expand nodes in its game state tree expansion using multiple threads.boolean
Indicates whether or not this class should make attempts to prune the "context stack," or game states left to expand, in order to save memory.boolean
Indicates if end turns should only be evaluated if they are the only action availableboolean
Throws an exception if an invalid plan is encountered.boolean
Indicates if start turn effects should be evaluated at the end of the bot's turn.mulligan
(GameContext context, Player player, List<Card> cards) Mulligans for cards, preferring to create an on-curve starting hand.static boolean
observesLethal
(GameContext context, int playerId, Actor target) Determines whether a combination of physical attacks, weapons and direct damage spells can give the player lethal against its opponent.protected void
postProcess
(int playerId, GameContext context) Post-processes a game state for scoring.protected void
pruneContextStack
(Deque<net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node> contextStack, int playerId) Prunes the context stack to save memory.@Nullable GameAction
requestAction
(@NotNull GameContext context, @NotNull Player player, @NotNull List<GameAction> validActions) Requests an action from the GameStateValueBehaviour using a scoring function.protected @Nullable GameAction
Saves the plan and retrieves the game action from the max scoresetDisposeNodes
(boolean disposeNodes) setExpandDepthForLethal
(boolean expandDepthForLethal) setForceGarbageCollection
(boolean forceGarbageCollection) void
setIndexPlan
(Deque<Integer> indexPlan) setLethalTimeout
(long lethalTimeout) setMaxDepth
(int maxDepth) setParallel
(boolean parallel) setPruneContextStack
(boolean pruneContextStack) setPruneEarlyEndTurn
(boolean pruneEarlyEndTurn) protected GameStateValueBehaviour
setRequestActionStartTime
(long requestActionStartTime) void
setStrictPlan
(Deque<GameAction> strictPlan) setTargetContextStackSize
(int targetContextStackSize) setThrowOnInvalidPlan
(boolean throwOnInvalidPlan) setThrowsExceptions
(boolean throwsExceptions) setTimeout
(long timeout) setTriggerStartTurns
(boolean triggerStartTurns) boolean
Indicates this game state value behaviour should throw exceptions when its underlying assumptions about the mechanics of the game are violated.Methods inherited from class net.demilich.metastone.game.behaviour.IntelligentBehaviour
isHuman
Methods inherited from class net.demilich.metastone.game.behaviour.AbstractBehaviour
mulliganAsync, onGameOver, requestActionAsync
-
Field Details
-
DEFAULT_TARGET_CONTEXT_STACK_SIZE
public static final int DEFAULT_TARGET_CONTEXT_STACK_SIZE- See Also:
-
DEFAULT_MAXIMUM_DEPTH
public static final int DEFAULT_MAXIMUM_DEPTH- See Also:
-
DEFAULT_TIMEOUT
public static final int DEFAULT_TIMEOUT- See Also:
-
DEFAULT_LETHAL_TIMEOUT
public static final int DEFAULT_LETHAL_TIMEOUT- See Also:
-
heuristic
-
featureVector
-
nameSuffix
-
timeout
protected long timeout -
strictPlan
-
indexPlan
-
maxDepth
protected int maxDepth -
minFreeMemory
protected long minFreeMemory -
disposeNodes
protected boolean disposeNodes -
parallel
protected boolean parallel -
forceGarbageCollection
protected boolean forceGarbageCollection -
throwOnInvalidPlan
protected boolean throwOnInvalidPlan -
pruneContextStack
protected boolean pruneContextStack -
throwsExceptions
protected boolean throwsExceptions -
expandDepthForLethal
protected boolean expandDepthForLethal -
triggerStartTurns
protected boolean triggerStartTurns -
pruneEarlyEndTurn
protected boolean pruneEarlyEndTurn -
lethalTimeout
protected long lethalTimeout -
targetContextStackSize
protected int targetContextStackSize -
requestActionStartTime
protected long requestActionStartTime
-
-
Constructor Details
-
GameStateValueBehaviour
public GameStateValueBehaviour() -
GameStateValueBehaviour
-
-
Method Details
-
getClone
Returns a clone of the game context, assuming the opponent is aGameStateValueBehaviour
too.- Parameters:
original
- The original game context to use.- Returns:
- The clone.
-
clone
Description copied from interface:Behaviour
Clones the behaviour, typically with its internal state.- Specified by:
clone
in interfaceBehaviour
- Overrides:
clone
in classAbstractBehaviour
- Returns:
- A clone of the
Behaviour
instance.
-
getMaxDepth
public int getMaxDepth()Indicates the maximum depth of breadth-first-searched nodes that should be expanded in order to find the highest scoring game state.Setting this depth higher exponentially increases the number of nodes that could get visited for evaluating potential game state scores.
Setting this depth too low will make the bot miss lethal, especially if it has to use more than
maxDepth
cards or attack with more thanmaxDepth
minions in order to kill the bot's opponent.The default value on the hosted version of Spellsource is
2
. For a good compromise between performance and finding the most commmon lethals, choose5
.- Returns:
- The currently configured maximium depth.
-
setMaxDepth
-
throwsExceptions
public boolean throwsExceptions()Indicates this game state value behaviour should throw exceptions when its underlying assumptions about the mechanics of the game are violated. For example, this will cause the GSVB to throw an exception if it is requested to evaluate discover actions directly.- Returns:
true
if operating in debug mode.
-
setThrowsExceptions
-
getName
Description copied from interface:Behaviour
Gets a name for the behaviour. This should correspond to how the decisions are being made, e.g., a"Human Behaviour"
or an"AI Behaviour
.- Returns:
- A
String
description of the behaviour.
-
getStrictPlan
A strict plan is a cache of a computed path (sequence of actions) to a gamestate stored as the actions themselves.Whenever you call
requestAction(GameContext, Player, List)
, the instance ofGameStateValueBehaviour
evaluates sequences of actions of length maximumgetMaxDepth()
, and scores the value of the last game state (i.e. the game state you arrive at after performing that sequence of actions). But therequestAction(GameContext, Player, List)
method returns the first action in that sequence.Clearly, the sequence of best actions isn't going to change before and after you take the
GameAction
that was returned by the first call torequestAction(GameContext, Player, List)
. ThisDeque
stores the sequence that was computed as a side effect ofrequestAction(GameContext, Player, List)
. With it, the next call torequestAction(GameContext, Player, List)
doesn't have to recompute a whole sequence of actions every time; it can use whatever is left of the sequence of actions that led to the best scoring state.Since game states are reproducible, and this behaviour "cheats" (it knows what the random seed is), there should be an exact match between the
Deque.peekFirst()
'dGameAction
in this plan and a game action returned byGameContext.getValidActions()
until the plan has been exhausted (i.e. the plan isCollection.isEmpty()
== true
).Because the API of a
GameContext
does not guarantee that aGameAction
has no references to theGameContext
or its objects, this class also implements agetIndexPlan()
, which uses integers to represent an index intoGameContext.getValidActions()
.For example, this code will "follow the plan" that was computed as a side effect of running
requestAction(GameContext, Player, List)
.while (!getStrictPlan().isEmpty()) { context.performGameAction(playerId, getStrictPlan().pollFirst()); }
- Returns:
- A path of actions (state transitions) towards the highest scoring game state.
- See Also:
-
setStrictPlan
-
mulligan
Mulligans for cards, preferring to create an on-curve starting hand.- Specified by:
mulligan
in interfaceBehaviour
- Overrides:
mulligan
in classIntelligentBehaviour
- Parameters:
context
- The game context.player
- The player who's mulliganing.cards
- The cards in the player's first hand.- Returns:
- A list of cards to discard.
-
requestAction
@Nullable public @Nullable GameAction requestAction(@NotNull @NotNull GameContext context, @NotNull @NotNull Player player, @NotNull @NotNull List<GameAction> validActions) Requests an action from the GameStateValueBehaviour using a scoring function. This method uses a cache of what it has computed before if it is provided withsetIndexPlan(Deque)
orsetStrictPlan(Deque)
.Suppose the board looked like this:
Opponent: Warrior, 7 HP, no cards in hand, no minions on the board. This Player: Mage, 30 HP, a Fireball in the hand, 6 mana.
Clearly, this player has lethal: Fireballing the opponent, followed by Fireblasting the opponent, will win the game. There are two sequences of actions in this case that win the game. How does this function wind up returning the correct actions twice in order to win the game?
First, at the beginning of the turn, the request action function receives the above game state, followed by the possible actions:
1. Fireball opponent. 2. Fireball yourself. 3. Fireblast opponent. 4. Fireblast yourself. 5. End the turn.
Suppose we scored each action with a simple function: "1 point if the enemy hero is destroyed, otherwise 0."
1. Fireball opponent = 0 points. 2. Fireball yourself = 0 points. 3. Fireblast opponent = 0 points. 4. Fireblast yourself = 0 points. 5. End the turn = 0 points.
By just looking at the current actions, it's impossible to see that fireballing or fireblasting your opponent will lead to victory, even though the scoring function ought to work fine in this particular case.
What if instead we chose an action based on the score of the state at the end of the SEQUENCE of actions that particular action can enable? If we expand all the possible actions given our choices, we get:
1. Fireball opponent = 0 points. 1. Fireblast opponent = 1 point. **1. End turn = 1 point.** 2. Fireblast yourself = 0 points. 1. End Turn = 0 points. 3. End turn = 0 points. 2. Fireball yourself = 0 points. 1. Fireblast opponent = 0 points. 1. End turn = 0 points. 2. Fireblast yourself = 0 points. 1. End Turn = 0 points. 3. End turn = 0 points. 3. Fireblast opponent = 0 points. 1. Fireball opponent = 1 point. **1. End turn = 1 point.** 2. Fireball yourself = 0 points. 1. End Turn = 0 points. 3. End turn = 0 points. 4. Fireblast yourself = 0 points. 1. Fireball opponent = 0 points. 1. End turn = 0 points. 2. Fireball yourself = 0 points. 1. End Turn = 0 points. 3. End turn = 0 points. 5. End the turn = 0 points.
When expanding all the possible actions, there are now two sequences of actions that end with 1 point.
This function will return the FIRST action in the sequence that terminates with the highest score at the end of the turn. In this example, it will return either action 1 (Fireball opponent) or action 3 (Fireblast opponent).
The scoring function is much more complicated, but in broad strokes it works the way as described above.
- Parameters:
context
- The game context where the choice is being made.player
- The player who is making the choice.validActions
- The valid actions the player has to choose from.- Returns:
- The action that maximizes the score of the state of the game at the end of this player's turn.
- See Also:
-
savePlan
@Nullable protected @Nullable GameAction savePlan(Optional<net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node> maxScore) Saves the plan and retrieves the game action from the max score- Parameters:
maxScore
-- Returns:
-
isInterrupted
protected boolean isInterrupted()Checks if the bot has timed out or if the thread it is executing on is interrupted.- Returns:
-
exceptionActionChoice
protected GameAction exceptionActionChoice(Optional<net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node> maxScoreSoFar, @NotNull @NotNull List<GameAction> gameActions) Gets the best scoring action, the end turn action or the first action in the list.- Parameters:
maxScoreSoFar
-gameActions
-- Returns:
-
pruneContextStack
protected void pruneContextStack(Deque<net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node> contextStack, int playerId) Prunes the context stack to save memory. Removes terminal nodes that are not worth exploring heuristically.- Parameters:
contextStack
-playerId
-
-
evaluate
protected void evaluate(Deque<net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node> contextStack, int playerId, net.demilich.metastone.game.behaviour.GameStateValueBehaviour.Node node, GameAction action, int depth) Evaluates the provided game state with the provided action, then appends a new game state with potential actions to thecontextStack
. This expands the game tree by one unit of depth.If rolling out the specified action leads to calls to
GameLogic.requestAction(Player, List)
, like a discover or a battlecry request, this method will breadth-first-search those intermediate actions until it gets to a non-intermediate game state (i.e., one with all of the intermediate action requests answered).For example, consider the card: "Choose between two: 'Do nothing', and: 'Choose between do nothing and win the game.'" The action is to play this card. The following additional combinations of intermediate actions are created:
1. Discover and cast do nothing. 2. Discover and cast 'Choose between do nothing and win the game.' 1. Discover and cast 'Do nothing.' 2. Discover and cast 'Win the game.'
Clearly, we want the bot to perform the following sequence of actions: Play this card, then make choices #2, #2, because that will win the game.In order to choose that path without emitting intermediate nodes onto the
contextStack
, this function queues these intermediate actions and restarts from the beginning, evaluating a particular sequence it queued. Eventually, there is a sequence of actions queued that includes "play this card, make choice #2, then make choice #2," and since that sequence terminates into a non-intermediate game state, that sequence and the resulting game state are queued as a node onto thecontextStack
.This optimization only applies to the particular architecture of Spellsource.
- Parameters:
contextStack
- The stack of contexts onto which this function should append rolled-out game states.playerId
- The player ID of the player whose point of view we're computing this rollout.node
- The node (i.e., game state) from which the specified action should be rolled out.action
- The action to roll out.depth
- The current depth of this rollout. This is the count of non-intermediate actions from the game state thatrequestAction(GameContext, Player, List)
was called with.
-
postProcess
Post-processes a game state for scoring.Currently, this method triggers turn start effects on both sides of the battlefield.
- Parameters:
playerId
-context
-
-
setIndexPlan
-
getIndexPlan
The index plan is a sequence of indices intoGameContext.getValidActions()
that the bot can perform to go towards a previously-computed highest-scoring game state. It is essentially a cache of a prior computation of the best possiblegetMaxDepth()
number of actions.For example, this code will "follow the plan" that was computed as a side effect of running
requestAction(GameContext, Player, List)
.while (!getIndexPlan().isEmpty()) { context.performGameAction(playerId, context.getValidActions().get(getIndexPlan().pollFirst()); }
- Returns:
- Indices into
GameContext.getValidActions()
- See Also:
-
getTimeout
public long getTimeout()Just below the maximum amount of time in milliseconds the bot will spend per call torequestAction(GameContext, Player, List)
to determine its sequence of actions. This time tends to be amortized overgetMaxDepth()
-length action sequences, because this behaviour caches the calculation of a complete sequence.- Returns:
- The time to spend per request, in milliseconds
-
isDisposeNodes
public boolean isDisposeNodes()Indicates whether or not nodes should be "disposed" (their game context references set tonull
.On some JVMs, this may help the garbage collector find finished game states faster.
- Returns:
-
setDisposeNodes
-
isForceGarbageCollection
public boolean isForceGarbageCollection()Indicates whether or not a garbage collection call viaSystem.gc()
should be made whenever a game context is done being processed.This will slow the bot down but may reduce overall heap usage.
- Returns:
-
setForceGarbageCollection
-
setTimeout
-
isParallel
public boolean isParallel()Indicates whether or not the bot should process / expand nodes in its game state tree expansion using multiple threads.When using
GameContext.simulate(List, Supplier, Supplier, int, boolean, AtomicInteger)
, typically this should be set tofalse
wheneveruseJavaParallel
istrue
.- Returns:
-
setParallel
-
isThrowOnInvalidPlan
public boolean isThrowOnInvalidPlan()Throws an exception if an invalid plan is encountered.When errors occur during replaying existing plans, there might be a game reproducibility issue where the same exact seed and sequence of actions did not produce the same exact results; or, there is an issue cloning game states.
- Returns:
-
setThrowOnInvalidPlan
-
getLethalTimeout
public long getLethalTimeout()Indicates the amount of time available to this instance to find lethal, when lethal is probably on the board.- Returns:
-
setLethalTimeout
-
isPruneContextStack
public boolean isPruneContextStack()Indicates whether or not this class should make attempts to prune the "context stack," or game states left to expand, in order to save memory.The pruning strategies may change.
- Returns:
-
setPruneContextStack
-
getTargetContextStackSize
public int getTargetContextStackSize()When pruning withisPruneContextStack()
, sets the maximum size of the context stack (the number of game states left to expand).- Returns:
-
setTargetContextStackSize
-
getMinFreeMemory
public long getMinFreeMemory()Gets the minimum observed free memory recorded during the execution of this instance.- Returns:
-
getRequestActionStartTime
protected long getRequestActionStartTime()Records a call toSystem.currentTimeMillis()
at the start of a call torequestAction(GameContext, Player, List)
.- Returns:
-
setRequestActionStartTime
-
isExpandDepthForLethal
public boolean isExpandDepthForLethal()Indicates whether thegetMaxDepth()
setting should be temporarily expanded to the number of actors and playable cards + 1 (the end turn action) in order to help the system find a way to lethally destroy a target.- Returns:
-
setExpandDepthForLethal
-
isTriggerStartTurns
public boolean isTriggerStartTurns()Indicates if start turn effects should be evaluated at the end of the bot's turn.- Returns:
-
setTriggerStartTurns
-
isPruneEarlyEndTurn
public boolean isPruneEarlyEndTurn()Indicates if end turns should only be evaluated if they are the only action available- Returns:
-
setPruneEarlyEndTurn
-
observesLethal
Determines whether a combination of physical attacks, weapons and direct damage spells can give the player lethal against its opponent.When this is
true
, there is a way to combine physical attacks and direct damage spells that results in a lethal, but not necessarily accounting for all side effects from attacking taunt minions that may heal their owners or otherwise.Should be used as a heuristic to temporarily increase the max depth and prune early end turn actions.
Does not yet account for taunts or lifesteals.
- Parameters:
context
-playerId
-target
-- Returns:
-