C.A.R.M.E.N.

Welcome to the introduction to the Card Appraisal and Rating Machine ENgine, also known as my friend Carmen, whom I’m teaching card games. That means Hearthstone in particular for the time being, though I look forward to the day when we might attempt to tackle Magic: the Gathering. The goal of this project is to teach her enough about card games that she can learn to evaluate cards better than any human. This can have very deep implications, from improving our understanding of card games to making accurate predictions about card quality.

In order to accomplish this, I will have to develop a working theory of card games and refine it over the course of the project, to the point where it’s as accurate as it possibly can be. For the foreseeable future I’ll be working with Hearthstone, for a few reasons – it’s much simpler than Magic (no offence HS, but it’s true, and it’s not a bad thing), it will make evaluating cards easier, and also… I play it a lot. With that out of the way, the first order of business is to develop a model for constructing the real model, and this is the part that will begin to form the card games section of AGD.

Now then, to form a good working theory for Hearthstone, first we have to understand the field, as well as why every single attempt at modelling has failed so far. I could tell you the reason – no theory that we have seen up to this point has been complex enough to match the complexity of the game it’s trying to understand, and therefore it has failed (when we look at the existing models we’ll see this in full force, from simplistic star-based systems to more advanced theories). That being said, models for abstraction are nothing new; in fact, pretty much the whole field of game design is based on them. But because a model is an abstraction, something will always be lost in the process. My goal is to minimise that loss to the point where, hopefully, it becomes statistically insignificant.

Currently, I theorise that an accurate model can be built with the following steps:

Using purely deductive reasoning, we can create a simple base from which to start.
From this base, in combination with data, we can inductively formulate higher order theories.
It’s important to clearly differentiate between perfect and imperfect data and not assume its validity – in fact all data should be impossible to inductively disprove.
We need a feedback loop in order to calibrate the theories constructed this way – an increasingly sound baseline will “purify” the data, which will in turn help construct increasingly sound models.

Basically what this is saying is that while we can make assumptions, we have to adjust them based on the evidence, and also carefully think about which evidence is logically sound.

Suppose our engine rates a certain card particularly highly and it is then not played, even in a deck where we predicted it should be. What’s happening there? Well, the truth is, we can’t know for certain. The configuration of competitive decks is largely formed by lists that “emerge” throughout a particular meta, and they are always built by somebody. Even if the archetype as a whole has many pilots, eventually one particular version is selected based on an imperfect metric (usually that’s a particularly high finish, such as #1 Legend in a season), and development continues from there. If the original selection method is imperfect and almost all other versions of a deck are somewhat thrown to the side by most players (thankfully, not all) because of the fallacious assumption that this version is the “best”, then these other versions stop getting “developed”, in a sense.

We can trivially prove that the selection method is imperfect by postulating the following – “There are many factors that contribute to a #1 Legend finish, only one of which is the precise deck configuration”. In fact, I am willing to make the following conjecture – “Not only is the exact 30-card configuration not the driving factor, it may even be possible that the deck archetype as a whole is not the driving factor behind this finish”. Why? Well, Hearthstone is a game of odds and percentages, with very few decks and matchups (barring extremely polarised metas) being so skewed to one side that one deck can never take a game off the other. In that case, even if you are playing a deck with a 55% win rate against the overall field, all you need to do is play more games than everybody else, within some order of magnitude. It’s nearly impossible to say that you wouldn’t have had a higher win rate if you had made a change to your 30-card configuration, since you had no reason to do so given your positive win rate (because a change could also bring the win rate down). Because of that, we can have #1 Legend finishes with suboptimal decks, and it’s almost impossible to guarantee that a better version of the deck didn’t exist.
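To put rough numbers on how hard it is to tell two versions of a list apart, here is a minimal sketch. It assumes every game is an independent coin flip at a fixed win rate, which is a simplification of real ladder play, and it uses a plain normal approximation. The function names and the 57% comparison figure are my own illustrative choices, not measured data.

```python
import math


def win_rate_std_error(win_rate: float, games: int) -> float:
    """Standard error of an observed win rate over `games` independent games."""
    return math.sqrt(win_rate * (1.0 - win_rate) / games)


def games_to_separate(rate_a: float, rate_b: float, z: float = 1.96) -> int:
    """Rough number of games each list must play before the gap between the two
    win rates exceeds z standard errors of their difference."""
    gap = abs(rate_a - rate_b)
    if gap == 0:
        raise ValueError("identical win rates can never be separated")
    # Variance of the difference of two observed rates, per game played by each list.
    variance = rate_a * (1.0 - rate_a) + rate_b * (1.0 - rate_b)
    return math.ceil(z * z * variance / (gap * gap))


if __name__ == "__main__":
    # Uncertainty on a 55% win rate measured over 200 games.
    print(win_rate_std_error(0.55, 200))
    # Games needed per list to tell a 55% list from a hypothetical 57% one.
    print(games_to_separate(0.55, 0.57))
```

Under these assumptions, a 55% win rate measured over 200 games carries a standard error of about 3.5 percentage points, and separating a 55% list from a 57% one takes several thousand games with each. Very few players gather that much data on a single 30-card configuration in a season.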

Now, this is somewhat mitigated by a higher order understanding of Hearthstone decks, where a sufficiently skilled player can differentiate between core and tech cards, and adjust from a #1 Legend deck accordingly, but the fact remains that in a purely deductive sense, the data is imperfect. Given that a lot of these players work with bad models (which we will prove are bad in the next post) and then build and tech decks based on their own fallacious ratings of cards, it follows that they are not always playing the best decks.

The first steps I’ll be taking in order to form the initial postulates of the theory will be to create a method of abstraction that is sufficiently complex as to match Hearthstone’s depth, examine and analyse how and why other models fail (hopefully gaining some insight into what to do and what not to do), and develop a rudimentary rating system that will increase in accuracy through refinement (as well as an interpreter for Hearthstone cards that can translate their content within the confines of that system).

With all of this in mind, from here on out I’d like to give an overview of what the major milestones of the system would be, and what a final version of Carmen would look like.

1. To start off simply, I have to teach Carmen the absolute most basic elements of the cards I want her to rate. This happens first in isolation and without context, because complexity scales exponentially from there. I want her to know what the core elements of a card are (the ones that are true for any Hearthstone card), and I will place them here in what I theorise is the order of importance – effect(s), type, class, mana cost, and stats.
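As a minimal sketch of what those core elements could look like as data, here is one possible representation. The class and field names are assumptions made purely for illustration and are not Carmen’s actual format. The fields are listed in the order of importance theorised above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class CardType(Enum):
    MINION = "minion"
    SPELL = "spell"
    WEAPON = "weapon"
    HERO = "hero"


@dataclass(frozen=True)
class Card:
    """A card in isolation, with no deck, class, or meta context applied."""
    name: str
    effects: Tuple[str, ...]      # raw effect text, one entry per effect
    card_type: CardType
    card_class: str               # e.g. "Mage", or "Neutral"
    mana_cost: int
    attack: Optional[int] = None  # None for cards without stats, such as spells
    health: Optional[int] = None  # health for minions, durability for weapons

    def has_stats(self) -> bool:
        """True when the card has both halves of a stat line."""
        return self.attack is not None and self.health is not None


# Two familiar cards expressed in this hypothetical format.
yeti = Card("Chillwind Yeti", (), CardType.MINION, "Neutral", 4, attack=4, health=5)
fireball = Card("Fireball", ("Deal 6 damage.",), CardType.SPELL, "Mage", 4)
```

Keeping effects as plain text is a placeholder choice here; an interpreter that translates card text into something the rating system understands would replace it.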

2. Since this is the first order of business, the naturally following step is to codify each of these elements into a rudimentary scale, so that the first version of the rating system is born. Spoiler alert: I have already done that and have been working on it for the past year or so, and in the initial posts I will simply be writing about my findings.

3. Once these are established, the next logical step is to link them together, so that a card may be examined as a whole, still in a vacuum of course. This is what will give us a card’s adjusted power level, which will allow us to compare different cards. Theoretically, a deck wants to be composed of the 30 highest rated cards it has access to (we’ll get to that at the end of this post).

4. The next and hardest step is to examine a card in context. Currently I have theorised three main sections in the form of axis of interaction, class context, and synergies. Even though this step comes later in the process, it’s where pretty much any previous model has failed dramatically, and class context in particular is the biggest logical fallacy I have observed during card reviews.

4.1. There is an extra step to this that will be particularly hard to pull off, and that’s the context of the metagame, or “meta” for short. Unlike the rest, this context is dynamic and will change even without the influx of new cards into the game, and it can be particularly tough to nail this. As a dim light in the otherwise dark tunnel, metas are not a very complicated thing, so modelling them will not be as tricky as applying that model to the cards.

5. Finally, I’d like to make some sort of application that can dynamically apply context to cards based on selections, thus allowing the real practical application of all the theories – a tool that will improve deck building and, in the process, our understanding of the game of Hearthstone. For the foreseeable future, Carmen will be an abstract model, probably ported onto an interactive Excel sheet at the first possible moment, though that medium will be too limited for the final version.

Lastly, I’d like to touch on the theory behind why a rating system for cards is ultimately useful, in this postulate:

“With access to a theoretical perfect rating system, any one Hearthstone deck wants to play the 30 highest rated cards it has access to at a given time.”

There are some caveats here of course, so even though they should be obvious, I’d like to go over them so as not to be misquoted out of context some day. This theoretical perfect rating system can:

Evaluate cards on multiple metrics across power level, synergies, costs, etc.
Increase the score of cards based on synergies with other cards if they are together (this will make sure decks that play lots of cards with relatively low power but huge benefits when combined can still be found, ranging from tribal swarms to combo decks).
Account for pseudo-synergies based on axis of interaction (such as weighing very highly a draw engine when added to a deck with lots of cheap cards, or AoE buffs when added to a deck with lots of token generators).
Account for state of the metagame (such as rating Skulking Geist very highly in a meta where Jade Druid or Secret Paladin is Tier 1).
Find the most optimal “core” of cards based on power level, from which to start building.
Etc.

Or simply put – a sufficiently advanced and infallible rating system should be able to account for everything and modify card ratings accordingly. Is such a system even possible? Theoretically, it is, though I have incredibly strong reservations about how reasonable it would be to develop in practice. The goal is not to construct this dream system, but to improve our understanding of Hearthstone on the way there. Though C.A.R.M.E.N. will most likely never be that, getting as close as humanly possible is the desired outcome of this exercise.
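The postulate, together with the synergy caveat, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the card names are placeholders, the ratings are made-up numbers, synergy is reduced to a flat pairwise bonus, and the greedy pick is a simple heuristic rather than a guaranteed optimum.

```python
from typing import Dict, FrozenSet, List


def contextual_rating(
    card: str,
    deck: List[str],
    base: Dict[str, float],
    synergy: Dict[FrozenSet[str], float],
) -> float:
    """Base rating plus a bonus for every synergy partner already in the deck."""
    bonus = sum(synergy.get(frozenset((card, other)), 0.0) for other in deck)
    return base[card] + bonus


def build_deck(
    pool: List[str],
    base: Dict[str, float],
    synergy: Dict[FrozenSet[str], float],
    size: int = 30,
) -> List[str]:
    """Greedily add whichever remaining card rates highest given the deck so far."""
    deck: List[str] = []
    remaining = list(pool)
    while remaining and len(deck) < size:
        best = max(remaining, key=lambda c: contextual_rating(c, deck, base, synergy))
        deck.append(best)
        remaining.remove(best)
    return deck


if __name__ == "__main__":
    base = {"A": 5.0, "B": 4.0, "C": 1.0, "D": 3.5}
    synergy = {frozenset(("A", "C")): 4.0}
    # C is weak on its own, but its synergy with A lifts it above B.
    print(build_deck(["A", "B", "C", "D"], base, synergy, size=2))
```

With these made-up numbers the two-card “deck” comes out as A and C rather than A and B, because the ratings are recomputed as the deck fills up. A toy like this is nowhere near the theoretical perfect system, which would have to fold in every other item on the list as well.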

Obviously, this is the ultimate goal, and a hardly achievable one. In the meantime, Carmen can do other things, such as allow us to rate cards from the upcoming expansions. Even without a fully interactive program to build decks for you, finding out which cards are the strongest will drastically help with getting the entire deck building process going. For this reason I call it an appraisal engine rather than an analysis one.

————————————————————————————————————————–

With that, the introduction to the concept behind Carmen is over. From here, I will continue to update this post with the links to all other articles about my progress in teaching her Hearthstone:

“Carmen says” is the series in which I run the revealed cards from each new expansion announcement through the system. This is mostly to get my thoughts on the cards out there, but the cool thing that can be done is to go back and re-evaluate the same cards, now within the context of the entire expansion once the last card is revealed.
Knights of the Frozen Throne
Rastakhan’s Rumble
