So last time I talked a bit about the four different models you can adopt for a “to-hit” mechanic. This time I’ll focus a bit on the ones involving randomness.

Let’s have a look at my d20 rolls over three nights of D&D, roll by roll.

Week 1

Week 2

Week 3

Looks pretty random, except if you look at the averages. All of them are below the expected average of 10.5, Week 1 especially. Weeks 2 and 3 were close enough to the average to be okay, but there was definitely a bad moon on Week 1 :)

Here are the frequency counts for the three weeks:

Frequency Counts

The frequency counts back up our feelings about the different weeks. You do get a feel for the weight of the distribution for the three weeks:

  • Week 1 is heavily weighted on 1, 3 and 6.
  • Week 2 is mostly uniform except for the huge peak at 12.
  • Week 3 is mostly uniform for 1-10, and heavily biassed towards 19 and little else in 11-20.

How did it feel around the table? Well, the other guys were cursing my bad luck in Week 1, explaining how best to “make an example of the cursed d20 to the rest of them”. The exact same die produced all the data, so whether it had a bad week depends on whether you really subscribe to panpsychism or not.

On that note, which sequence of rolls is more likely: the first ten rolls of Week 1, Week 2 or Week 3? They are all equally likely. Since there are 10 rolls, you have a 1-in-20 chance of matching the first roll, 1-in-20 for the second, and so on. This gives you a probability of [tex]20^{-10}[/tex] of getting that exact sequence of rolls[1. If you like numbers spelt in words, it’s about one in 10 trillion, which is roughly double the probability of picking a particular red blood cell out of a normal human. Alternatively, it’s roughly picking a particular star in our galaxy, if you’re allowed 30 goes.], regardless of which rolls you had to match.
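
If you want to check that arithmetic, here is a tiny Python sketch. The function name is my own invention, and it assumes a fair die with independent rolls.

```python
from fractions import Fraction


def exact_sequence_probability(num_rolls, sides=20):
    """Chance of matching one specific sequence of fair, independent rolls."""
    return Fraction(1, sides) ** num_rolls


p = exact_sequence_probability(10)
print(p.denominator)  # 10240000000000, i.e. about one in 10 trillion
print(float(p))
```

Nothing in the calculation looks at which faces came up, which is the whole point: every specific sequence of ten rolls gets the same number.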

If you got tricked by this, you’re probably implicitly thinking of the likelihood of the frequencies, rather than the actual results. You can test this using Pearson’s [tex]\chi^2[/tex] test, which confirms your intuition: you’d expect a fair d20 to give frequencies at least as lopsided as Week 1’s about 21.75% of the time, Week 2’s about 70.36% of the time, and Week 3’s about 88.24% of the time. The problem is that humans aren’t so good at doing [tex]\chi^2[/tex] tests in their heads and are distracted by a whole bunch of psychological biasses. For example, I was astounded at how many 1’s I got in Week 1. Sure, it was five times more frequent than you’d expect, but it had the added emotional impact that 1’s indicate spectacular failure in D&D. Therefore I had 5 times more spectacular failures than I would have expected. I wasn’t so thrown by the fact that I had even more middling failures (6 sixes when you’d expect about 1) because a 6 is much the same as a 7, and I got zero 7’s, so perhaps in my mind I averaged it out.
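
For the curious, here is a rough Python sketch of that kind of test. The counts are made up for illustration (they are not my actual rolls), the function names are hypothetical, and instead of looking the statistic up in a [tex]\chi^2[/tex] table it estimates the percentage by simulating a fair die, which is a swapped-in shortcut rather than the textbook calculation.

```python
import random


def chi_square_statistic(counts):
    """Pearson's chi-squared statistic against a uniform (fair die) expectation."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def simulated_p_value(counts, trials=5_000, seed=0):
    """Estimate how often a fair die gives a statistic at least this large."""
    rng = random.Random(seed)
    sides, total = len(counts), sum(counts)
    observed_stat = chi_square_statistic(counts)
    at_least_as_extreme = 0
    for _ in range(trials):
        sample = [0] * sides
        for _ in range(total):
            sample[rng.randrange(sides)] += 1
        if chi_square_statistic(sample) >= observed_stat:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


# Made-up counts for faces 1..20 (NOT real data): 30 rolls, spikes on 1, 3 and 6.
made_up_counts = [5, 1, 3, 1, 1, 6, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(chi_square_statistic(made_up_counts))
print(simulated_p_value(made_up_counts))
```

A small result means a fair die rarely looks that lopsided; a large one means the counts are nothing special. With only a handful of rolls per face, treat either answer as a rough guide.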

Humans are typically terrible at thinking statistically, especially when emotions are involved. MMORPG forums are full of this kind of discussion. Players may think the random number generator (RNG) is against them personally. They may even accuse the game makers of using bad RNGs. Folks like this tend to forget the awesome luck the very same RNG may have given them the day before. This is really an education thing, so we don’t want to dwell on it too long. But one thing we can take away as game designers is that players can easily have runs of bad, average or good random number outputs. We need to design with this in mind.

Random Design

If we’ve locked into the random hit chance/random damage model, we want to have a think about what this means for our weapons. Let’s take the bow I use in my D&D campaign. It’s a +1 Darkwood Composite Longbow and I’m at the stage where it gives me +11 to-hit, and does 1d8+2 damage, if I don’t do anything fancy with it. How much damage would I expect to inflict on average with my bow if I hit someone? Well, the average of a 1d8 is 4.5, so 1d8+2 has an average damage of 6.5. Okay, so I’ll do 6.5 damage per round, if I hit something.

What about accounting for the random chance to hit? Now D&D’s rules are a little tricky. I have to roll 1d20, add my bonuses, subtract penalties and compare that number to the bad guy’s Armour Class. If my result is greater than their Armour Class, I hit and roll for damage. There are two complications, namely rolling a natural 1 or a natural 20 (the number on the die itself, before any bonuses or penalties): a critical fumble or a critical hit respectively. There are funky rules for dealing with both, but let’s assume we use the simplest ones. A natural 1 guarantees zero damage. If I get a natural 20, I roll again. If that second roll still hits, I do triple damage. Otherwise, normal damage.

Given this tangle of rules, how much damage would I expect to deal in a round? We still need a benchmark Armour Class, so let’s say I’m trading arrows with my evil doppelganger. He has an AC of 21, so my attacks have to be stronger than a 21. I’m (initially) lazy at doing maths, so let’s simulate it!
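
Here is a minimal Python sketch of that kind of simulation. It is not the original program; the names are mine, and it assumes that “triple damage” means three times the whole 1d8+2 roll and that the confirming roll uses the same to-hit bonus.

```python
import random

TO_HIT_BONUS = 11     # my bonus with the bow
DAMAGE_BONUS = 2      # the "+2" in 1d8+2
ARMOUR_CLASS = 21     # the doppelganger's AC
CRIT_MULTIPLIER = 3   # assumption: triples the whole damage roll


def attack(rng, criticals=True):
    """Damage from one attack under the simplified rules described above."""
    natural = rng.randint(1, 20)
    damage = rng.randint(1, 8) + DAMAGE_BONUS
    hits = natural + TO_HIT_BONUS > ARMOUR_CLASS  # strictly greater than AC
    if not criticals:
        return damage if hits else 0
    if natural == 1:
        return 0  # critical fumble: no damage
    if natural == 20:
        # Natural 20 always hits; a confirming roll that also hits triples it.
        confirm = rng.randint(1, 20)
        confirmed = confirm + TO_HIT_BONUS > ARMOUR_CLASS
        return CRIT_MULTIPLIER * damage if confirmed else damage
    return damage if hits else 0


def average_damage(trials=100_000, seed=1, criticals=True):
    """Average damage per round over many simulated attacks."""
    rng = random.Random(seed)
    return sum(attack(rng, criticals) for _ in range(trials)) / trials


print(round(average_damage(), 2))
print(round(average_damage(criticals=False), 2))
```

Swapping the constants lets you try other weapons and targets, and the `criticals` flag turns the natural 1 and natural 20 rules off together.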

Simulated attacks versus Evil Doppelganger

The average damage per round here was about 3.6. This means the whole “roll to hit” thing is reducing my damage to a little more than half. Notice, however, the average “hit” value (my clumsy term for the damage done when you definitely hit) is much higher at 7.8, which is over double the per-round average. What’s the effect of taking out critical fumbles? Insignificant at this level[2. At this level of skill versus this opponent, rolling a 1 means a miss however you spin it. At higher skill levels or against weaker foes, it matters more. I’m also ignoring house rules that critical fumbles incapacitate you for some time, mostly because it’s too hard to model.]. What’s the effect of taking out critical fumbles and critical hits? Average damage drops to about 3.2 per round, and the average “hit” value drops to 6.5.
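
The per-round averages can also be worked out exactly, without any dice. This is a small Python sketch of that arithmetic under the same simplified rules; the function name and the reading of “triple damage” as three times the average 1d8+2 are my assumptions.

```python
from fractions import Fraction


def expected_damage_per_round(to_hit=11, damage_bonus=2, armour_class=21,
                              criticals=True):
    """Exact average damage per attack for 1d8 + damage_bonus."""
    mean_damage = Fraction(1 + 8, 2) + damage_bonus  # 1d8 averages 4.5
    # Faces of the d20 whose total beats the Armour Class.
    beating = sum(1 for face in range(1, 21) if face + to_hit > armour_class)
    p_beat = Fraction(beating, 20)
    if not criticals:
        return p_beat * mean_damage
    # With criticals: a natural 1 never hits, a natural 20 always does.
    hitting = sum(1 for face in range(2, 21)
                  if face == 20 or face + to_hit > armour_class)
    p_hit = Fraction(hitting, 20)
    # A confirmed natural 20 adds two extra helpings of damage (triple total).
    p_confirmed_crit = Fraction(1, 20) * p_beat
    return p_hit * mean_damage + p_confirmed_crit * 2 * mean_damage


print(float(expected_damage_per_round()))                 # 3.575
print(float(expected_damage_per_round(criticals=False)))  # 3.25
```

So I hit on half the faces of the die for 6.5 on average (3.25 per round), and the one-in-forty confirmed critical adds another 13 × 1/40 = 0.325.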

What’s this mean? Well, if I fight my doppelganger, I expect to kill him in about 16 rounds, since he has 57 HP (the same as me) and I do about 3.6 damage per round. If there were no critical hits or misses, it’d take about 18 rounds. But that’s on average, right? What are all the possibilities? The fight could go on forever if I kept missing (and he took no shots at me). The fight could be over in two rounds if I manage to get critical hits for full damage on both. This is a pretty broad range, and for game design it’d be neat to have a bit more knowledge.

Here I’ve simulated a great many combats.

Probability of killing doppelganger in N rounds

The orange curve is for killing in exactly N rounds, and the green curve for killing in at most N rounds. We can use the former to look at the spreads for individuals. Close to no-one will get the coveted 2-hit kill, but the probability ramps up quickly, peaking at 16 rounds. At the other end of that curve, you can see that maybe 1% of people will have to go at least 28 rounds before killing him.

The green curve helps us look at spreads for the population. While the most frequent kill time is 16 rounds, only about 40% of people will actually kill the doppelganger in 16 or fewer rounds. 60% of people will be done in 20 rounds, and the vast majority of people will have finished in about 35 rounds. Although the top seems to level off at one, it’s not strictly 100% (it only looks that way because of rounding). If you have a large number of players, or a lot of fights with this doppelganger, there will be a significant number of people who might take 50 rounds of combat, just because of bad dice rolling.
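
Both curves can be rebuilt with a short Monte Carlo loop. This Python sketch is my own reconstruction rather than the program behind the graph: it assumes he never shoots back, that he drops once his 57 HP reach zero or below, and that a critical triples the whole damage roll.

```python
import random
from collections import Counter


def attack(rng, to_hit=11, damage_bonus=2, armour_class=21):
    """One attack under the simplified rules: natural 1 misses, natural 20 may crit."""
    natural = rng.randint(1, 20)
    damage = rng.randint(1, 8) + damage_bonus
    if natural == 1:
        return 0
    if natural == 20:
        confirmed = rng.randint(1, 20) + to_hit > armour_class
        return 3 * damage if confirmed else damage
    return damage if natural + to_hit > armour_class else 0


def rounds_to_kill(rng, hit_points=57):
    """Count rounds until the target's hit points reach zero or below."""
    rounds = 0
    while hit_points > 0:
        hit_points -= attack(rng)
        rounds += 1
    return rounds


def kill_time_distribution(fights=10_000, seed=0):
    """Return (P(exactly N rounds), P(at most N rounds)) as dicts keyed by N."""
    rng = random.Random(seed)
    tally = Counter(rounds_to_kill(rng) for _ in range(fights))
    exactly = {n: tally[n] / fights for n in sorted(tally)}
    at_most, running = {}, 0.0
    for n, p in exactly.items():
        running += p
        at_most[n] = running
    return exactly, at_most


exactly, at_most = kill_time_distribution()
for n in sorted(exactly):
    print(n, round(exactly[n], 4), round(at_most[n], 4))
```

The `exactly` column is the orange curve and `at_most` is the green one. The long thin tail only shows up if you run plenty of fights.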

Suppose I want to investigate the effect of a certain buff. In this case, we have “Deadly Aim”, a special ability that lets you trade some accuracy for increased damage. It gives a -2 penalty to-hit, but +4 to damage. Let’s have a look:

Probability of killing doppelganger in N rounds with Deadly Aim

The peak has shifted down to 12 rounds, and it’s a tiny bit more frequent, so we’d expect to kill him faster. We also note that 50% of people will kill him in 16 or fewer rounds, which is a nice improvement. Messing with the penalties and damage bonuses can give you nice graphs on which you can base your game design. With tiny tweaks you can speed up combat, slow it down, reduce the chance of people blowing through it with lucky rolls, or mitigate unlucky rolls.
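
If you would rather not rely on simulation noise when comparing tweaks like this, you can push exact probabilities through the fight round by round. The Python sketch below does that for the plain bow (+11 to-hit, 1d8+2) and for Deadly Aim (+9 to-hit, 1d8+6). The function names are hypothetical, and it assumes the target drops at 0 HP and a critical triples the whole damage roll, so the numbers it prints need not match the graphs exactly.

```python
def damage_distribution(to_hit, damage_bonus, armour_class=21):
    """Map each possible damage total of one attack to its probability."""
    dist = {}
    p_confirm = sum(1 for f in range(1, 21) if f + to_hit > armour_class) / 20
    for natural in range(1, 21):
        for die in range(1, 9):
            p = 1 / 160  # one d20 face times one d8 face
            damage = die + damage_bonus
            if natural == 1:
                outcomes = [(0, p)]
            elif natural == 20:
                outcomes = [(3 * damage, p * p_confirm),
                            (damage, p * (1 - p_confirm))]
            elif natural + to_hit > armour_class:
                outcomes = [(damage, p)]
            else:
                outcomes = [(0, p)]
            for dmg, prob in outcomes:
                dist[dmg] = dist.get(dmg, 0.0) + prob
    return dist


def kill_by_round(dist, hit_points=57, max_rounds=60):
    """List of P(target dead within n rounds) for n = 1..max_rounds."""
    alive = {hit_points: 1.0}  # remaining HP -> probability
    dead, cumulative = 0.0, []
    for _ in range(max_rounds):
        survivors = {}
        for hp, p in alive.items():
            for dmg, q in dist.items():
                if hp - dmg <= 0:
                    dead += p * q
                else:
                    survivors[hp - dmg] = survivors.get(hp - dmg, 0.0) + p * q
        alive = survivors
        cumulative.append(dead)
    return cumulative


def median_rounds(cumulative):
    """First round by which at least half of all fights are over."""
    return next(n for n, p in enumerate(cumulative, start=1) if p >= 0.5)


plain = kill_by_round(damage_distribution(to_hit=11, damage_bonus=2))
deadly_aim = kill_by_round(damage_distribution(to_hit=9, damage_bonus=6))
print(median_rounds(plain), median_rounds(deadly_aim))
```

Changing the two arguments to `damage_distribution` is all it takes to try a different penalty and bonus, which makes it easy to line up a few candidate buffs side by side.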

If you wanted to do this sort of analysis yourself (with maths and not just Monte Carlo simulations), then you should learn a little about Poisson distributions. I’d talk a bit more about the statistical analysis you can do, but I think I’ve blown my words and pretty graph quotas for the month ;)


10 comments until now

  1. This is awesome.

    What’d you use to make the graphs/do the simulations?

  2. So for the graphs, I used the amazingly awesome Google Image Chart Editor. I heart it soooo much, despite its minor quirks.

    For the statistics (like the chi-squared test) I used R. It’s not for the faint-of-heart though. I found the documentation to be fairly terrible and unintuitive, but then again, I’m not a statistician.

    For the simulations I used Magma, just because it was easy to mock something up in. I also wrote a very short program in C++ (using the boost::random library) to do some of the simulations and statistics.

  3. Nerrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr*gasp*rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrd!

  4. @Thom
    I think someone’s got a bit of graph envy ;)

  5. BrettW,

    If you keep this up you may get some requests from the other players regarding what needs modelling next – Morgan would like you to look at monsters and their saving throws vs his spells, where I reckon Andrew’s playing funny buggers with the initiative so I always end up going last.

  6. You’re telling me! Rainor is a super-crazy dex monkey, but I tend to go somewhere in the middle-to-end of initiative.

  7. With respect to to-hit rolls, it doesn’t really make much sense to take an average of the numerical value. Each facet of the die has a pretty much arbitrary value – adding them makes as much sense as assigning numbers to types of fruit and averaging them.

    What would make sense is to pick a facet and then assign to each other facet a value based on its nearness to the selected facet. But what if the unrandomness was sort of bi-polar: if numbers were more likely to appear if they were near some axis of the die, rather than some particular end of the die? I’m thinking Fourier analysis with spherical harmonics – adjusted for the fact that the die has icosahedral symmetry.

  8. That’s an okay observation, but dice typically have mitigating factors against this. The numbers aren’t randomly assigned to a face, but are chosen specifically to smooth out any small physical bias there may be. On any good die*, if you add the face value with the one on the exact opposite face it’ll sum to the highest value plus 1. That is, the 20 is opposite the 1, 19 is opposite 2…

    Thus if a die has a bias to a particular face, it simultaneously has a bias to its opposite (because you have an axis of bias normal to the face, which is opposite to another since it’s a Platonic solid and if you’re biassed to balance towards one face, you must do so for the other unless things are really skewed). This means on average the bias doesn’t give you any advantage or disadvantage**. This takes care of your fancy pants Fourier analysis ;)

    If your die is so loaded that it even has a bias to a particular end of this axis, then it’ll be pretty apparent. There are tests you can do to show how likely a die is to be biassed. While I don’t have enough rolls to verify this, the chi-squared tests show things probably aren’t really that bad.

    But yes, while I can create the random variable (namely the face value), I’m implicitly assuming it’s uniformly distributed (I put on my Bayesian hat and thus it is so :) ). I then calculate the expected value (which I called the average, for simplicity’s sake). This still has meaning, but if the die does not have a uniform distribution, then my predictive powers are limited.

    (* d4s are a special case but no-one likes d4s anyway)
    (** Assuming numbers are nothing special. A 4 is the same as a 5 if you need a 3 or more to hit. Criticals throw this out because they are special.)

  9. Somewhat off topic, where can I purchase the aforementioned Bayesian hat? I imagine Richard would like one to celebrate his PhD ;)

  10. How did I miss this post? I like it a lot, and delving into the mathematics behind gameplay mechanisms is definitely something that I need to look into more.

    However, what I’d love to see is a comparison of different distributions. Let’s compare 1d20 to 2d10 or even 4d5? Furthermore, what degree of randomness makes a system “fun” versus “frustrating”, or is there a point at which a random system has so little variation that it may as well be an “always hit” system?