I’m not sure if this is quite what you’re asking, but here’s how I think about these pull rates.
Short version:
For rare events (pulls), a good rule of thumb is that a count of N events is not statistically distinguishable from N +/- sqrt(N) events. So if you pull 21 Charizard alt arts and 18 Arceus alt arts from 10,000 packs, you can't really say that Arceus has a lower pull rate: sqrt(21) = 4.6 and 21 - 4.6 = 16.4 < 18, so 18 is within the noise of 21.
Longer version:
Let's take the following example. Suppose the true pull rate of the Charizard V alt art is 1 in 500 packs. If you open 10,000 packs, how many Charizards might you pull? You expect around 10,000/500 = 20, but how close to 20 is typical? If 20 different people each open 10,000 packs, how much will their Charizard counts vary?
To analyze this mathematically, we model the experiment as a binomial random variable. Basically, we assume that every pack is independent of the others, and that each time you open a pack there is a p = 1/500 chance of pulling the Charizard. (So it is technically possible to pull 0, 1, or even 10,000 Charizards in this model. This assumption may not be strictly true for packs from the same box or case.)
Such a process can be simulated very efficiently on a computer. Doing this 10,000-pack experiment 20 times, I got the following results:
Number of Charizards = [26 25 16 15 15 16 22 22 25 16 17 17 16 28 17 20 15 31 14 23]
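If you want to try this at home, here's a minimal sketch of the simulation, assuming numpy is available (the seed and variable names are just my choices):

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed so the run repeats

n_packs = 10_000        # packs opened per person
p_charizard = 1 / 500   # assumed true pull rate
n_people = 20           # number of independent 10,000-pack experiments

# Each experiment is one binomial draw: the count of successes in
# n_packs independent trials with success probability p_charizard.
charizards = rng.binomial(n=n_packs, p=p_charizard, size=n_people)
print(charizards)  # your numbers will differ from mine above
```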
This is better displayed graphically, as this bar chart:
As you can see, the results are all around the expected N_charizard = 20, but there's a fair amount of spread, and the most common outcome in this case is actually N_charizard = 16.
Let’s do this simple experiment many more times: give 100,000 people 10,000 packs each. With this large-scale (and large hypothetical budget!) investigation we can see a smoother curve emerge:
The second plot here is the same but with the y-axis on a log scale, so you can see the rare outcomes where someone opens 5 or 41 Charizards from their 10,000 packs.
As N_experiments increases, this frequency plot looks more and more like the theoretical binomial distribution, P(k Charizards) = C(N_packs, k) * p^k * (1-p)^(N_packs - k):
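Here's a sketch of that larger simulation with the theoretical probabilities drawn on top, assuming numpy, scipy, and matplotlib (the plot styling is just one way to do it):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

rng = np.random.default_rng(0)
n_packs, p = 10_000, 1 / 500

# 100,000 simulated people, each opening 10,000 packs
counts = rng.binomial(n=n_packs, p=p, size=100_000)

# Empirical fraction of people at each outcome
values, freq = np.unique(counts, return_counts=True)
plt.bar(values, freq / freq.sum(), label="simulated")

# Theoretical binomial pmf over the same range
k = np.arange(values.min(), values.max() + 1)
plt.plot(k, binom.pmf(k, n_packs, p), "k.", label="binomial pmf")

plt.yscale("log")  # log scale makes the rare tails (5 or 41 Charizards) visible
plt.xlabel("Charizards pulled from 10,000 packs")
plt.ylabel("fraction of people")
plt.legend()
plt.show()
```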
This is very similar to what's above. The only thing I want to point out here is the roughly 2-in-a-billion chance of pulling no Charizards at all. So if you were the poor schmuck with 0 Charizards, it would seem very unlikely that the card really has a 1/500 pull rate! (Could all your packs be from an error batch where the factory forgot to print it?)
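That zero-pull probability is a one-liner to check, assuming scipy (you could also just compute (1 - 1/500)**10000 directly):

```python
from scipy.stats import binom

# P(0 Charizards in 10,000 packs at a true pull rate of 1/500)
print(binom.pmf(0, 10_000, 1 / 500))  # about 2e-9, i.e. roughly 2 in a billion
```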
All I’ve written about above is from the omniscient (manufacturer) perspective: given some fixed, known pull rate p=1/500, how many Charizards is someone likely to open from 10,000 packs? From the scientific (consumer) perspective you might care more about a slightly different question: if I open 21 Charizards from 10,000 packs, how well can I determine the pull rate?
In statistical parlance, you want an estimate of both a most-likely value p0 and a confidence interval around that most-likely value. That is, you want to be able to say with some confidence that p is between p0 - Delta and p0 + Delta. (Say, between 0.0016 and 0.0026 for our example.)
For a binomial distribution, the "best guess" p0 is just the observed rate (21/10,000 = 0.0021 here). The confidence interval depends on how confident you want to be: are you willing to tolerate a 30%, 5%, or 0.3% (or lower!) chance that you're wrong about the true value of p? You can read the Wikipedia page on binomial proportion confidence intervals for more details, but a good approximation when working with the binomial distribution is:
Delta ~ z * sqrt(N_charizard * N_notcharizard) / (N_packs * sqrt(N_packs)) ~ z * sqrt(N_charizard) / N_packs, where the last step uses N_charizard << N_packs.
The constant z is a number of your choice (1, 1.5, 2, …) determined by how confident you want to be (roughly 70% for z=1, 95% for z=2, etc.; look up a "z score table" for how to pick this).
So given 21 Charizards pulled from 10,000 packs, we get Delta = z * 0.00046 = z * 4.6/10,000. Then we can say with 70% confidence that p is between 16/10,000 and 26/10,000, or 95% confidence that p is between 12/10,000 and 30/10,000.
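As a sanity check, here's that approximation in code (just a sketch; the function name is mine):

```python
import math

def approx_interval(n_hits, n_packs, z):
    """Approximate (Wald-style) confidence interval for a binomial rate."""
    p0 = n_hits / n_packs
    delta = z * math.sqrt(n_hits * (n_packs - n_hits)) / (n_packs * math.sqrt(n_packs))
    return p0 - delta, p0 + delta

print(approx_interval(21, 10_000, z=1))  # roughly 16/10,000 to 26/10,000
print(approx_interval(21, 10_000, z=2))  # roughly 12/10,000 to 30/10,000
```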
(If you pull exactly 0 or very nearly 0 Charizards, then this formula is no longer reliable and you have to use something else, such as the “Clopper–Pearson interval.” Just for reference, for 10,000 packs the 95% confidence interval in the 0-pull case is 0 < p < 3.7/10,000.)
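If you want the exact Clopper-Pearson interval, it can be written in terms of beta-distribution quantiles. Here's a sketch assuming scipy (the function name is mine):

```python
from scipy.stats import beta

def clopper_pearson(n_hits, n_packs, conf=0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial rate."""
    alpha = 1 - conf
    lower = 0.0 if n_hits == 0 else beta.ppf(alpha / 2, n_hits, n_packs - n_hits + 1)
    upper = 1.0 if n_hits == n_packs else beta.ppf(1 - alpha / 2, n_hits + 1, n_packs - n_hits)
    return lower, upper

print(clopper_pearson(21, 10_000))  # in the same ballpark as the z=2 approximation above
print(clopper_pearson(0, 10_000))   # (0, ~3.7e-4): the 0-pull case quoted above
```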
If you want to determine whether the difference between the pull rates for two different cards (say, the Arceus alt art and the Charizard alt art) is "statistically significant," that's a related topic called "hypothesis testing" whose details I'll leave to a future post (or to someone else). But the rule of thumb I mentioned at the start is my go-to.
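For completeness, that rule of thumb takes only a few lines (not a formal test, just the back-of-the-envelope check from the top of this post):

```python
import math

n_charizard, n_arceus = 21, 18  # alt-art pulls from the same 10,000 packs

# Rule of thumb: counts within about sqrt(N) of each other are not
# distinguishable from ordinary binomial noise.
noise = math.sqrt(n_charizard)
print(abs(n_charizard - n_arceus) < noise)  # True: 3 < 4.6, so no real evidence of different rates
```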
For the Fusion Strike tcgplayer chart, I am pretty certain that they opened exactly 4320 packs. (This is the least common multiple of 1080 and 1440, and all the other pull rates give nice round numbers of cards pulled.) For all the cards in that chart, here is a plot of their pull rates and confidence intervals (calculated with the more precise Clopper-Pearson formula):
Here the dark grey rectangles are the 70% CIs and the light grey ones are the 95% CIs. As you can see, the ranges of likely pull rates all overlap heavily at this sample size. But the rarest pulls (VMAX Alt Art) do seem significantly distinct from the least rare (Elesa).
For the Brilliant Stars chart, they opened 10,800 packs (300 booster boxes), with the corresponding pull rates and confidence intervals: