Statistics Sample Size


With movies like Moneyball and organizations like Baseball Prospectus pushing sabermetrics and statistics, armchair statisticians have popped up everywhere. For the most part, these self-proclaimed stats experts read ESPN and write what they think those “statistics” mean. Well, ESPN is notorious for being biased and less than adequate in its statistics department, so these armchair statisticians really don’t know what they are talking about.

At the beginning of every season we hear the same thing over and over again: “He hit 5 HRs in 5 games, he’s going to hit 162 HRs” or “He is 0-for-30 to start the year, looks like he is going to have a bad season.” Or, put more generally, “The first few games of the season are indicators for the entire season.” People who say things like that could not be more wrong.
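To see where the “162 HRs” line comes from, here is a minimal Python sketch of that kind of straight-line extrapolation. The function name `naive_pace` and the 162-game default are our own choices for illustration; the point is only that the arithmetic treats a five-game sample as if it were the whole season.

```python
SEASON_GAMES = 162  # length of a regular season, used as the default below


def naive_pace(count, games_played, season_games=SEASON_GAMES):
    """Straight-line projection: assume the current per-game rate holds all year."""
    return count / games_played * season_games


# The "5 HRs in 5 games" claim from the quote above.
print(naive_pace(5, 5))  # 162.0
```

The math is not wrong, it just says nothing about whether five games are enough to trust the rate being projected.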

So let’s take a look at the point at which each statistic becomes reliable.

Offense Statistics:

  • 50 PA (Plate Appearances): Swing %
  • 100 PA: Contact Rate
  • 150 PA: Strikeout Rate, Line Drive Rate, Pitches/PA
  • 200 PA: Walk Rate, Ground Ball Rate, Fly Ball Rate, Ground Ball/Fly Ball
  • 300 PA: Home Run Rate, HR/Fly Balls
  • 500 PA: OBP, SLG, OPS

Pitching Statistics:

  • 150 BF (Batters Faced): K/PA, Line Drive Rate
  • 200 BF: Ground Ball Rate, Fly Ball Rate, Ground Ball/Fly Ball Rate
  • 500 BF: K/BB, Pop-up Rate
  • 550 BF: BB/PA
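
If you would rather not memorize the lists, the thresholds above can be sketched as a simple lookup in Python. The names `OFFENSE_MIN_PA`, `PITCHING_MIN_BF`, and `reliable_stats` are hypothetical ones we made up for this example; the numbers are copied straight from the two lists.

```python
# Minimum plate appearances before each offensive stat becomes reliable.
OFFENSE_MIN_PA = {
    "Swing %": 50,
    "Contact Rate": 100,
    "Strikeout Rate": 150,
    "Line Drive Rate": 150,
    "Pitches/PA": 150,
    "Walk Rate": 200,
    "Ground Ball Rate": 200,
    "Fly Ball Rate": 200,
    "Ground Ball/Fly Ball": 200,
    "Home Run Rate": 300,
    "HR/Fly Balls": 300,
    "OBP": 500,
    "SLG": 500,
    "OPS": 500,
}

# Minimum batters faced before each pitching stat becomes reliable.
PITCHING_MIN_BF = {
    "K/PA": 150,
    "Line Drive Rate": 150,
    "Ground Ball Rate": 200,
    "Fly Ball Rate": 200,
    "Ground Ball/Fly Ball Rate": 200,
    "K/BB": 500,
    "Pop-up Rate": 500,
    "BB/PA": 550,
}


def reliable_stats(sample_size, thresholds):
    """Return the stats whose minimum sample size has been reached."""
    return [stat for stat, minimum in thresholds.items() if sample_size >= minimum]


# A hitter roughly ten games into the season (about 40 PA) has nothing to go on yet.
print(reliable_stats(40, OFFENSE_MIN_PA))  # []
```

Calling `reliable_stats(160, OFFENSE_MIN_PA)` would return everything through the 150 PA tier, and nothing beyond it.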

If you don’t believe us, go back through the past ten years and compare end-of-season numbers against the numbers at each of these points in any player’s season. You’ll see what we, FanGraphs, Pizza Cutter, and Baseball Prospectus have all seen. If you still don’t believe us, close ESPN or Yahoo or CBS Sports or whatever “source” you are looking at and look at the raw statistics. Once you do that, pick a team that you are indifferent towards, say the Kansas City Royals, assign each player a random number, and look at the statistics by number instead of by name. That way you can’t let your opinion of a player color how you read his numbers.
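
The blind check described above could be sketched in Python along these lines. Everything here is an assumption for illustration: the helper names `assign_blind_ids` and `rate_gap` are ours, and the roster uses placeholder names, not real players or real numbers. You would plug in the raw statistics yourself.

```python
import random


def assign_blind_ids(players, seed=None):
    """Give each player a random, unique number so names stay hidden."""
    rng = random.Random(seed)
    ids = rng.sample(range(1, len(players) + 1), len(players))
    return dict(zip(players, ids))


def rate_gap(early_count, early_pa, final_count, final_pa):
    """How far a rate at a checkpoint sits from the end-of-season rate."""
    return early_count / early_pa - final_count / final_pa


# Placeholder roster; swap in the team you are indifferent towards.
roster = ["Player A", "Player B", "Player C"]
blind_ids = assign_blind_ids(roster, seed=1)
print(sorted(blind_ids.values()))  # [1, 2, 3]
```

From there, compute `rate_gap` for each number at each checkpoint and see how quickly the gap shrinks as the sample grows.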

The first couple of series do not, in any way, determine how the season is going to go for a team or an individual player. The numbers above are minimum numbers. The absolute minimum. If your favorite player has 0 HRs through 10 games, don’t freak out; the sample size is way too small. The same holds true for a prospect. If a prospect gets called up at some point during the season and hits 2 HRs in his first 2 games, that doesn’t mean he is the next Hammerin’ Hank; the sample size, again, is way too small. So please, before you freak out because of a slow start, remember that the season is young and the sample size is still very small.
