Statistical Methods
Once you’ve found the data, the answer may jump out at you. Team A has won 81% of its home games. It’s hosting Team B, which has won just 27% of its away games. A bet on Team A to win seems justified.
The problem, of course, is that the bookmakers have come to the same conclusion. You’re not going to make much with a bet on Team A to win if that’s the outcome everyone expects.
If you’re looking for big wins, you’ve got to look a bit deeper.
Regression Analysis
The first task in using statistics in sports betting is to figure out which statistics to track – that is, which statistics affect the outcome of the game. The tool for identifying the relevant statistics is called regression analysis.
Suppose you suspect that your team performs better at home than it does on the other team’s turf. You could make a simple bar chart. One vertical column is the percentage of away games that the team has won over the past two or three years, and the second column is the percentage of home games won.
If your theory about home-field advantage is correct, the second column should be taller.
Here comes the statistical analysis.
Draw a line from the top of one column to the top of the other. The steeper the line, the bigger the home-field advantage. If the line is nearly level – say, 42% to 46% – then the factor you’re analysing (home-field advantage, remember?) doesn’t have much value as a predictor of a game’s outcome.
If the line is steep, home-field advantage plays a more significant role. The steeper the line, the more importance you should assign this factor when predicting whether your team will win.
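The two-column comparison can be sketched in a few lines of Python. The results list below is made up purely for illustration, and win_rate is a hypothetical helper name rather than anything from a library. With only two columns, the "steepness" of the line is simply the difference between the two win percentages.

```python
# Hypothetical results: (venue, won) for each past game.
games = [
    ("home", True), ("home", True), ("home", False), ("home", True),
    ("away", False), ("away", True), ("away", False), ("away", False),
]

def win_rate(games, venue):
    """Fraction of games won at the given venue."""
    results = [won for v, won in games if v == venue]
    return sum(results) / len(results)

away = win_rate(games, "away")
home = win_rate(games, "home")

# With two columns, the slope of the line joining their tops
# is just the difference between the two column heights.
slope = home - away
print(f"away {away:.0%}, home {home:.0%}, difference {slope:+.0%}")
```

A real test would use two or three seasons of results, as described above. Eight games is far too few to trust.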
Maybe you’re curious about whether the date has significance. You can easily test whether odd or even dates have statistical significance. In column one, list the percentage of games won on odd dates – June 1, June 3, and so on. In the second column, the winning percentage on even dates: the second, the fourth, and so on. It will be a surprise if the line isn’t approximately level.
These are very simple examples. In the real world, you’re probably going to assess the importance of factors that require more columns.
For example, you might want to determine whether your team has a better record in high-scoring games or low-scoring games. You can do the same calculation for the opponent your team will face this week. Use different columns for each score.
Multiple regression analysis lets you combine these figures with related ones. For example, you can determine whether your team is more likely to play a high-scoring game at home or on the road. Do the same for the other team. These successive analyses let you use history in a sophisticated way to make a prediction about the upcoming game.
Regression analysis is what gives Tony Bloom his edge. Bloom analyses more factors than bookies do. Starlizard’s forecasts are usually so close to what bookies predict that there’s no point making a bet. But sometimes the gap is big enough to justify a bet. Make enough of those bets and you’ll be a billionaire like Tony Bloom.
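To make "the gap" concrete, here is a minimal sketch that compares your own probability estimate with the probability implied by a bookmaker's decimal odds (one divided by the odds, ignoring the bookmaker's margin). The function names, the 5-point threshold, and the example numbers are all assumptions for illustration; nothing here describes how Starlizard actually works.

```python
def implied_probability(decimal_odds):
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    return 1.0 / decimal_odds

def edge(model_probability, decimal_odds):
    """Gap between your estimate and the bookmaker's implied probability."""
    return model_probability - implied_probability(decimal_odds)

# Hypothetical example: your model says 55%, the bookmaker offers 2.10.
MIN_EDGE = 0.05  # assumed threshold; choose your own
gap = edge(0.55, 2.10)
place_bet = gap >= MIN_EDGE
print(f"gap {gap:+.1%}, bet: {place_bet}")
```

Odds of 2.10 imply a probability of roughly 48%, so a 55% estimate leaves a gap of about seven percentage points, which clears the assumed threshold.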
The maths for statistical betting is simple but the notation looks complicated. When there are hundreds of data points instead of just the numbers at the top of the two columns, you need to use an algebraic technique called linear least squares to plot the line that best fits the data. The steepness of that line, remember, tells you how significant the factors are. Least squares analysis is conceptually simple and easy for a computer to perform, but it’s time-consuming and tedious to do it manually.
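For the curious, this is roughly what the computer does when it fits a least-squares line through many points. It is a minimal sketch for one factor only; least_squares is a hypothetical helper name, and the shots and goals figures are invented.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the line that minimises squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: shots on target per game against goals scored.
shots = [2, 4, 5, 7, 8]
goals = [0, 1, 1, 2, 3]
slope, intercept = least_squares(shots, goals)
print(f"goals = {slope:.2f} * shots + {intercept:.2f}")
```

The slope is the steepness discussed above: how much the outcome changes, on average, for each extra unit of the factor.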
Luckily, a quick trip to Google or Bing will yield a bucketload of free spreadsheet templates for calculating least squares – and for performing multiple-factor regression analysis. Load the templates into Excel or Google Sheets, add historical statistics, and you’re ready to set your own odds, higher or lower than the bookmaker’s, based on your own calculations.
A bit of jargon: The “dependent variable” is the outcome you are trying to predict – winning or losing the game, for example. The same technique can also be applied to predicting the spread, the number of corners, the over/under, and other bets. The factors whose significance you are testing are the “independent variables”. If the dependent variable is a number that can take many values – like the number of corners per game – you analyse it with “linear regression”. When it can take just two values, like win or lose, you use “logistic regression”.
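As a rough illustration of logistic regression, the sketch below fits a win probability from a single independent variable (home or away) using plain gradient descent. The data, the function names, and the learning-rate and step settings are assumptions chosen for the example. In practice you would use a spreadsheet template or a statistics package rather than hand-rolled code.

```python
import math

def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(w * x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data: 1 = home game, 0 = away game; 1 = win, 0 = no win.
is_home = [1, 1, 1, 1, 0, 0, 0, 0]
won = [1, 1, 1, 0, 0, 1, 0, 0]

w, b = fit_logistic(is_home, won)
p_home = sigmoid(w + b)  # predicted win probability at home
p_away = sigmoid(b)      # predicted win probability away
print(f"home {p_home:.0%}, away {p_away:.0%}")
```

With a single yes/no factor, the fitted probabilities simply match the observed win rates. The technique earns its keep when you add several independent variables at once.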
If you want to get the most accurate results, you have to analyse as many variables as possible. This includes not just team data, but individual player data too. Collecting as much football data as possible is key.
Correlation and Causation
If you’ve ever engaged in an argument on an internet forum, you have at some point encountered someone who countered a seemingly valid argument by declaring that correlation does not prove causation.
That is a true point. A profound one. And for our purposes, it is irrelevant. If our team performs better when the temperature is low, then we can use that data in making a prediction.
Maybe the players perform better in the cold, maybe they’re more rested because they drive to games instead of taking the bus, maybe they’re in a better mood because they’re wearing jumpers their mums knitted for them. We don’t care about causation. In statistical betting, correlation is enough.
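If you want to put a number on how strongly two things move together, the usual measure is the Pearson correlation coefficient, which runs from -1 to +1. The sketch below is a minimal version with invented temperature and goal-difference figures. pearson is a hypothetical helper name.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data: match-day temperature (Celsius) and goal difference.
temperature_c = [2, 5, 9, 14, 20, 25]
goal_difference = [3, 2, 1, 0, -1, -2]

r = pearson(temperature_c, goal_difference)
print(f"correlation: {r:.2f}")
```

A value close to -1 here would mean the team does better as the temperature drops, whatever the reason.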
Bayesian Statistics
The regression analysis we have explored so far is a powerful tool for making predictions. It can certainly help you make better bets. But this technique is well known to the mathematicians who calculate the odds you find posted at your bookmaker’s shop. To outsmart the oddsmakers and maximise your winnings, you’ve got to take the next step by using the results of your regression analysis as data.
You can think of the output of your regression analysis as a probability distribution: the likelihood of each possible outcome. You can use the figures to create models that represent the range of probabilities. A Bayesian network is a probabilistic graphical model that represents a set of variables together with their conditional dependencies.
Let’s say you’re trying to predict a football score. For level one, you want to determine the strength of the team using data like league wins, home wins, and average goals per game. For level two, you factor in the injuries for each team. You can find that data posted on most football betting sites. Once you have populated both levels with data, you can make a better prediction about the final score.
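Here is a deliberately tiny sketch of the two-level idea, predicting a win rather than a full score to keep it short. Level one is a belief about team strength, and level two adjusts the chance of winning for injuries. Every probability in it is a made-up placeholder, and the names (P_STRONG, P_WIN, p_win, p_strong_given_win) are hypothetical. A real model would estimate these figures from data.

```python
# All probabilities below are invented placeholders.
P_STRONG = 0.7  # level one: belief that the team is "strong"

# Level two: chance of winning given strength and a key-player injury.
P_WIN = {
    ("strong", False): 0.65,
    ("strong", True): 0.50,
    ("weak", False): 0.35,
    ("weak", True): 0.20,
}

def p_win(key_player_injured, p_strong=P_STRONG):
    """Win probability, averaging over the uncertain team strength."""
    return (p_strong * P_WIN[("strong", key_player_injured)]
            + (1 - p_strong) * P_WIN[("weak", key_player_injured)])

def p_strong_given_win(key_player_injured, p_strong=P_STRONG):
    """Bayes' rule: updated belief in team strength after seeing a win."""
    joint = p_strong * P_WIN[("strong", key_player_injured)]
    return joint / p_win(key_player_injured, p_strong)

print(f"fit squad: {p_win(False):.0%}, key injury: {p_win(True):.0%}")
print(f"belief team is strong after a win: {p_strong_given_win(False):.0%}")
```

The second function shows the Bayesian part: each result feeds back into your belief about the team, which then shapes the next prediction.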
Bayesian statistical analysis is complex both conceptually and mathematically. You can download free spreadsheet templates to help you set up a Bayesian model, or you can purchase a commercial statistical-analysis package and add the data you wish to analyse.
Poisson Distribution
The Poisson distribution is a mathematical tool that translates an average into a probability for each possible outcome.
Let’s say that your team scores an average of 2.1 goals per game. Poisson distribution analysis lets you estimate how likely the team is to score no goals, one goal, two goals, and so on in the next game for the purposes of statistical betting.
The technique matters because the same average can hide very different patterns. A team can average 2.1 goals per game by getting no goals in 19 games and 42 goals in the twentieth, or it can get four goals in two games, three goals in three games, two goals in 10 games, and one goal in five games. In this case, the distribution tells you much more about what is likely to happen than the average does.
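The Poisson formula itself is short: the probability of exactly k goals is e^(-mean) × mean^k / k!. The sketch below applies it to the 2.1 goals-per-game average. poisson_pmf is a hypothetical helper name, and the calculation assumes goals arrive independently at a steady rate, which is only an approximation of real football.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events when the average count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

MEAN_GOALS = 2.1  # the team's average goals per game

for goals in range(6):
    print(f"{goals} goals: {poisson_pmf(goals, MEAN_GOALS):.1%}")
```

For an average of 2.1, the formula puts the chance of a goalless game at about 12% and the chance of exactly two goals at about 27%.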