Wednesday, May 15, 2013

Comparison of AFL Power Rankings Systems

When I created my AFL Power Rankings in 2011, I did not know of any other similar ranking systems for AFL. Basically my purpose was to create a system which gave a better indication of the relative strength of each team than the AFL ladder did. Ladder positions, particularly early in the season, can be to a large extent determined by which other teams each team has played, while they can also cover over the strength of recent form for each team. My system was never really intended to predict future results – though it could well be used for that purpose – but to give what I thought was a better assessment of past results than the ladder did.

My rankings system, as I assume is the case with most ranking systems, is not at all intended to indicate that the ladder is meaningless. There can be no denying that a team would rather win the premiership than be #1 on some rankings system. What ranking systems are meant to do is to give a better indication of the ‘actual’ strength of each team. If team A has a 65 per cent chance of winning a match, and team B has a 35 per cent chance of winning, then team A would be considered the stronger team. But that does not mean team A will win – by definition, team B has a non-zero chance of winning the match. There are many, many cases during the season where the team assessed as being weaker will win, including in the Grand Final. In constructing the rankings system I intended to look beyond the evidence from any particular match, and take account of the evidence not only from that match, but from other matches over a period of time, to get a better assessment of how strong a particular team has been.     

My rankings system depends only on these factors: the final margin for each match, where each match was played, the strength of the teams in each match, and how recent each match was. My reasons for choosing these factors were outlined here, but essentially I chose them because they seemed to me the main factors that football watchers use when adjusting the worth of each result. But while people often mentally adjust for these factors, they (myself included) rarely have an ‘objective’, quantifiable means of doing so. Thus my ranking system was really a way of adding some ‘objective’ rules to the subjective judgments that fans such as myself make.

A good example of where people make adjustments to the ladder is the premiership betting. A team might start the season 3-0 and be first or second on the ladder, but if they had a mediocre season last year then they might not be too far from mid-range in the premiership betting. Indeed, you could argue that the premiership betting might be the best ranking system of them all, because it reflects the collective assessments of many football followers, including those with possibly more accurate models than mine. One thing that convinced me that my ranking system might be OK was that it gave results that were not too far away from the premiership betting market. My hunch though is that there are enough people in the betting market who are shifted by emotion for it to react too quickly to shifts in form, and that there are systems out there which can beat the market, even if only by a little.    

Since then I have found out about other ranking systems, including those used at AFL Footy Maths, and just recently (though it is an old system) at Footy Forecaster. I can’t find the formula for Footy Forecaster’s rankings, but it does not look like it would be all that different from mine, given how close the ranking points for each team are under each system. If I had found this system before I devised my rankings I might never have bothered to create my own. Indeed, it might be that the Footy Forecaster rankings would be close to what I got if I fixed up the logical flaws in my system that always slightly bothered me. For example, the sum of the ranking points across teams in my system does not add up to zero, whereas it does in the Footy Forecaster system. However, the main difference in the systems to me appears to be the adjustment for home ground advantage, which looks to be considerably less for Footy Forecaster.

For the AFL Footy Maths system, my understanding is that the main factors determining each team’s rankings are the same as my own. Again, if I had found it before creating my own I might not have bothered with mine. One main difference I have noticed is that teams tend to move slightly more quickly around the ranking positions in my system – I don’t know if this is a good or a bad thing, but in any case the rates of movement are not that different since the Footy Maths system underwent its renewal. I think another main difference (though I am prepared to stand corrected) is that in my system each match that is played changes the worth of previous results. For example, in my system, if team A beats team B by 60 points and team B’s average net margin is -20 points then this is a very good result for team A. If team B then gets beaten by 60 points again the next week it is still a very good result for team A but less so. And if team B keeps getting beaten by 60 points every week then team A’s margin of victory eventually becomes considered par for the course.

And then there’s Roby…

Roby’s rankings appear on the Big Footy forum, where, rightly or wrongly, they are routinely subjected to some pretty hefty criticism. One thing I will say is that if Roby does actually calculate all the factors he says he does then maintaining his system must mean a hell of a lot of work. By contrast, my own system takes about 10 minutes to update. Another thing is that Roby does use his system for betting. Based on his description (and again I might be wrong), only backward-looking information determines his rankings, and not forward-looking information. For example, one might try to forecast expected future performance based not only on past performance, but also forward-looking factors such as future age profile, and … well, I can’t think of anything else at the moment. The result of using only backward-looking information is that, unless things stay the same forever and ever, there will inevitably be errors in his predictions (same with my system). If you used forward-looking information as well which accurately predicted how a team’s form would develop into the future, you could reduce these errors. But anyway it could still be the case that Roby’s prediction errors are lower than anybody else’s.

Roby claims the intention of his rankings is to get a better understanding of how close each team is to winning a premiership. Presumably then this means that the team ranked #1 is considered the most likely to win a premiership. This, of course, does not mean that team will certainly win the premiership, or even that it is likely to. The current premiership favourite on the betting market, Hawthorn, typically has odds of around $3.25 to win the premiership, which means that it is generally considered far more likely that it will not win the premiership this year than that it will.
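
As a rough check on that last point, decimal odds can be turned into an implied probability by taking the reciprocal. The sketch below assumes the $3.25 quote is a decimal price (total payout per $1 staked) and ignores the bookmaker's margin, so the market's true estimate would be a little lower still. The function name is made up for illustration.

```python
def implied_probability(decimal_odds: float) -> float:
    """Convert decimal odds (total payout per $1 staked) to an implied win probability.

    Assumption: no adjustment is made for the bookmaker's margin.
    """
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


hawthorn_win = implied_probability(3.25)
print(f"Implied chance of winning the flag: {hawthorn_win:.1%}")
print(f"Implied chance of not winning it:  {1 - hawthorn_win:.1%}")
```

On those assumptions the favourite sits at roughly a 31 per cent chance of winning the premiership, against roughly 69 per cent of missing out.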

Having said that, Roby’s phrasing is a bit unfortunate, because there will come a point in the season where multiple teams have 0 per cent chance of winning the premiership (16 will have no chance by Grand Final day). But unless I am missing something fundamental, I think you can also say that his rankings are just meant to be an indication of how well each team would be expected to perform relative to other teams in a game on neutral turf, with no injuries, with the same number of days break, etc., so I won’t be overly pedantic on that point.   

Roby’s rankings collate and model data on: final margins; score differentials over the course of the game; the team’s expected performance based on team selection, form, home ground advantage, breaks, travel, historical matchups; and previous/current ranking position(s), in-game injuries, umpiring decisions, and weather conditions.  As I said previously, my own rankings and the rankings over at AFL Footy Maths are based on final margins, home ground advantage, and strength of opponent. If the other factors Roby includes are found to have a significant impact on performance, and if he can assess those factors accurately, and he is at least as accurate as me on the factors that I include, then his ranking system will be more accurate. But how significant are those factors likely to be and how accurately can they be assessed? I don’t have the evidence – Roby may or may not but I don’t think he’s shared it – so the following represents my best guesses.

Score differentials over the course of the game: Factoring this in is saying that not only the final margin matters but also how you get there – being up by 80 points at three quarter time and then winning by 40 points is more dominant than being even at three quarter time and then winning by 40. A fair proportion of the football following public would agree with this. I don’t know if, empirically, it helps in terms of predicting future performance, but it doesn’t seem unreasonable that it could, and you can quantify it.  

Team selection: I’d say this could be similar to the ‘Full Strength Indicator’ that Champion Data produces.  One has to make some judgment calls as to what a team’s best line-up is, but if you have an accurate method of rating players this could be useful.

Breaks and travel: Well, coaches often seem to think these things matter. Again, I don’t know if, empirically, they are right, but it’s not an unreasonable possibility, and these things are easily quantified.

Historical matchups:  I suppose this means that, after accounting for the current strength of each team, if a team does better against a particular team than it has done in the past then its performance is rated more highly. For example, if Hawthorn broke its losing streak against Geelong it would gain points beyond those from beating a team of Geelong’s calibre. That makes sense, though I don’t know if each team has enough ‘bogey’ teams for this to make much difference.

In-game injuries: Now how do you quantify this? If I were to do it, I would need to know the difference in ability between the player who was injured and the player who replaced them in that position. I guess I’d also need some way of quantifying the effect, if there is one, of other players dropping their performance because they are more tired. And in the case where the player stayed on the ground, I would need to know what the reduction in their capacity to perform was (or I could just ignore these cases). Not impossible to do, but not easy.

But I don’t think this factor would make a lot of difference. Because of the way injury news is phrased, for example, ‘So-and-so is OUT!’, it brings to mind a big gaping hole, but a player who is injured is replaced, and unless you lose a star the drop-off in quality will not be that sharp, and anyway that player is only 1/18 of the players for that team on the field. Losing three or four players in one game might well have a big impact on the outcome of that game, but that wouldn’t occur to a particular team too often. Out-of-game injuries would be more important, but that is presumably covered under team selection.

Umpiring decisions: I guess that umpiring decisions could have a significant impact on the outcome of a game, maybe less so over a season as I’d expect errors to even out. But quantifying the impact is a very difficult exercise. You basically have to work out what is the expected scoring impact of each decision, based on where the ball is on the ground, the type of decision, and so on.

Besides this, any judgment about umpiring decisions will be subjective. Now I have no reason to believe that Roby is a bad judge of umpiring decisions, and it sounds like he does carefully review them (that also sounds like a hell of a lot of work). But try explaining this to other football followers, most of whom will swear their team was crucified by the umpires last weekend. (I swear that Richmond has been crucified by the umpires for the past thirty years.) Adding in the effects of umpiring decisions could make the model more accurate, but it’s a tough sell.

In the end then, all of the factors that Roby has added make sense, but without knowing the empirical evidence for their inclusion, and how they affect the final rankings, it is hard to comment on their worth. Roby could help out here, and could certainly help the credibility of his rankings, by revealing this information. On the other hand, if I put in as much effort to generate rankings as Roby seems to, and those rankings actually did generate betting profits over the long run, I’d probably be reluctant to reveal that information as well.

Thanks to John Murray and AFL Footy Maths for the discussions which led to this post.


@AFLFootyMaths said...

Thanks for the commentary and link, Troy.
For us, we started rankings just for a bit of fun last year, without researching one iota if anyone else was doing the same.
After a few months, we came across yours and the others mentioned above (and not playing in the pools over at BigFooty, we had no idea about Roby's rankings until alerted to them).

Agree about Roby too. His system may be good, bad or indifferent, but he does himself a disservice by hiding even the most basic detail (bar the ranking table).

Add in the 'subjectives' around who gets a rub of the green umpiring-wise, and it casts doubts on the validity of that system.

And from the little I have read about Roby's approach and reasoning, I only draw the conclusion that he is confused about the aims... betting? best team at any point? most likely to win a flag?

But if he is winning money, all well and good.

WayBeyondRedemption also has a crack at rankings... needs an update though!


@AFLFootyMaths said...

Forgot to mention... I started this after reading these cricket rankings over a year or so, and the concept interested me.

Russ helped a lot (A HUGE AMOUNT) with the mathematics etc and I just filled in the rest.

Troy Wheatley said...

I think Roby's system, trying to read between the lines, is just trying to rank the teams in order of strength. But I guess his ultimate aim for doing so is to rate teams more accurately than the betting market.

Yours/Russ' system is more mathematically complex than mine, but the factors underlying it are easy enough to understand. One is never completely baffled as to why teams move up or down, which I think is a good quality for a system to have.

Russ said...

Troy, how do you backcast the results to reduce the importance of a win if the thrashed opposition continues to struggle?

My ratings work as a state in time (which is more important for cricket than football). In the end, if a team drops the redistributed points tend to fan out, but not necessarily quickly. Thus two things that continue to plague me:

1) the team that wins first gets more points than the teams that follow. This is particularly true for the cricket ratings, because teams play long series, and I have a smoothing function that accelerates repeated changes in one direction.

2) For tiered/regional systems, if teams A, B, C mostly play one another and teams D, E, F mostly play one another, when the cross-over games occur only the teams involved in the cross (A v D, for example) have their rankings change. Again, the results fan out, but when teams don't play that often it can be a slow process (I'd rather the groups moved in concert, but the maths is beyond what I'd want to do in Excel).

But if you had any thoughts I'd be interested.

Troy Wheatley said...

Hi Russ,
Sorry for the late reply - just saw your comment.

In my system, the ranking points assigned for a game depend on a relative net margin. This relative net margin is calculated based on the average net margin of the opponent over the past 22 games. So if Team A wins against Team B by 8 points and Team B's average net margin over the past 22 games is -7 points, then Team A's relative net margin for the game is 1 point. If Team B then gets thrashed by Team C by 100 points the next week, then its average net margin will drop below -7 points, and therefore Team A’s relative net margin for that game will drop also.
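
A minimal sketch of that calculation, using the numbers from the example above. The function names and Team B's game-by-game history are made up for illustration (22 games at exactly -7 each, purely so that the average comes out at -7). It also glosses over whether the game against Team A itself sits in Team B's window, and it leaves out the venue and recency adjustments entirely.

```python
from collections import deque

WINDOW = 22  # number of past games used for the opponent's average net margin


def average_net_margin(margins):
    """Average net margin over the most recent WINDOW games."""
    recent = list(margins)[-WINDOW:]
    return sum(recent) / len(recent)


def relative_net_margin(margin, opponent_average):
    """Margin of the game adjusted by the opponent's average net margin."""
    return margin + opponent_average


# Hypothetical history for Team B: 22 games averaging -7 points.
team_b = deque([-7] * WINDOW, maxlen=WINDOW)

# Team A beats Team B by 8 points.
before = relative_net_margin(8, average_net_margin(team_b))
print(before)  # 1.0

# Team B then loses to Team C by 100; the oldest game drops out of the window.
team_b.append(-100)
after = relative_net_margin(8, average_net_margin(team_b))
print(round(after, 2))  # -3.23
```

With this made-up history, Team A's 8-point win is worth 1 point at the time and slips to about -3.2 once Team B's 100-point loss enters the 22-game window. How far it moves in practice depends on which game drops out of the window.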

Like any method, it’s imperfect, and if you think about that system for adjusting for the opponent’s strength a bit you’ll discover some anomalies. I toyed with changing it for this year, but decided to stick with it for now.
I’d have to think a bit more about the problems that are plaguing you – I’d probably need to read over the methodology again before I have something sensible or helpful to say!

Troy Wheatley said...

Actually the best way to explain it is this:

In my system accounting for the strength of the opposition is not done by reference to the opposition's ranking. It's done with reference to the opposition's average net margin, which will be close to its ranking. It makes things easy to compute but raises the logical conundrum that an opposition's strength is assessed on a basis other than its ranking.

In Russ' system accounting for the strength of the opposition is based on the opposition's ranking, which is more logically consistent. The trade off is that trying to readjust past results is computationally harder, and figuring out how to do it is making my head hurt! But I'm sure there's some solution to it.