Sunday, September 19, 2010

Super Secret Formula

As I have previously discussed, I, like many people, play fantasy football. For the past few years, I have been in two leagues. As I think is common in fantasy leagues, both of my leagues release a weekly power ranking. Ever since I joined a 10-team league with some guys at work, the power rankings have been sent out each week as a large Excel spreadsheet. The spreadsheet tracks various pieces of data. First is the points scored by every team with the high and low scores from each week highlighted. Next, there is the number of other teams each team would have beaten for every week. Finally, there are the wins and losses for the season. The data in each category is cumulative as the season goes along. The ten teams are ranked in each of the three categories. The best team in each category gets 10 points and the worst gets 1 point. The teams are ranked based on the sum of the three rankings. Therefore, the best possible score would be 30 points and the worst would be three.

My other league has twelve teams. One of the other owners is Alan, a college buddy mine. His brother Matt is the league manager. The other owners are various friends of theirs. When the league first got started, Matt would post weekly power rankings based on his own personal opinions. Team StandingsWhile that was interesting and generated fun posts on the league's message board, I felt that I could come up with rankings that more accurately reflected the strength of the teams. I knew I would be using a spreadsheet, but I did not want to maintain data from previous weeks as is done on my other league. Therefore, I wanted to only use data available for the current week.

Both of my leagues are on ESPN.com. The standings page on ESPN.com includes two tables. The top table has wins (W), losses (L), ties (T), winning percentage (PCT), and games behind the first place team in the standings (GB). The second table has points scored for (PF), points scored against (PA), home record, away record, division record, and streak of either winning, losing or tying. (I have seen a tying streak of two). In my fantasy leagues, the home and away records are meaningless. My 12-team league only has one large division, so in that league, the division record is also meaningless.

I have been constantly evolving my ranking formula since I first began. I started by simply copying and pasting the standings data into a spreadsheet and then playing around with it. Normalized ValuesI eventually settled on using four of the available data categories. To the right is example standings from last season showing the four components. I think PCT, PF and streak are pretty obvious. Teams with the best PCT and PF have good arguments for being the best team. The streak component answers the "What have you done lately?" question. Regardless of the other data, would you rather play a team on a long winning streak or a long losing streak?

Probably the only questionable component is PA which I include in the formula as an adder instead of a negative. In fantasy football, the players on your team have no affect on your opponent's performance. Therefore, it is my opinion that PA represents the strength of a team's schedule. I think including PA as one of the components helps teams that have scored a good number of points, but do not have great winning percentages or long winning streaks because they have played teams that have also scored a lot of points.

If you thought things were geeky already, it is now going to get even geekier. StreakFor each component, I normalize the values. Normalization is a mathematical technique were a set of values are mapped between one and zero with the highest value being set to one and the lowest value being set to zero. All the other values are somewhere in between. To normalize a set of numbers, each number is subtracted by the minimum value in the group and the result is divided by maximum value minus the minimum value. Here are the example standings again with columns added for the normalized values of PCT, PF and PA. I have highlighted the highest and lowest values for each component.

Normalizing the streak value is a bit more complicated since there can be winning streaks, losing streaks, and even tying streaks. Before doing the normalization, the formula converts winning streaks to positive numbers, losing streaks to negative numbers, and tying streaks of any length to zero. Here are the streaks for the example teams with the plus-or-minus values and the normalized values. Unfortunately, my snapshot from last year does not include any teams with a streak of ties, but if there were teams on a tie streak those values would be set to zero which would fall somewhere near the middle of the range after being normalized.

Of course, the four components are not all equal. RankingsI think winning percentage and points scored are more important factors and should have more influence in the formula. Streak should have less influence, and PA should have the least. The weight assigned to each of the components can be easily changed in the spreadsheet. I have the weights listed as percentages. The weights for the four components should add up to 100%. My spreadsheet does not strictly enforce that particular rule, but does make it obvious when the weights do not sum up properly. I chose to give PCT a weight of 30%, PF a weight of 40%, PA a weight of 10%, and streak a weight of 20%.

The component values for each team are multiplied by their selected weights and are then added together to give a score that should fall into the range between zero and one if the weights were chosen properly. The teams are then sorted by their scores to give the weekly rankings. Here are the scores from the example sorted from highest to lowest.

No comments:

Post a Comment