The Effect of Distance on Michigan Football Players

Submitted by MGoCali on December 18th, 2018 at 1:42 AM

First diary, here goes. 

Given the surprising attrition we've faced in the past few days, I decided to look into how distance from home has affected the outcomes of our players over the years. Frozemangos made a board topic and looked into Harbaugh's recruits over the past couple of seasons. Further down in that thread, Seth suggested using his database, so I did.

I downloaded the data and cut out players who never made it here for grade reasons (think Demar Dorsey). Then I made some assumptions. Seth's database is coarse by location; i.e., it only gives the recruit's home state, and I was too lazy to look up specifically where within each player's home state he came from, so I assumed they came from the state capital. This was the easiest since I found a table of GPS coordinates for each state capital. 

Then I used a Python package called geopy to calculate the distance between any two (latitude, longitude) points on Earth. It's actually a step better than plain spherical trig: it accounts for the asphericity of the Earth, so it doesn't assume the two points sit on a perfect sphere. For those of you who aren't familiar with the language, a great circle is a circle on a sphere whose center is the center of the sphere, and the shortest path between two points on a sphere runs along one. This is why plane trajectories don't look straight when projected onto a flat map (and why you go over Greenland when flying to Europe).
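For anyone who wants to play along at home, here is a minimal sketch of the plain spherical version (the haversine formula), which is the simpler calculation that geopy's ellipsoidal distance improves on. It is not the code used for this diary: the function names, the mean Earth radius, and the rounded capital coordinates are all illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean radius; treats the Earth as a perfect sphere


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km; inputs are in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Approximate state-capital coordinates (illustrative, rounded values).
CAPITALS = {
    "MI": (42.7325, -84.5555),   # Lansing
    "CA": (38.5816, -121.4944),  # Sacramento
    "FL": (30.4383, -84.2807),   # Tallahassee
}


def distance_from_michigan_km(state):
    """Distance from Lansing to the given state's capital (hypothetical helper)."""
    return great_circle_km(*CAPITALS["MI"], *CAPITALS[state])


for state in ("CA", "FL"):
    print(state, round(distance_from_michigan_km(state)), "km")
```

The spherical answer typically differs from the ellipsoidal one by a fraction of a percent, which is far smaller than the error introduced by pinning every player to his state capital.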

With the distances in hand, I munged the data a bit. I assigned a "1" to each player who finished his career at Michigan and never played at another college: entered the draft early, played out his eligibility, got a firm handshake and retired, etc. I assigned a "2" to any player who left for disciplinary reasons, and a "3" to any player who transferred before his eligibility was up. I then looked at the distances the players have traveled:


The player who traveled the farthest is Julius Welschof, and he's still on the team, so I cut him out to allow for more efficient binning of the data. I worked with the following:


Using bins of 300 km mostly smooths out the assumption of assigning the state capital for each player since most players probably live within 300 km of their state's capital. (I realize that I used Lansing for Michigan's location, and I feel bad about it, but I didn't feel bad enough to fix it.)
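As a rough illustration of the binning step, here is a small sketch that drops distances into 300 km bins and counts players per bin. The function names and the sample distances are made up for the example; only the 300 km bin width comes from the analysis above.

```python
from collections import Counter

BIN_WIDTH_KM = 300  # bin width used in the diary


def bin_index(distance_km, width=BIN_WIDTH_KM):
    """Index of the bin [i * width, (i + 1) * width) containing the distance."""
    return int(distance_km // width)


def count_by_bin(distances_km, width=BIN_WIDTH_KM):
    """Map each occupied (low_edge, high_edge) bin to its player count."""
    counts = Counter(bin_index(d, width) for d in distances_km)
    return {(i * width, (i + 1) * width): n for i, n in sorted(counts.items())}


# Hypothetical distances in km, not real player data.
sample = [0, 50, 299.9, 300, 1367, 3125]
for (low, high), n in count_by_bin(sample).items():
    print(f"{low:>4}-{high:<4} km: {n}")
```

A player whose true hometown is within a couple hundred km of the capital will usually land in the same bin as the capital, or at worst the neighboring one, which is why the coarse location assumption mostly washes out.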

Within each of these bins, I computed the fraction of "1s", "2s", and "3s" in the data, and I also generated 1000 bootstrap samples (bootstrapping is a statistical technique for estimating uncertainty by resampling the data with replacement). You'll notice the uncertainty is larger where the bins have fewer players, as expected.
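For the curious, bootstrapping one bin can be sketched in a few lines. This is an assumed reconstruction rather than the actual script: the function names, the fixed seed, the toy outcome lists, and the choice of the standard deviation across resamples as the error bar are all illustrative. Only the 1/2/3 coding and the 1000 resamples come from the text.

```python
import random
from statistics import mean, stdev

OUTCOMES = (1, 2, 3)  # 1 = finished at Michigan, 2 = disciplinary exit, 3 = transfer


def outcome_fractions(codes):
    """Fraction of each outcome code in a non-empty list of codes."""
    n = len(codes)
    return {k: sum(c == k for c in codes) / n for k in OUTCOMES}


def bootstrap_fractions(codes, n_boot=1000, seed=0):
    """Resample with replacement n_boot times; return (mean, std) per outcome."""
    rng = random.Random(seed)
    draws = {k: [] for k in OUTCOMES}
    for _ in range(n_boot):
        resample = rng.choices(codes, k=len(codes))
        for k, frac in outcome_fractions(resample).items():
            draws[k].append(frac)
    return {k: (mean(v), stdev(v)) for k, v in draws.items()}


# Toy bins with the same 60/10/30 split but different sizes (not real data).
small_bin = [1] * 6 + [2] * 1 + [3] * 3
large_bin = small_bin * 10

for name, codes in (("small", small_bin), ("large", large_bin)):
    _, spread = bootstrap_fractions(codes)[3]
    print(f"{name} bin: transfer fraction spread ~ {spread:.3f}")
```

Both toy bins have the same transfer fraction, but the 10-player bin comes out with a much wider spread than the 100-player bin, which is the effect visible in the sparsely populated long-distance bins.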

First, here is the breakdown of players in each group as a function of distance.


And here are the fractions of each group as a function of distance.


I used blue to represent the players who finished their college career here; I made the transfers red, and the disciplined green (because obviously). The data do not show a strong trend of players from farther away transferring more. A more national dataset would be needed to make a more conclusive determination; there are too few players here, especially above 2000 km.

In conclusion, I am better at working when the objective has nothing to do with what I am paid to do. 

I, for one, am not that concerned about the transfers. I think everyone should calm down. These are young people, and a lot of money is involved if they can hack these four years together properly.

Thanks to Seth for compiling an amazing dataset to work with. This was fun. 

Go blue!



December 18th, 2018 at 8:42 AM ^

Sample size caveat applies (like you said), but this is a cool breakdown and well explained. I enjoyed the fact that opponents=red and discipline=green.


December 18th, 2018 at 8:47 AM ^

Acknowledging the small sample size, this data does show one thing we all think to be true:

Assuming that 3,000 miles is southern California, there is a dearth of talent in the Rocky Mountain region (2100-2800).

Wisconsin Wolverine

December 18th, 2018 at 11:30 AM ^

I love it!  I do these types of things for fun once in a while, it's a blast to learn new methods and explore neat questions.  I use R exclusively because I never learned any other language, but python doesn't seem so different at first glance?

Arb lover

December 18th, 2018 at 3:16 PM ^

This is really interesting, thanks. 

Any chance you'd be willing to look at change in climate, i.e. average fall temperature difference from home to Michigan? I see the drop in the finish fraction at averages of 750 miles and 1800 miles and wonder if those two cohorts are from Fla/Ga/LA and NM/AZ/Cal.


December 18th, 2018 at 4:44 PM ^

Great stuff! I have been having similar thoughts for a while given the recent recruiting news. 

Two thoughts:

1. The surprising non-impact of distance on transfers could well be due to the fact that only kids who feel pretty comfortable traveling long distances to school wind up coming to Michigan from far away in the first place. In other words, the people who make the leap are, by definition, those who were the least likely to be affected by distance in the first place.

2. Relatedly, the more important recruiting issue - that I don't think anyone doubts much - is that it's hard to convince many people to go to school far away from home. The fact that the majority of Michigan's players come from reasonably nearby (as for all schools) seems to cement the importance of distance.

I think a cool extension of your project would be to look at head to head recruiting battles and see how often Michigan wins/loses to other teams based on how much closer/further M is to the kid's school than the other schools. This would be cool data.

Keep up the great work!


December 18th, 2018 at 9:21 PM ^

Bo had to go to the airport to intercept Anthony Carter's exit from Michigan back to Florida.  It seems most teams lose a player to homesickness here and there but not enough to stop recruiting the best talent from any distance.

Steve in PA

December 19th, 2018 at 10:54 AM ^

Interesting.  I had done something similar many years ago when TomVH was still here in regards to recruiting.  I had built a model to predict signing with distance being one of the factors.

What I found was that generic 5* travel more than generic 4*.  I'm sure there are a host of factors in that, but I assumed it was because 5* are recruited nationally and 4* are more recruited regionally.

I don't know python yet but it is on my to-do list.  Maybe I will pick that project up again when I do learn it.

Sextus Empiricus

December 20th, 2018 at 2:29 PM ^

Once upon a time, we played Alabama.  I was excited because I remembered a certain previous Orange Bowl.

18 odd years back my daughter came into this world.  I was excited because I didn't know any better.

I diarized a similar analysis for that game...

I tried to get my daughter to apply to Mich.  (No force was applied... I am not that kind of Blue.)

Both times I was impressed by the power of proximity.  Both times I was given a rude awakening.

Diaries should be worth 1000 points.  I appreciate this and the time you took.  

Always ... Go Blue!