OT: (free) Sports APIs
Hey all,
I'm learning how to access APIs and I want to be able to download a season's worth of data (preferably for UM football or basketball) for some project ideas I'm thinking about, but I saw that ESPN doesn't have any public APIs. How do others access sports data?
I'm using R, if that affects anything.
Thanks!
March 21st, 2018 at 12:08 PM ^
So long as it's just for personal projects, web scraping could be a good option and is a fun skill to learn. sports-reference.com is probably a good source. I also found another link that could help:
https://opendata.stackexchange.com/questions/4662/is-there-an-api-or-gl…
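For anyone who hasn't scraped before, here is a rough sketch of the idea in Python (the same approach carries over to R). It only uses the standard library and parses a made-up HTML table rather than a real page, so the column names and rows are assumptions for illustration. A real stats page will have its own markup, and you should check the site's terms before fetching anything.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being built
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return one dict per data row, keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample markup, not real results.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Opponent</th><th>PF</th><th>PA</th></tr>
  <tr><td>2018-01-01</td><td>Team A</td><td>70</td><td>65</td></tr>
  <tr><td>2018-01-05</td><td>Team B</td><td>58</td><td>61</td></tr>
</table>
"""

games = parse_table(SAMPLE_HTML)
```

Each entry in `games` is a dict of strings, so you would still need to convert scores to numbers before modeling.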
March 21st, 2018 at 12:11 PM ^
Yeah, it's just for personal projects and practice, really. I want to get raw game-level data to try to make predictions with some forecast modeling for fun (I'm a biostats master's student, so that is fun for me ha)
I enjoy all the work and content by Seth and others and want to try it out myself.
March 21st, 2018 at 12:54 PM ^
Kaggle has a plethora of NCAA Basketball data on the March madness competition page. I believe they get the data from sports-reference.
I work in pharma and have always wondered what the difference between biostats and regular stats is. I'm looking for a deeper answer than the internet's "biostats is the application of statistics to biology."
I'm sure someone here can help me with this.
but we aren't going to tell you. We have a second board where we have thousands of threads explaining this difference-- and just as many threads mocking you for not having more than a child's understanding of the two fields.
There is now a third board formed because most of us are simply disgusted by the willful persistence of your ignorance and, quite frankly, you will recieve nothing but contempt from any one of us.
especially janitors, who don't know how to spell "receive".
Elementary school grammar/spelling lesson: "i" before "e" except after "c".
From my experience of it (since I don't have a pure Statistics master's), it really is that, but the coursework is more applied, so we don't do deep dives into statistical theory beyond what's needed to use the methods. For example, we've briefly looked over Bayesian analysis, but only far enough to understand how it creates its posterior probabilities and "updates" the current probability as data is entered. Does that make sense?
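To make the "updating" part concrete, here is a minimal sketch using a Beta prior on a win probability, which is about the simplest case there is. With a Beta(alpha, beta) prior and win/loss data, the posterior is Beta(alpha + wins, beta + losses). The batches of results below are made up, and the function names are just for illustration.

```python
def update_beta(alpha, beta, wins, losses):
    """Posterior Beta parameters after observing new wins and losses."""
    return alpha + wins, beta + losses


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Uniform prior: no opinion yet about the win probability.
alpha, beta = 1.0, 1.0

# Made-up batches of (wins, losses), arriving over time.
batches = [(3, 1), (2, 2), (4, 0)]

# Track how the estimated win probability moves as data comes in.
history = [beta_mean(alpha, beta)]
for wins, losses in batches:
    alpha, beta = update_beta(alpha, beta, wins, losses)
    history.append(beta_mean(alpha, beta))
```

Each pass through the loop uses the previous posterior as the new prior, which is the "update as data is entered" idea.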
We do our work and projects primarily in SAS or R but do occasionally dip into SPSS syntax, SQL, Tableau, etc. (which, if anyone is hiring an analyst, I know how to program in all of those I listed!)
It's just a sub-field of statistics, just like econometrics, geostatistics, etc. It also sounds cooler, which is extremely important. For a fun Russian nesting doll of statistical disciplines, I currently TA geostatistics in the biostatistics department.
I am a lawyer who has hired biostatisticians as consultants and expert witnesses. What I can tell you is that they are very, very smart. What do they do? A biostatistician once told me that they find the flaws in epidemiological studies. That sounds kind of flip, but it is what they have done for me.
March 21st, 2018 at 12:23 PM ^
Was going to say the same thing. I've scraped from there a bunch and it's pretty solid. You can do the same with ESPN game plays without too much trouble.
More generally, you can find a lot of useful APIs at https://www.programmableweb.com
They give rankings/details on several different offensive & defensive categories.
Link to national data:
http://stats.ncaa.org/rankings?academic_year=2018&division=11.0&sport_c…
Link to Michigan page - season or games stats:
Love that this thread exists here #MichiganDifference. I was scraping possession data for NCAA hoops games from espn.com a couple years ago but they made some changes to their website which broke my method and I haven't bothered fixing it.
I am very curious to learn if there is any free possession-level data available, since my understanding (as of the last time I looked, which was a couple years ago) was that you had to pay for it.
March 22nd, 2018 at 12:34 AM ^
this guy has pbp data for college bball - his scrapers are in the repo too
ESPN does have an undocumented API that you can access. I've created an npm package to access it (npm, github). It's JavaScript, but should be easy to reverse engineer for use in R. This is just for college football, but they do have endpoints for all other sports, including college basketball.
I've also built up a rather large relational database in PostgreSQL if you have any interest in that. There's quite a range and quantity of data included, so I won't list off everything but it does have game, drive, and play data going back to 2001. Check it out on github here. Personally, I've been using all this data to build up some neural networks and have had a reasonable level of success. I plan to build up a public API over the database at some point, just trying to find the time.
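For anyone wondering what a game/drive/play layout looks like in practice, here is a tiny hypothetical sketch. It is not the actual schema of the database described above: the table and column names are invented, the rows are made up, and it uses SQLite from Python's standard library instead of PostgreSQL so it runs without a server.

```python
import sqlite3

# Hypothetical three-level schema: a game has drives, a drive has plays.
SCHEMA = """
CREATE TABLE game (
    id        INTEGER PRIMARY KEY,
    season    INTEGER NOT NULL,
    home_team TEXT NOT NULL,
    away_team TEXT NOT NULL
);
CREATE TABLE drive (
    id      INTEGER PRIMARY KEY,
    game_id INTEGER NOT NULL REFERENCES game(id),
    offense TEXT NOT NULL
);
CREATE TABLE play (
    id           INTEGER PRIMARY KEY,
    drive_id     INTEGER NOT NULL REFERENCES drive(id),
    down         INTEGER,
    distance     INTEGER,
    yards_gained INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up rows, just enough to run a query.
conn.execute("INSERT INTO game VALUES (1, 2017, 'Home U', 'Away State')")
conn.execute("INSERT INTO drive VALUES (1, 1, 'Home U')")
conn.executemany(
    "INSERT INTO play VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 10, 4), (2, 1, 2, 6, 7), (3, 1, 1, 10, -2)],
)

# Roll plays up to season and offense by joining through drive and game.
rows = conn.execute(
    """
    SELECT g.season, d.offense, COUNT(*) AS plays, SUM(p.yards_gained) AS yards
    FROM play AS p
    JOIN drive AS d ON p.drive_id = d.id
    JOIN game AS g ON d.game_id = g.id
    GROUP BY g.season, d.offense
    """
).fetchall()
```

Keeping plays, drives, and games in separate tables means you can aggregate at whichever level a model needs without duplicating game info on every play.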
Lastly, I highly recommend r/CFBAnalysis if you are on Reddit. It's a great resource for sharing data and discussing ideas for analyzing statistics.
Thank you for sharing this! Take my +1
No prob, man. Since you mentioned looking for possession-level basketball data, here's an example of using ESPN's API to grab play by play for a basketball game. I haven't gone too deeply into the basketball stuff and don't know if that's what you'd be looking for. You can get the "event" parameter by going to a game's boxscore and grabbing the "gameId" from the URL.
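If you want to automate the last step, here is a small sketch of pulling the ID out of a boxscore URL in Python. It assumes the ID shows up as `gameId=<digits>` or `gameId/<digits>` somewhere in the URL. The example URL and ID below are made up, and the function name is just for illustration. It does not call ESPN at all; it only extracts the value you would pass as "event".

```python
import re

# Assumption: the ID appears as "gameId=<digits>" or "gameId/<digits>".
GAME_ID_PATTERN = re.compile(r"gameId[=/](\d+)")


def game_id_from_url(url):
    """Return the gameId found in a boxscore URL as a string."""
    match = GAME_ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"no gameId found in {url!r}")
    return match.group(1)


# Made-up URL and ID, for illustration only.
example_url = "https://www.espn.com/mens-college-basketball/boxscore?gameId=123456789"
event = game_id_from_url(example_url)
```

The ID is kept as a string since it is only ever used as an identifier, never as a number.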
here is an nfl scraper for R:
https://github.com/maksimhorowitz/nflscrapR
also if you want to store it in a relational database, this project is unmaintained but will give you a starting point:
https://github.com/BurntSushi/nfldb
March 22nd, 2018 at 12:06 AM ^
this is some good dimp. I have no idea what is happening. I think these people are Russian spies!
March 22nd, 2018 at 12:10 AM ^
Hey guys, what’s — whoops, wrong thread. I’ll just, uh, quietly leave.
Thanks for the explanations, guys. I'm looking for public sports APIs too, and I didn't know ESPN has undocumented ones. I have a sports-related project that is nearly finished. I am reading about load testing, its approaches, components, and strategies here so that the launch doesn't fail. I'll try to apply the information from this thread to improve my current and future projects. Thanks.