Data Cleaning and EDA with Your ESPN Fantasy Football League

Oct 1, 2019

I love/hate fantasy football.

I love watching Red Zone on Sunday and watching my points go up. I love all the smack talk with my friends. I love all the rules and debating every single setting in my league. And obviously, I love analyzing the data.

There are only 2 things I hate about fantasy football: the amount of time I need to devote to it and most importantly, how I never win. After having my best regular season and ultimately losing in the playoffs, I decided to take a year off in 2017. I’ve since come back to playing, but now I’ve taken a more backseat and data-driven approach to fantasy football.

In this article, I’ll go over the steps I took to import my ESPN fantasy football league into a Jupyter Notebook. While browsing Reddit, I came across a GitHub repository that wraps the very sparsely documented ESPN API for use with Python (huge props to cwendt94). Warning: I used my private league for this tutorial, along with my friends’ questionable team names. If your league is anything like mine, this should be expected. I just wanted to warn those who are easily offended.

Getting Started

The only thing you’ll need for this tutorial is an active ESPN fantasy football account. You can either pip install ff_espn_api or clone the repository from this link.

Once you have ff_espn_api, you’ll need to gather some info from the ESPN site. Taken from the repository:

For a private league you will need to get your swid and espn_s2. (Chrome browser) You can find these two values after logging into your ESPN fantasy football account on ESPN’s website. Then right-click anywhere on the page and click the Inspect option. From there, click Application on the top bar. On the left, under the Storage section, click Cookies, then http://fantasy.espn.com. From there you should be able to find your swid and espn_s2 variables and values!

Importing Your League

Your starting code should look something like this:

from ff_espn_api import League
league_id = 1234 #your league_id
year = 2018
swid = '{03JFJHW-FWFWF-044G}' #your swid
espn_s2 = 'ASCWDWheghjwwqfwjqhgjkjgegkje' #your espn_s2
league = League(league_id, year, espn_s2, swid)

Once you’ve imported your league, you can start playing around and extracting data from the league object. For example, we can return a list of the league standings by using league.standings(). We can return a list of every pick of the draft by using league.draft. But for our purposes, we’re going to be using league.teams.

league.teams contains all the info from every team in your league.

Our next step is converting your whole league into a dataframe. Each row in our dataframe will be a particular week for one person’s team. Before we can begin, we need to drop the 'roster' key and its values. If we don’t, each roster member would be lined up with a week, and that doesn’t make any sense. If you want to analyze rosters and draft picks for each team, you’ll have to make a new dataframe. We can do this by creating a list of column names from the team object’s attribute dictionary, and then removing 'roster' from the list.

# get a list of columns for our dataframe
df_columns = list(league.teams[0].__dict__.keys())
# remove roster from list of columns
df_columns.remove('roster')
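To see why roster has to go, here is a small sketch using a made-up stand-in for a team object (FakeTeam and its values are hypothetical, not the real ff_espn_api class). When the roster list happens to be the same length as the weekly lists, pandas lines players up with weeks row by row, which is meaningless:

```python
import pandas as pd

class FakeTeam:
    """Hypothetical stand-in for a team object (not the real ff_espn_api class)."""
    def __init__(self, team_name, scores, roster):
        self.team_name = team_name
        self.scores = scores   # one entry per week
        self.roster = roster   # one entry per player

team = FakeTeam("Example Team", [101.5, 88.2, 120.0], ["QB A", "RB B", "WR C"])

# with roster included, each player is paired with an unrelated week
with_roster = pd.DataFrame(team.__dict__)

# same idea as above: take the attribute names and drop 'roster'
columns = list(team.__dict__.keys())
columns.remove("roster")
without_roster = pd.DataFrame(team.__dict__, columns=columns)

print(with_roster)
print(without_roster)
```

In the second dataframe, each row is simply one week of the team’s season.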

Converting to a DataFrame

Now let’s put everything together in a dataframe. We’ll run a for loop that turns each team’s dictionary into a dataframe and stacks them all together. Since I’m using my 2018 league, I’ll call my dataframe league_2018.

import pandas as pd

# build one dataframe per team
team_dfs = []
for team in league.teams:
    team_dfs.append(pd.DataFrame(team.__dict__, columns=df_columns))
# stack the teams (DataFrame.append was removed in pandas 2.0)
league_2018 = pd.concat(team_dfs)

Your dataframe should look something like this.
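As a rough illustration of what happens under the hood, the sketch below uses two made-up dictionaries in place of real team objects (the names and numbers are invented). pandas repeats single values such as wins on every row, gives each list one row per entry, and keeps each team’s own 0-based index when the frames are stacked:

```python
import pandas as pd

# made-up stand-ins for two teams' attribute dictionaries (3 weeks each)
fake_teams = [
    {"team_name": "Team A", "wins": 2, "scores": [100.0, 90.5, 110.2]},
    {"team_name": "Team B", "wins": 1, "scores": [95.0, 120.1, 80.3]},
]

# one dataframe per team, then stack them
frames = [pd.DataFrame(t) for t in fake_teams]
combined = pd.concat(frames)

print(combined)
print(list(combined.index))
```

The index restarts at 0 for each team, so it doubles as a zero-based week number.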

Cleaning and Feature Engineering

Cool! Now we have a dataframe for your league. Let’s clean it up and do some feature engineering.

Check Your Team!

Before you start, play around with your dataframe! You can check your team by creating a mask/filter with your name.
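A minimal sketch of such a mask, using invented data (the owner names and scores below are made up; with your own league you would filter league_2018 on your own name):

```python
import pandas as pd

# made-up data standing in for the league dataframe
df = pd.DataFrame({
    "owner": ["Alice", "Alice", "Bob", "Bob"],
    "scores": [100.0, 90.5, 95.0, 120.1],
})

mask = df["owner"] == "Alice"   # True for the rows that belong to Alice
my_team = df[mask]              # keep only those rows

print(my_team)
```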

Week Column

So right now, our index goes from 0–15 and then back to 0–15 again, repeating until the end of the dataframe. This corresponds to the index of the week for each team. Let’s make that its own column and reset our index so that it goes from 0–191 (16 × 12 = 192 rows: the number of weeks times the number of teams). We’ll also add 1 to every week since indices start at 0 (or is it indexes? idk but that’s unimportant).

# change the index to 'week'
league_2018.index.names = ['week']
# add 'week' column
league_2018.reset_index(level='week', inplace=True)
# add +1 to every week
league_2018['week'] = league_2018['week'] + 1

Draft Order Column (Optional)

This next step is optional and requires a bit of work. If you would like to add a draft order column, there are two ways to go about it. The first way would be to get the draft order from your league object and then manually map each team_id to the correct draft order.

You would print a tuple with the correct pairing of the draft order and team name. Then you would create a new column by calling .map() on the team_id column with a dictionary of {team_id: draft order}.
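A sketch of that manual mapping on invented data (the team_id values and draft slots below are made up; you would fill in the dictionary by hand after looking at your own draft):

```python
import pandas as pd

# made-up data: three teams, two weeks each
df = pd.DataFrame({
    "team_id": [1, 1, 2, 2, 3, 3],
    "week": [1, 2, 1, 2, 1, 2],
})

# hypothetical pairing written out by hand: {team_id: draft slot}
draft_slots = {1: 3, 2: 1, 3: 2}

# look up each row's team_id in the dictionary
df["draft_order"] = df["team_id"].map(draft_slots)

print(df)
```

Any team_id missing from the dictionary would come back as NaN, which makes typos easy to spot.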

The second way is to use this function:

def draft_order(league_year, league):
    # build a dictionary of {team name: draft order number}
    league_dict = {}
    for draft_number in range(1, len(league.teams) + 1):
        # the first len(league.teams) picks are everyone's first pick
        team_name = str(league.draft[draft_number - 1].team)  # convert to string
        team_name = team_name[5:-1]  # strip the leading 'Team(' and trailing ')'
        league_dict[team_name] = draft_number
    league_year['draft_order'] = league_year['team_name'].map(league_dict)
    return league_year

This function takes in your league_2018 dataframe as well as your initial league object. Since league.teams[0] corresponds to team_id 1 and so on, we can use a for loop and assign each team name to the correct team. The function uses the league.draft attribute to get the entire draft, looks at the first 12 picks (everyone’s first pick in my 12-team league), assigns each team name a draft number, and then maps that dictionary onto league_year['team_name'].

Even though it requires more work, I would suggest using the first option. It’s more explicit, and it’s a bit easier to follow what’s going on.

Playoff Column

If you are looking at a past season, you can also flag whether a particular team made the playoffs or not. We’ll denote 1 as made the playoffs and 0 as “you’re bad.” Depending on how your league does playoffs, you’ll have to change up the code. For our intents and purposes, we’ll just say the top 6 overall made the playoffs.

import numpy as np
league_2018['playoffs'] = np.where(league_2018['final_standing'] <= 6, 1, 0)

Dropping Unnecessary Columns

Let’s also take a look at our columns with league_2018.columns.

Index(['week', 'team_id', 'team_abbrev', 'team_name', 'division_id', 'wins','losses', 'points_for', 'points_against', 'owner', 'streak_length','streak_type', 'standing', 'final_standing', 'logo_url', 'schedule','scores', 'outcomes', 'mov', 'year'],
      dtype='object')

We can go ahead and drop the columns that aren’t important to our data.

# drop unnecessary columns
league_2018 = league_2018.drop(['division_id', 'team_id', 'team_abbrev', 'team_name', 'streak_length', 'streak_type', 'logo_url', 'standing', 'outcomes', 'schedule'], axis=1)

We’re going to leave the owner column for now. You could keep team_id and map it to owner instead, but it’s easier, for now, to be able to do some EDA and see the team owner's name as an actual name instead of a number.

Exploratory Data Analysis

We can use matplotlib and seaborn to make a correlation matrix.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# numeric_only skips text columns such as owner (needed on pandas 2.0+)
corr = league_2018.corr(numeric_only=True)
# generate a mask for the upper triangle (np.bool was removed from NumPy)
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True
# set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))
# generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)
# draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5});

Darker colors mean a stronger correlation.

Let’s look at final_standing. The darker the blue, the stronger that feature’s correlation with a lower (that is, better) final standing. In this case, it’s wins and points_for, which makes a lot of sense. The more you score, the more you win, and the more likely you are to finish with a lower final standing.

We can even single out the final_standing column.

league_2018.corr(numeric_only=True)['final_standing'].sort_values()

While the strongly correlated features don’t surprise us (points_for, playoffs, wins and losses), we can look at the features that have little to no correlation with final placement. In my league, draft order had no correlation with final placement, which is pretty amazing when you think about it. For as much importance as people put on having a top draft pick, it seems it doesn’t matter where you end up picking. Now there’s a myriad of reasons why this could be true: injured players, bad early picks, poor team management, bad luck with wins. But it is interesting to see that for this league, in this season, it had no effect.

What’s Next?

Now that you’ve done a single season, you can use these same steps to import past seasons or even the current season. In future blog posts, I’ll go over adding new features and also converting your Jupyter notebook into a Python script.
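One way to reuse these steps across seasons is to wrap them in a small helper and stack the results. The sketch below uses made-up team dictionaries in place of real league objects; the helper name season_frame and the season column are placeholders for illustration, not part of ff_espn_api. With a real league, you would pass in each team’s attribute dictionary (minus roster) instead.

```python
import pandas as pd

def season_frame(team_dicts, year):
    """Stack per-team dicts into one dataframe with 'week' and 'season' columns."""
    season = pd.concat([pd.DataFrame(t) for t in team_dicts])
    season.index.names = ["week"]
    season = season.reset_index()
    season["week"] += 1          # weeks start at 1, not 0
    season["season"] = year      # hypothetical column to tell seasons apart
    return season

# made-up stand-ins for two seasons of a two-team, two-week league
fake_seasons = {
    2018: [{"owner": "Alice", "scores": [100.0, 90.5]},
           {"owner": "Bob", "scores": [95.0, 120.1]}],
    2019: [{"owner": "Alice", "scores": [88.0, 101.3]},
           {"owner": "Bob", "scores": [110.4, 79.9]}],
}

all_seasons = pd.concat(
    [season_frame(teams, year) for year, teams in fake_seasons.items()],
    ignore_index=True,
)
print(all_seasons)
```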