Useful Pandas Snippets


Imputing missing values

From the mean of a feature

Country Name	Year	GDP
Aruba	1965	null
Aruba	1966	5.872478e+08

Say you have a dataframe of GDP by Country Name and Year, but some years have missing values. One way to deal with them is to fill each gap with the mean GDP of that country, computed from the years that do have a value:

df['GDP_filled'] = df.groupby('Country Name')['GDP'].transform(lambda x: x.fillna(x.mean()))
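To see the per-country mean fill in action, here is a minimal sketch on a toy dataframe. The rows and GDP values are made up for illustration and are not real figures; only the column names follow the table above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the GDP values are invented for illustration.
df = pd.DataFrame({
    "Country Name": ["Aruba", "Aruba", "Aruba", "Belize", "Belize"],
    "Year": [1965, 1966, 1967, 1965, 1966],
    "GDP": [np.nan, 2.0, 4.0, 10.0, np.nan],
})

# transform returns a Series with the same index as df,
# so it can be assigned straight back as a new column.
df["GDP_filled"] = df.groupby("Country Name")["GDP"].transform(
    lambda x: x.fillna(x.mean())
)

print(df)
```

Each missing value is replaced by the mean of its own country only: Aruba's gap gets 3.0 (the mean of 2.0 and 4.0) and Belize's gap gets 10.0.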

With forward fill

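We can also use forward fill (ffill) from Pandas, which copies the last known value into the missing rows that follow it.

First we need to take care to sort the data by year, then we group by the Country Name so that the forward fill stays within each country and never carries a value over from one country to the next.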

df.sort_values('Year').groupby('Country Name')['GDP'].ffill()
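A small sketch of the grouped forward fill on deliberately unsorted toy data. The values and the GDP_ffill column name are assumptions for illustration. Note that a missing value in a country's first year has nothing before it to copy, so it stays missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, rows deliberately out of year order.
df = pd.DataFrame({
    "Country Name": ["Aruba", "Belize", "Aruba", "Aruba", "Belize"],
    "Year": [1967, 1966, 1965, 1966, 1965],
    "GDP": [np.nan, 5.0, 1.0, np.nan, np.nan],
})

# Sort by year, then forward fill within each country.
filled = df.sort_values("Year").groupby("Country Name")["GDP"].ffill()

# The result keeps the original index labels, so the assignment
# lines up with df even though df itself is still unsorted.
df["GDP_ffill"] = filled

print(df.sort_values(["Country Name", "Year"]))
```

Aruba's 1966 and 1967 rows pick up the 1965 value, while Belize's 1965 row remains NaN because it is the first year in its group.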

With backward fill

Of course there is backward fill too, which copies the next known value into the missing rows before it:

df.sort_values('Year').groupby('Country Name')['GDP'].bfill()
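The sketch below shows backward fill on toy data shaped like the table at the top, where a country's first year is missing. The values and the new column names are assumptions for illustration. The last two statements show one possible way to cover gaps at both ends of a series by chaining the two fills; that is an optional extra, not something the snippets above require.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the GDP values are invented for illustration.
df = pd.DataFrame({
    "Country Name": ["Aruba", "Aruba", "Belize", "Belize"],
    "Year": [1966, 1965, 1965, 1966],
    "GDP": [2.0, np.nan, 7.0, np.nan],
})

by_country = df.sort_values("Year").groupby("Country Name")["GDP"]

# Backward fill: a gap takes the next known value in the same country.
df["GDP_bfill"] = by_country.bfill()

# Optional: forward fill first, then backward fill what is left,
# so leading and trailing gaps are both covered.
df["GDP_both"] = by_country.transform(lambda s: s.ffill().bfill())

print(df.sort_values(["Country Name", "Year"]))
```

Backward fill alone fixes Aruba's missing 1965 value but leaves Belize's 1966 value missing, since no later year exists to copy from. The chained version fills both.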


Lee H
