Useful Pandas Snippets
Imputing missing values
From the mean of a feature
| Country Name | Year | GDP |
|---|---|---|
| Aruba | 1965 | null |
| Aruba | 1966 | 5.872478e+08 |
Say you have a DataFrame of GDP by Country Name and Year, but some years have missing values. One way to handle them is to fill each gap with the mean GDP of that country:
df['GDP_filled'] = df.groupby('Country Name')['GDP'].transform(lambda x: x.fillna(x.mean()))
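To see what the line above does, here is a small self-contained sketch. The countries and GDP figures are made-up toy values, not real data; only the column names follow the table above.

```python
import numpy as np
import pandas as pd

# Toy data (invented values) with one gap per country.
df = pd.DataFrame({
    "Country Name": ["Aruba", "Aruba", "Aruba", "Belize", "Belize"],
    "Year": [1965, 1966, 1967, 1965, 1966],
    "GDP": [np.nan, 100.0, 300.0, 50.0, np.nan],
})

# transform() returns a Series aligned with the original index,
# so it can be assigned straight back as a new column.
# x.mean() skips NaN, so each gap gets the mean of that country's known values.
df["GDP_filled"] = df.groupby("Country Name")["GDP"].transform(
    lambda x: x.fillna(x.mean())
)

print(df)
```

The Aruba gap is filled with 200.0 (the mean of 100 and 300) and the Belize gap with 50.0. Note that a country with no GDP values at all would stay NaN, since its mean is itself NaN.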
With forward fill
We can also use forward fill from pandas, which carries the last known value forward into the gaps that follow it.
First we need to sort the data by Year, then group by Country Name so that the forward fill stays within each country:
df['GDP_ffill'] = df.sort_values('Year').groupby('Country Name')['GDP'].ffill()
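A minimal sketch of grouped forward fill on toy data (invented values; the `GDP_ffill` column name is just an illustrative choice). It uses the groupby `ffill()` method, and the rows are deliberately out of order to show why sorting matters.

```python
import numpy as np
import pandas as pd

# Toy data (invented values), deliberately unsorted.
df = pd.DataFrame({
    "Country Name": ["Aruba", "Belize", "Aruba", "Belize", "Aruba"],
    "Year": [1967, 1966, 1965, 1965, 1966],
    "GDP": [np.nan, 50.0, 100.0, np.nan, np.nan],
})

# Sort so that "forward" means "later in time" within each country.
df = df.sort_values(["Country Name", "Year"]).reset_index(drop=True)

# Grouping keeps Aruba's last value from leaking into Belize's first row.
df["GDP_ffill"] = df.groupby("Country Name")["GDP"].ffill()

print(df)
```

Aruba's 1966 and 1967 rows take the 1965 value. Belize's 1965 row stays NaN because there is no earlier value in that group to carry forward.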
With backward fill
Of course there is backward fill too, which fills each gap with the next known value:
df['GDP_bfill'] = df.sort_values('Year').groupby('Country Name')['GDP'].bfill()
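The same idea can be sketched for backward fill, again on invented toy values and with an illustrative `GDP_bfill` column name.

```python
import numpy as np
import pandas as pd

# Toy data (invented values).
df = pd.DataFrame({
    "Country Name": ["Aruba", "Aruba", "Aruba", "Belize", "Belize"],
    "Year": [1965, 1966, 1967, 1965, 1966],
    "GDP": [np.nan, np.nan, 300.0, 50.0, np.nan],
})

df = df.sort_values(["Country Name", "Year"]).reset_index(drop=True)

# Each gap takes the next known value within the same country.
df["GDP_bfill"] = df.groupby("Country Name")["GDP"].bfill()

print(df)
```

Here Aruba's 1965 and 1966 rows take the 1967 value, while Belize's trailing 1966 gap stays NaN because nothing later exists in that group to pull back.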