I'm trying to replace values in my data frame which are listed as 'nan' (note, not 'NaN').
I've read in an Excel file, then tried to replace the nan values like this:
All_items_df = ALL_df[df_items].fillna(' ')
But I still get an output that contains 'nan':
All_items_df['Colour'].head(10)
Out[]:
7 nan
8 nan
9 nan
10 nan
13 nan
14 nan
15 nan
16 nan
18 nan
19 nan
Name: Colour, dtype: object
Checking the nan values using isna() or isnull() gives me False for the above values. Why are they not being recognised as nan/NA values?
All_items_df['Colour'].isnull().head(10)
Out[123]:
7 False
8 False
9 False
10 False
13 False
14 False
15 False
16 False
18 False
19 False
Name: Colour, dtype: bool
I'm then writing to a CSV file and the 'nan' values still end up in the file, even though I specify that NaN should be written out as an empty string:
All_items_df.to_csv(folderpath + "All_items.csv",encoding="UTF-8", index=False, na_rep='')
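Here's a minimal sketch with made-up data that reproduces what I'm seeing; the 'nan' entries behave as ordinary strings, so fillna() and isnull() ignore them:

import pandas as pd
import numpy as np

# Made-up example: these 'nan' entries are literal strings, not np.nan
df = pd.DataFrame({'Colour': ['nan', 'nan', 'Blue', 'nan']})

print(df['Colour'].isnull())   # all False - the strings are not missing values
print(df.fillna(' '))          # unchanged - there is nothing to fill
df.to_csv("All_items.csv", index=False, na_rep='')  # the literal 'nan' still gets written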
Answer 0 (score: 1)
Your nan values appear to be strings, and not actually null values. You can use this code to replace 'nan' with actual null values before proceeding with whatever calculations you are planning on doing:
import numpy as np
df.Colour.replace('nan', np.nan, inplace=True)
Example:
>>> df
Colour
0 nan
1 nan
2 nan
3 Blue
4 nan
df.Colour.replace('nan', np.nan, inplace=True)
df.fillna('', inplace=True)
>>> df
Colour
0
1
2
3 Blue
4
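If several columns are affected, the same idea can be applied to the whole DataFrame at once (a sketch, assuming every literal 'nan' string should become a real missing value):

import numpy as np

# Turn every literal 'nan' string into a real NaN, then fill as before
df = df.replace('nan', np.nan)
df = df.fillna('')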
Answer 1 (score: 1)
Make sure you read your nan values as NaN. You can do this via a parameter in pd.read_excel:
df = pd.read_excel('file.xlsx', na_values=['nan'])
Strangely, by default nan is not considered a NaN value in pd.read_excel:
na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, …
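A quick check that the parameter took effect (a sketch, reusing the file name from above and the column name from the question):

import pandas as pd

df = pd.read_excel('file.xlsx', na_values=['nan'])
print(df['Colour'].isnull().sum())   # should now count the cells that held 'nan'
df.to_csv("All_items.csv", index=False, na_rep='')  # na_rep now applies to those cells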