I'm trying to replace values in my data frame which are listed as 'nan' (note, not 'NaN').
I've read in an Excel file, then tried to replace the nan values like this:
All_items_df = ALL_df[df_items].fillna(' ')
But I still get an output that contains 'nan':
All_items_df['Colour'].head(10)
Out[]:
7 nan
8 nan
9 nan
10 nan
13 nan
14 nan
15 nan
16 nan
18 nan
19 nan
Name: Colour, dtype: object
Checking the nan values using isna() or isnull() gives me False for the above values. Why are they not being recognised as nan/NA values?
All_items_df['Colour'].isnull().head(10)
Out[123]:
7 False
8 False
9 False
10 False
13 False
14 False
15 False
16 False
18 False
19 False
Name: Colour, dtype: bool
I'm then writing to a CSV file and the 'nan' values still end up in the file, even though I specify that NaN should be written out as an empty string:
All_items_df.to_csv(folderpath + "All_items.csv",encoding="UTF-8", index=False, na_rep='')
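Here's a minimal sketch with made-up data that reproduces what I'm seeing; the 'nan' entries behave as ordinary strings, so fillna() and isnull() ignore them:

import pandas as pd
import numpy as np

# Made-up example: these 'nan' entries are literal strings, not np.nan
df = pd.DataFrame({'Colour': ['nan', 'nan', 'Blue', 'nan']})

print(df['Colour'].isnull())   # all False - the strings are not missing values
print(df.fillna(' '))          # unchanged - there is nothing to fill
df.to_csv("All_items.csv", index=False, na_rep='')  # the literal 'nan' still gets written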
Answer 0 (score: 1)
Your nan values appear to be strings, and not actually null values. You can use this code to replace 'nan' with actual null values before proceeding with whatever calculations you are planning on doing:
import numpy as np
df.Colour.replace('nan', np.nan, inplace=True)
Example:
>>> df
Colour
0 nan
1 nan
2 nan
3 Blue
4 nan
df.Colour.replace('nan', np.nan, inplace=True)
df.fillna('', inplace=True)
>>> df
Colour
0
1
2
3 Blue
4
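If several columns are affected, the same idea can be applied to the whole DataFrame at once (a sketch, assuming every literal 'nan' string should become a real missing value):

import numpy as np

# Turn every literal 'nan' string into a real NaN, then fill as before
df = df.replace('nan', np.nan)
df = df.fillna('')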
Answer 1 (score: 1)
Make sure you read your nan values as NaN. You can do this via a parameter in pd.read_excel:
df = pd.read_excel('file.xlsx', na_values=['nan'])
Strangely, by default nan is not considered a NaN value in pd.read_excel:
na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, …
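A quick check that the parameter took effect (a sketch, reusing the file name from above and the column name from the question):

import pandas as pd

df = pd.read_excel('file.xlsx', na_values=['nan'])
print(df['Colour'].isnull().sum())   # should now count the cells that held 'nan'
df.to_csv("All_items.csv", index=False, na_rep='')  # na_rep now applies to those cells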