我在Pandas github repo中评论了一个已解决的问题:
在Excel中以nan
处理空值还有另一个副作用:整数将转换为float。随后对该列进行的操作将再次产生其他效果。
read_excel()
也不接受转换器中函数提供的空值处理:
我有一个包含以下数据的Excel文件temp.xlsx
:
Key3
列中的值周围有空格。
Key1,Key2,Key3,Key4
0,11, Apple ,1.12
1,12,,1.02
2,13, Orange,
3, ,Banana ,0.01
这是代码:
import numpy as np
import pandas as pd
def handle_string(value):
return value.replace(' ', '')
def handle_integer(value):
if value == '':
return 0
else:
int(value)
def handle_float(value):
if value == '':
return 0.0
else:
float(value)
df = pd.read_excel(
'temp.xlsx',
)
print(df)
print(f"type(df.loc[3,'Key2']) = {type(df.loc[3,'Key2'])}")
print(f"type(df.loc[1,'Key3']) = {type(df.loc[1,'Key3'])}")
print(f"type(df.loc[2,'Key4']) = {type(df.loc[2,'Key4'])}")
print('')
df = pd.read_excel(
'temp.xlsx',
converters={\
'Key1' : handle_integer,
'Key2' : handle_integer,
'Key3' : handle_string,
'Key4' : handle_float,
}
)
print(df)
print(f"type(df.loc[3,'Key2']) = {type(df.loc[3,'Key2'])}")
print(f"type(df.loc[1,'Key3']) = {type(df.loc[1,'Key3'])}")
print(f"type(df.loc[2,'Key4']) = {type(df.loc[2,'Key4'])}")
输出:
Key1 Key2 Key3 Key4
0 0 11.0 Apple 1.12
1 1 12.0 NaN 1.02
2 2 13.0 Orange NaN
3 3 NaN Banana 0.01
type(df.loc[3,'Key2']) = <class 'numpy.float64'>
type(df.loc[1,'Key3']) = <class 'float'>
type(df.loc[2,'Key4']) = <class 'numpy.float64'>
Key1 Key2 Key3 Key4
0 None NaN Apple NaN
1 None NaN NaN NaN
2 None NaN Orange 0.0
3 None 0.0 Banana NaN
type(df.loc[3,'Key2']) = <class 'numpy.float64'>
type(df.loc[1,'Key3']) = <class 'float'>
type(df.loc[2,'Key4']) = <class 'numpy.float64'>
dtype
参数的优先级低于converters
。
答案 0 :(得分:1)
我可能是错的,但是在我看来问题似乎与这些函数的返回值有关。在两个地方,您显然没有打算返回None
。见下文:
def handle_string(value):
return value.replace(' ', '')
def handle_integer(value):
if value == '':
return 0
else:
int(value) # Returns none!!!
def handle_float(value):
if value == '':
return 0.0
else:
float(value) # Returns none!!!