How to replace NaN and NaT with None - pandas 0.24.1

Time: 2019-07-09 11:22:49

Tags: python pandas

I need to replace all NaN and NaT values in a pandas.Series with None.

I tried:

def replaceMissing(ser):
    return ser.where(pd.notna(ser), None)

But it does not work:

import pandas as pd

NaN = float('nan')
NaT = pd.NaT

floats1 = pd.Series((NaN, NaN, 2.71828, -2.71828))
floats2 = pd.Series((2.71828, -2.71828, 2.71828, -2.71828))
dates = pd.Series((NaT, NaT, pd.Timestamp("2019-07-09"), pd.Timestamp("2020-07-09")))


def replaceMissing(ser):
    return ser.where(pd.notna(ser), None)


print(pd.__version__)
print(80*"-")
print(replaceMissing(dates))
print(80*"-")
print(replaceMissing(floats1))
print(80*"-")
print(replaceMissing(floats2))

As you can see, NaT is not replaced:

0.24.1
--------------------------------------------------------------------------------
0          NaT
1          NaT
2   2019-07-09
3   2020-07-09
dtype: datetime64[ns]
--------------------------------------------------------------------------------
0       None
1       None
2    2.71828
3   -2.71828
dtype: object
--------------------------------------------------------------------------------
0    2.71828
1   -2.71828
2    2.71828
3   -2.71828
dtype: float64

Then I tried this extra step:

def replaceMissing(ser):
    ser = ser.where(pd.notna(ser), None)
    return ser.replace({pd.NaT: None})

But it still does not work. For some reason, it brings back NaN:

0.24.1
--------------------------------------------------------------------------------
0                   None
1                   None
2    2019-07-09 00:00:00
3    2020-07-09 00:00:00
dtype: object
--------------------------------------------------------------------------------
0        NaN
1        NaN
2    2.71828
3   -2.71828
dtype: float64
--------------------------------------------------------------------------------
0    2.71828
1   -2.71828
2    2.71828
3   -2.71828
dtype: float64

I also tried converting the series to object:

def replaceMissing(ser):
    return ser.astype("object").where(pd.notna(ser), None)

But now the last series is object as well, even though it has no missing values:

0.24.1
--------------------------------------------------------------------------------
0                   None
1                   None
2    2019-07-09 00:00:00
3    2020-07-09 00:00:00
dtype: object
--------------------------------------------------------------------------------
0       None
1       None
2    2.71828
3   -2.71828
dtype: object
--------------------------------------------------------------------------------
0    2.71828
1   -2.71828
2    2.71828
3   -2.71828
dtype: object

I want to keep float64 in that case, so I added infer_objects:

def replaceMissing(ser):
    return ser.astype("object").where(pd.notna(ser), None).infer_objects()

But that brings back NaN again:

0.24.1
--------------------------------------------------------------------------------
0                   None
1                   None
2    2019-07-09 00:00:00
3    2020-07-09 00:00:00
dtype: object
--------------------------------------------------------------------------------
0        NaN
1        NaN
2    2.71828
3   -2.71828
dtype: float64
--------------------------------------------------------------------------------
0    2.71828
1   -2.71828
2    2.71828
3   -2.71828
dtype: float64
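
The return to NaN can be reproduced in isolation. The following is a minimal sketch, assuming the cause is that a float64 Series cannot store None, so inferring a float dtype from an object Series turns None back into NaN. The variable names are illustrative only.

```python
import pandas as pd

# An object Series can hold None next to floats.
as_object = pd.Series([None, 2.71828], dtype=object)

# infer_objects() picks float64 for this data, and float64 has no way to
# store None, so the missing entry is represented as NaN again.
inferred = as_object.infer_objects()

print(as_object.dtype, as_object.tolist())
print(inferred.dtype, inferred.isna().tolist())
```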

I feel there must be a simple way to do this. Does anyone know?

1 answer:

Answer 0: (score: 1)

For me, your second solution works if the order of the two steps is changed (tested in pandas 0.24.2), but the dtype becomes object because of the mixed types: None together with floats or timestamps:

def replaceMissing(ser):
    return ser.replace({pd.NaT: None}).where(pd.notna(ser), None)

print(pd.__version__)
print(80*"-")
print(replaceMissing(dates))
print(80*"-")
print(replaceMissing(dates).apply(type))
print(80*"-")
print(replaceMissing(floats1))
print(80*"-")
print(replaceMissing(floats1).apply(type))
print(80*"-")
print(replaceMissing(floats2))
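
If a series without missing values should keep its original dtype, one option is to convert only when something is actually missing. The following is a rough sketch, not the method used above: it swaps `where`/`replace` for a plain list comprehension and an explicit object Series, and the name `replace_missing` is hypothetical.

```python
import pandas as pd


def replace_missing(ser):
    """Return ser with NaN/NaT swapped for None; unchanged if nothing is missing."""
    if not ser.isna().any():
        # Nothing to replace, so keep the original dtype (e.g. float64).
        return ser
    # Build an object Series explicitly so that None is stored as-is
    # instead of being coerced back to NaN or NaT.
    values = [None if pd.isna(v) else v for v in ser]
    return pd.Series(values, index=ser.index, dtype=object, name=ser.name)


floats1 = pd.Series((float("nan"), float("nan"), 2.71828, -2.71828))
floats2 = pd.Series((2.71828, -2.71828, 2.71828, -2.71828))
dates = pd.Series((pd.NaT, pd.NaT, pd.Timestamp("2019-07-09"), pd.Timestamp("2020-07-09")))

for ser in (dates, floats1, floats2):
    out = replace_missing(ser)
    print(out.dtype, out.tolist())
```

Because a series without missing values is returned as-is, floats2 should keep float64, while any series that really contains None necessarily ends up as object.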