How to assign a value to missing values

Time: 2019-06-26 10:07:05

Tags: python-3.x pandas numpy

I have a dataframe:

    Software Product    Case Number    Num of days
    MDM9607.LE.1.0      2774904        -19.13888889
    MDM9607.LE.1.0      2774203        -19.60069444
    MDM9607.LE.1.0      2768088        -24.81597222
    MDM9607.LE.1.0      2767500        -25.0125
    MDM9607.LE.1.0      2764617        -26.67916667
    MDM9607.LE.1.0      2766991        -25.17430556
    MDM9607.LE.1.0      2765696
    MDM9607.LE.1.0      2764204
    MDM9607.LE.1.0      2764199
    MDM9607.LE.1.0      2774434        365
    MDM9607.LE.1.0      2769029        377
    MDM9607.LE.1.0      2764195        380
    MDM9607.LE.1.0      2763721        25
    MDM9607.LE.1.0      2770456        380
    MDM9607.LE.1.0      2768423


Required output conditions:

    If:
        f9['Num of days'] > 365 then print L
        f9['Num of days'] < 365 then print N
        f9['Num of days'] == NaN then print U

Code:

    import pandas as pd
    import numpy as np

    df1 = pd.read_excel(r"Rawreport_2017.xlsx")
    df2 = pd.read_excel(r"Sampleswpl.xlsx")
    f9 = pd.merge(df1, df2, on=['Software Product'], how='outer')
    f9.to_excel(r"merge_new_1.xlsx")
    f9['Num of days'] = f9['Date/Time Opened'] - f9['CSDate']
    f9['Num of days_u']=f9['Num of days'].fillna('u')
    f9['status'] = np.where(f9['Num of days'] > 365, 'L', 'NL','u')
    f9.to_excel(r"merge_status_5.xlsx")

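I am working with a dataframe that contains some missing values. A missing value should be printed as "U" (unknown), a value greater than 365 should be printed as "L", and a value less than 365 should be printed as "N". That is my logic, but the missing values are also being treated as 0 (zero) and printed as "N".

The expected output should be: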

    Software Product    Case Number    Num of days     Status
    MDM9607.LE.1.0      2774904        -19.13888889    N
    MDM9607.LE.1.0      2774203        -19.60069444    N
    MDM9607.LE.1.0      2768088        -24.81597222    N
    MDM9607.LE.1.0      2767500        -25.0125        N
    MDM9607.LE.1.0      2764617        -26.67916667    N
    MDM9607.LE.1.0      2766991        -25.17430556    N
    MDM9607.LE.1.0      2765696                        U
    MDM9607.LE.1.0      2764204                        U
    MDM9607.LE.1.0      2764199                        U
    MDM9607.LE.1.0      2774434        365             L
    MDM9607.LE.1.0      2769029        377             L
    MDM9607.LE.1.0      2764195        380             L
    MDM9607.LE.1.0      2763721        25              N
    MDM9607.LE.1.0      2770456        380             L

I used the code above, but got:

TypeError: where() takes at most 3 arguments (4 given)

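1 answer:

Answer 0: (score: 0)

Use numpy.select: test for missing values first with Series.isna, then test the > 365 condition, and let the default parameter cover every remaining row.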

    # Boolean masks: missing values first, then values above 365
    m1 = f9['Num of days'].isna()
    m2 = f9['Num of days'] > 365

    # The first matching condition wins; rows matching neither get the default
    f9['Status'] = np.select([m1, m2], ['U', 'L'], default='N')

    print(f9)
       Software Product  Case Number  Num of days Status
    0    MDM9607.LE.1.0      2774904   -19.138889      N
    1    MDM9607.LE.1.0      2774203   -19.600694      N
    2    MDM9607.LE.1.0      2768088   -24.815972      N
    3    MDM9607.LE.1.0      2767500   -25.012500      N
    4    MDM9607.LE.1.0      2764617   -26.679167      N
    5    MDM9607.LE.1.0      2766991   -25.174306      N
    6    MDM9607.LE.1.0      2765696          NaN      U
    7    MDM9607.LE.1.0      2764204          NaN      U
    8    MDM9607.LE.1.0      2764199          NaN      U
    9    MDM9607.LE.1.0      2774434   365.000000      N
    10   MDM9607.LE.1.0      2769029   377.000000      L
    11   MDM9607.LE.1.0      2764195   380.000000      L
    12   MDM9607.LE.1.0      2763721    25.000000      N
    13   MDM9607.LE.1.0      2770456   380.000000      L
    14   MDM9607.LE.1.0      2768423          NaN      U
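
For anyone who wants to try this without the original Excel files, here is a minimal self-contained sketch. The small frame is made-up sample data standing in for the merged f9, and the column names Status_where and Status_ge are only illustrative. Two things it tries to show: np.where accepts exactly (condition, x, y), so a three-way split needs nested calls rather than a fourth argument, and a comparison such as NaN > 365 evaluates to False, which is probably why the missing rows ended up as "N" rather than being treated as zero. The question's expected output marks exactly 365 as "L" while the stated condition is > 365, so a >= variant is included in case that is what is wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the merged frame f9.
f9 = pd.DataFrame({
    "Case Number": [2774904, 2765696, 2774434, 2769029, 2763721],
    "Num of days": [-19.13888889, np.nan, 365, 377, 25],
})

days = f9["Num of days"]

# np.select checks the conditions in order; the first match wins,
# and rows matching no condition receive the default.
f9["Status"] = np.select([days.isna(), days > 365], ["U", "L"], default="N")

# Equivalent result with nested np.where calls, each taking (condition, x, y).
f9["Status_where"] = np.where(days.isna(), "U", np.where(days > 365, "L", "N"))

# Variant that labels exactly 365 as "L", matching the question's expected table.
f9["Status_ge"] = np.select([days.isna(), days >= 365], ["U", "L"], default="N")

print(f9)
```

The missing-value mask has to come first in the condition list (or outermost in the nested np.where); otherwise those rows fall through to the default just like any value that fails the > 365 test.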