Question

I have a dataset that looks like this:

index  Ind.  Code Code_2
    1     1   NaN      x
    2     0     7    NaN
    3     1     9      z
    4     1   NaN      a
    5     0    11    NaN
    6     1     4    NaN

I have also created a list of the values of interest in the "Code" column, as follows:

Code_List=['7', '9', '11']

I want to create a new indicator column that is 1 whenever Ind. = 1, Code is in the list above, and Code_2 is not null.

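I would like to write a function containing an if statement. I tried the following, and I'm not sure whether it's a syntax problem, but I keep getting attribute errors (shown after the code):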

def New_Indicator(x):
    if x['Ind.'] == 1 and (x['Code'].isin[Code_List]) or (x['Code_2'].notnull()):
        return 1
    else: 
        return 0

df['NewIndColumn'] = df.apply(lambda x: New_Indicator(x), axis=1)

("'str' object has no attribute 'isin'", 'occurred at index 259')
("'float' object has no attribute 'notnull'", 'occurred at index 259')

Answer 1

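The problem is that apply with axis=1 passes one row at a time, so inside your function x['Code'] is a single value (a string here), not a Series, and scalar values have no isin or notnull methods. I suggest using numpy.where instead: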

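import numpy as np

# Code is numeric in this sample (it prints as 7.0, 9.0, ...),
# so the list holds numbers rather than strings.
code_list = [7, 9, 11]

# Build one boolean Series per condition, then combine them element-wise.
ind1 = df['Ind.'].eq(1)
codes = df.Code.isin(code_list)
code2NotNull = df.Code_2.notnull()

mask = ind1 & codes & code2NotNull

df['indicator'] = np.where(mask, 1, 0)

print(df)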

Output

   index  Ind.  Code Code_2  indicator
0      1     1   NaN      x          0
1      2     0   7.0    NaN          0
2      3     1   9.0      z          1
3      4     1   NaN      a          0
4      5     0  11.0    NaN          0
5      6     1   4.0    NaN          0
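
To try this end to end, here is a minimal self-contained sketch that rebuilds the sample frame by hand. The DataFrame construction and the numeric `code_list` are assumptions made for illustration: the printed output shows Code as floats, so a list of strings such as `['7', '9', '11']` would not match those values.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed dtypes: numeric Code, string Code_2).
df = pd.DataFrame({
    "index": [1, 2, 3, 4, 5, 6],
    "Ind.": [1, 0, 1, 1, 0, 1],
    "Code": [np.nan, 7, 9, np.nan, 11, 4],
    "Code_2": ["x", np.nan, "z", "a", np.nan, np.nan],
})

# Assumed numeric list so that it matches the numeric Code column.
code_list = [7, 9, 11]

# All three conditions must hold for a row to be flagged.
mask = df["Ind."].eq(1) & df["Code"].isin(code_list) & df["Code_2"].notnull()
df["indicator"] = np.where(mask, 1, 0)

print(df["indicator"].tolist())  # [0, 0, 1, 0, 0, 0]
```

If Code were stored as strings in the real data, either the list or the column would need converting so that both sides have the same type.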

Update (suggested by @splash58):

df['indicator'] = mask.astype(int)
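
If a row-wise function is still preferred, as in the question, the checks can be written for scalar values instead. The sketch below is one possible version, not the only fix: it uses Python's `in` operator and `pd.notna`, and it joins all three conditions with `and`, since the original `A and B or C` is evaluated as `(A and B) or C`. The name `new_indicator`, the rebuilt sample frame, and the numeric `code_list` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

code_list = [7, 9, 11]  # assumed numeric, to match a numeric Code column


def new_indicator(row):
    # row is a single row, so each lookup returns a scalar, not a Series.
    if row["Ind."] == 1 and row["Code"] in code_list and pd.notna(row["Code_2"]):
        return 1
    return 0


# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Ind.": [1, 0, 1, 1, 0, 1],
    "Code": [np.nan, 7, 9, np.nan, 11, 4],
    "Code_2": ["x", np.nan, "z", "a", np.nan, np.nan],
})

df["NewIndColumn"] = df.apply(new_indicator, axis=1)
print(df["NewIndColumn"].tolist())  # [0, 0, 1, 0, 0, 0]
```

On large frames the vectorized mask is generally faster, because `apply` with `axis=1` calls the Python function once per row.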
