Question

I have the following example:

import pandas as pd

data = {'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
        'launched': [1983, 1984, 1984, 1984],
        'discontinued': [1986, 1985, 1984, 1986]}

df = pd.DataFrame(data, columns=['model', 'launched', 'discontinued'])

def set_row(row):
    if row["model"] == "Lisa":
        return "hello"
    else:
        return "null"

df['new Column'] = df.apply(set_row, axis=1)

This gives me the following table, with a new column containing "hello" or "null":

            model  launched  discontinued new Column
0            Lisa      1983          1986      hello
1          Lisa 2      1984          1985       null
2  Macintosh 128K      1984          1984       null
3  Macintosh 512K      1984          1986       null

Now I want to extend the if condition with a second clause, along these lines:

If [column model equals "Lisa"] or [column model contains the string "Mac"], return "hello" in the new column; otherwise return "null". How can I do this?

I tried:

def set_row(row):  
    if ( (row["model"] == "Lisa") | df["model"].str.contains("Mac") ):
        return "hello"
    else:
        return "null"

I get this error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index 0')

How can I fix this?

Answer 1

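It is better not to use apply here, because under the hood it loops over the rows one at a time. A better option is numpy.where with a vectorized boolean mask: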

import numpy as np

mask = (df["model"] == "Lisa") | df["model"].str.contains("Mac")
df['new Column'] = np.where(mask, "hello", "null")

Or:

df['new Column'] = 'null'
df.loc[mask, 'new Column'] = "hello"

print(df)
            model  launched  discontinued new Column
0            Lisa      1983          1986      hello
1          Lisa 2      1984          1985       null
2  Macintosh 128K      1984          1984      hello
3  Macintosh 512K      1984          1986      hello
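As a quick sanity check, here is a minimal sketch, assuming the sample data from the question, that builds the same DataFrame and compares the vectorized mask with a row-wise apply version that uses Python's or on scalar values. The names vectorized and row_wise are just illustrative.

```python
import numpy as np
import pandas as pd

data = {'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
        'launched': [1983, 1984, 1984, 1984],
        'discontinued': [1986, 1985, 1984, 1986]}
df = pd.DataFrame(data)

# Vectorized: one boolean Series covering every row at once
mask = (df["model"] == "Lisa") | df["model"].str.contains("Mac")
vectorized = pd.Series(np.where(mask, "hello", "null"), index=df.index)

# Row-wise: each call only sees the current row's scalar value,
# so Python's `or` is appropriate here
def set_row(row):
    if row["model"] == "Lisa" or "Mac" in row["model"]:
        return "hello"
    return "null"

row_wise = df.apply(set_row, axis=1)

print(vectorized.tolist())                        # ['hello', 'null', 'hello', 'hello']
print(vectorized.tolist() == row_wise.tolist())   # True
```

Both give the same labels. The vectorized version is usually preferred on larger frames because the comparison and str.contains run over the whole column instead of calling a Python function per row.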

Edit:

def set_row(row):  
    if (row["model"] == "Lisa") or ("Mac" in row["model"]):
        return "hello"
    else:
        return "null"

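If you want to keep apply, combine the conditions with Python's or and test only the current row's value. In your attempt, df["model"].str.contains("Mac") checks the whole column and returns a Series, so the | produced a Series instead of a single True/False, and if cannot decide whether a Series is true, which raises the ValueError.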
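To see where the ValueError comes from, this minimal sketch (using only the model column from the question) rebuilds the original condition for the first row. The variable names are hypothetical, chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"model": ["Lisa", "Lisa 2", "Macintosh 128K", "Macintosh 512K"]})
row = df.iloc[0]

# Comparing one row's value gives a single bool
scalar_part = row["model"] == "Lisa"

# df["model"].str.contains(...) looks at the whole column, not just this row
column_part = df["model"].str.contains("Mac")

# bool | Series broadcasts to a 4-element boolean Series
combined = scalar_part | column_part
print(type(combined).__name__, len(combined))  # Series 4

error_message = None
try:
    if combined:  # truth value of a multi-element Series is ambiguous
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# The fix: keep both checks scalar and combine them with Python's `or`
fixed = (row["model"] == "Lisa") or ("Mac" in row["model"])
print(fixed)  # True
```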
