I have the following example:
import pandas as pd

data = {'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
        'launched': [1983, 1984, 1984, 1984],
        'discontinued': [1986, 1985, 1984, 1986]}
df = pd.DataFrame(data, columns=['model', 'launched', 'discontinued'])

def set_row(row):
    if row["model"] == "Lisa":
        return "hello"
    else:
        return "null"

df['new Column'] = df.apply(set_row, axis=1)
This gives me a table with a new column containing "hello" and "null":
            model  launched  discontinued  new Column
0            Lisa      1983          1986       hello
1          Lisa 2      1984          1985        null
2  Macintosh 128K      1984          1984        null
3  Macintosh 512K      1984          1986        null
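Now I want to extend the if condition with a second clause, something like:
If [column model equals "Lisa"] or [column model contains the string "Mac"], return "hello" in the new column, otherwise return "null". How can I do this?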
I tried:
def set_row(row):
    if ( (row["model"] == "Lisa") | df["model"].str.contains("Mac") ):
        return "hello"
    else:
        return "null"
and I get this error:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index 0')
How can I fix this?
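To see where the error comes from, here is a minimal reproduction outside apply (a sketch assuming pandas is installed; the names scalar_part, series_part, combined and error_message are illustrative only). Inside set_row, row["model"] == "Lisa" is a single boolean, but df["model"].str.contains("Mac") is a boolean Series covering the entire column, so | produces a Series and if cannot reduce it to one True/False value.

```python
import pandas as pd

df = pd.DataFrame({'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K']})
row = df.iloc[0]  # what apply(..., axis=1) passes to set_row for index 0

# Comparing one row's value gives a single boolean
scalar_part = row["model"] == "Lisa"
# This looks at the WHOLE column, not just the current row
series_part = df["model"].str.contains("Mac")
# bool | Series broadcasts to a Series of four booleans
combined = scalar_part | series_part

error_message = None
try:
    if combined:  # Python needs one truth value here, but gets a Series
        pass
except ValueError as exc:
    error_message = str(exc)

print(type(combined).__name__)
print(error_message)
```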
Answer (score: 1)
It is better not to use apply here, because it loops over the rows under the hood. A vectorized numpy.where is a better fit:
import numpy as np

mask = (df["model"] == "Lisa") | df["model"].str.contains("Mac")
df['new Column'] = np.where(mask, "hello", 'null')
Or:
df['new Column'] = 'null'
df.loc[mask, 'new Column'] = "hello"
print(df)
            model  launched  discontinued  new Column
0            Lisa      1983          1986       hello
1          Lisa 2      1984          1985        null
2  Macintosh 128K      1984          1984       hello
3  Macintosh 512K      1984          1986       hello
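For reference, a self-contained version of both vectorized variants, assuming pandas and NumPy are installed (where_col is an illustrative name, not from the answer). It builds the mask once for the whole column and checks that numpy.where and the .loc assignment give the same result.

```python
import numpy as np
import pandas as pd

data = {'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
        'launched': [1983, 1984, 1984, 1984],
        'discontinued': [1986, 1985, 1984, 1986]}
df = pd.DataFrame(data)

# One boolean Series for all rows: exact match OR substring match
mask = (df["model"] == "Lisa") | df["model"].str.contains("Mac")

# Variant 1: numpy.where picks "hello" where mask is True, else "null"
where_col = np.where(mask, "hello", "null")

# Variant 2: fill with the default, then overwrite matching rows via .loc
df['new Column'] = 'null'
df.loc[mask, 'new Column'] = "hello"

print(df)
```

Both variants avoid calling a Python function per row; .loc is handy when only a subset of rows should change and the rest should keep an existing value.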
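Edit: if you want to keep apply, test the current row's value instead of the whole column. In your version, df["model"].str.contains("Mac") returns a boolean Series for every row, so the whole condition becomes a Series and if cannot decide whether it is true. Use row["model"] with the scalar operators or and in: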
def set_row(row):
    if (row["model"] == "Lisa") or ("Mac" in row["model"]):
        return "hello"
    else:
        return "null"
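A complete runnable version of the apply approach from the edit, assuming pandas is installed; it reuses the question's data and should yield the same column as the vectorized variants, just computed row by row.

```python
import pandas as pd

data = {'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
        'launched': [1983, 1984, 1984, 1984],
        'discontinued': [1986, 1985, 1984, 1986]}
df = pd.DataFrame(data)

def set_row(row):
    # row["model"] is a plain string here, so `or` and `in` work on scalars
    if row["model"] == "Lisa" or "Mac" in row["model"]:
        return "hello"
    return "null"

# axis=1 passes each row (as a Series) to set_row
df['new Column'] = df.apply(set_row, axis=1)
print(df)
```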