When I try to create a new column using a function based on the values in another column, I get the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-491e832a7dac> in <module>()
      4     return 'Other'
      5
----> 6 df['PriceCatColumn'] = df.apply(PriceCat, axis=1)

TypeError: apply() missing 1 required positional argument: 'func'
Here is the code:

def PriceCat(row):
    if row['Median ASP'] <= 50:
        return 'Category 1'
    return 'Other'

df['PriceCatColumn'] = df.apply(PriceCat, axis=1)
What exactly am I doing wrong? I have looked into solutions for this problem, but I can't seem to find the answer I need.
Answer 0 (score: 3)
If there are only two possible categories, use np.where instead (with NumPy imported as np).
Example:
>>> df
Median ASP
0 1
1 2
2 51
3 52
4 5
df['PriceCatColumn'] = np.where(df['Median ASP'] <= 50, 'Category 1', 'Other')
>>> df
Median ASP PriceCatColumn
0 1 Category 1
1 2 Category 1
2 51 Other
3 52 Other
4 5 Category 1
If there are more categories, use np.select. For example:
conds = [df['Median ASP'] <= 3, df['Median ASP'] <= 50]
choices = ['Category 1', 'Category 2']
df['PriceCatColumn'] = np.select(conds, choices, default='Other')
>>> df
Median ASP PriceCatColumn
0 1 Category 1
1 2 Category 1
2 51 Other
3 52 Other
4 5 Category 2
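
The np.select call above depends on the order of the conditions: for each row, the first condition that is True decides the choice. Here is a minimal self-contained sketch of that behavior, assuming the same five sample values as the example; the name reversed_result is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Median ASP": [1, 2, 51, 52, 5]})

# Conditions are checked in order; the first True one wins for each row.
conds = [df["Median ASP"] <= 3, df["Median ASP"] <= 50]
choices = ["Category 1", "Category 2"]
df["PriceCatColumn"] = np.select(conds, choices, default="Other")
print(df)

# With the order reversed, the broader "<= 50" test matches first,
# so no row can ever be labelled "Category 1".
reversed_result = np.select(conds[::-1], choices[::-1], default="Other")
print(reversed_result)
```

Listing the narrowest condition first is what keeps the categories from shadowing each other.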
That said, your code does appear to run correctly, although it is less efficient than the NumPy approaches:
def PriceCat(row):
    if row['Median ASP'] <= 50:
        return 'Category 1'
    return 'Other'

df['PriceCatColumn'] = df.apply(PriceCat, axis=1)
>>> df
Median ASP PriceCatColumn
0 1 Category 1
1 2 Category 1
2 51 Other
3 52 Other
4 5 Category 1
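
Since the function works on a real DataFrame, the error in the question probably comes from how df was created rather than from PriceCat. One possible cause, which the question does not confirm, is that df was bound to the DataFrame class itself (for example by leaving off the parentheses) instead of an instance. The sketch below reproduces a message of the same shape under that assumption; the assignment df = pd.DataFrame is hypothetical and not taken from the original post.

```python
import pandas as pd

def PriceCat(row):
    if row["Median ASP"] <= 50:
        return "Category 1"
    return "Other"

# Hypothetical mistake: no parentheses, so df is the DataFrame class,
# not a DataFrame instance.
df = pd.DataFrame

message = ""
try:
    # PriceCat is bound to `self`, which leaves `func` unfilled.
    df.apply(PriceCat, axis=1)
except TypeError as exc:
    message = str(exc)

print(message)
```

Checking type(df) before calling apply is a quick way to rule this out.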
Answer 1 (score: 0)
PriceCat should take a single value, not a DataFrame.
def PriceCat(x):
    if x <= 50:
        return 'Category 1'
    else:
        return 'Other'

df['PriceCatColumn'] = df['Median ASP'].apply(PriceCat)
    X  Median ASP PriceCatColumn
0   1          10     Category 1
1   2          20     Category 1
2   3          30     Category 1
3   4          40     Category 1
4   5          50     Category 1
5   6          60          Other
6   7          70          Other
7   8          80          Other
8   9          90          Other
9  10         100          Other
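
To make the difference between the two apply styles concrete, the sketch below checks what each one passes to the function. It assumes a small two-row frame, and the helper name price_cat is only for illustration. With axis=1, DataFrame.apply hands the function one row at a time as a Series, while Series.apply hands it one scalar value at a time.

```python
import pandas as pd

df = pd.DataFrame({"X": [1, 2], "Median ASP": [10, 60]})

def price_cat(value):
    return "Category 1" if value <= 50 else "Other"

# DataFrame.apply with axis=1: the function receives each row as a Series.
row_is_series = df.apply(lambda row: isinstance(row, pd.Series), axis=1)

# Series.apply: the function receives each element as a scalar.
value_is_series = df["Median ASP"].apply(lambda v: isinstance(v, pd.Series))

# Both styles give the same categories when written accordingly.
by_row = df.apply(lambda row: price_cat(row["Median ASP"]), axis=1)
by_value = df["Median ASP"].apply(price_cat)

print(by_row.tolist())
print(by_value.tolist())
```

Either form works; the column-wise version avoids building a Series for every row.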