I am trying to write a function that uses if-else logic to modify two columns in a dataframe, but it is not working.
Here is my function:
def get_comment_status(df):
    if df['address'] == 'NY':
        df['comment'] = 'call tomorrow'
        df['selection_status'] = 'interview scheduled'
        return df['comment']
        return df['selection_status']
    else:
        df['comment'] = 'Dont call'
        df['selection_status'] = 'application rejected'
        return df['comment']
        return df['selection_status']
I then execute the function like this:
df[['comment', 'selection_status']] = df.apply(get_comment_status, axis = 1)
But I get an error. What am I doing wrong? My guess is that the df.apply() syntax is wrong.
Error message:
TypeError: 'str' object cannot be interpreted as an integer KeyError: ('address', 'occurred at index 0')
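I also thought about using a lambda function, but that did not work because I was trying to assign values to the 'comment' and 'selection_status' columns using '='.
Note: I have checked this question, which has a similar title, but it did not solve my problem.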
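Answer 0 (score: 2)
What you are trying to do does not fit well with the pandas philosophy. In addition, apply is an inefficient function here, since with axis=1 it calls the Python function once per row. You should probably use NumPy's where instead: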
import numpy as np
df['comment'] = np.where(df['address'] == 'NY',
                         'call tomorrow', 'Dont call')
df['selection_status'] = np.where(df['address'] == 'NY',
                                  'interview scheduled', 'application rejected')
Or use boolean indexing:
df.loc[df['address'] == 'NY', ['comment', 'selection_status']] \
    = 'call tomorrow', 'interview scheduled'
df.loc[df['address'] != 'NY', ['comment', 'selection_status']] \
    = 'Dont call', 'application rejected'
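To try the np.where approach end to end, here is a minimal sketch on made-up data. Only the 'address' column and the four output strings come from the question. The sample rows and the helper name `is_ny` are assumptions for illustration. It also evaluates the condition once and reuses the mask for both columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only the 'address' column name comes from the question.
df = pd.DataFrame({'address': ['NY', 'CA', 'NY', 'TX']})

# Evaluate the condition once and reuse the boolean mask for both columns.
is_ny = df['address'] == 'NY'

# np.where picks the first value where the mask is True, the second otherwise.
df['comment'] = np.where(is_ny, 'call tomorrow', 'Dont call')
df['selection_status'] = np.where(is_ny, 'interview scheduled',
                                  'application rejected')

print(df)
```

Storing the mask in a variable is a small design choice: the comparison runs once, and the two assignments cannot drift apart if the condition changes later.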
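Answer 1 (score: 2)
The main benefit of pandas is vectorised computation, for example with numpy.where. However, below I will show you how to use pd.DataFrame.apply. Points to note:
Two consecutive return statements will not work. A function stops when it reaches the first return.
Return both results together instead, then use pd.Series.values.tolist to unpack them.
Here is a working example.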
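A minimal sketch of the apply-based approach, not necessarily the exact code this answer intended. The sample dataframe, including the `candidate` and `experience` columns, is made up for illustration, and the function parameter is renamed to `row` as an assumption, to make clear that it receives one row at a time. Wrapping the unpacked values in a `pd.DataFrame` before assignment is also an assumption, chosen so that the new columns line up with the original index.

```python
import pandas as pd

# Hypothetical sample data: only 'address' comes from the question.
df = pd.DataFrame({
    'candidate': ['A', 'B', 'C'],
    'address': ['NY', 'CA', 'NY'],
    'experience': [3, 5, 1],
})

def get_comment_status(row):
    # Return both values in a single list; a second `return` would never run.
    if row['address'] == 'NY':
        return ['call tomorrow', 'interview scheduled']
    return ['Dont call', 'application rejected']

# With axis=1 the function is called once per row and yields a Series of lists.
result = df.apply(get_comment_status, axis=1)

# Unpack the lists into two columns, aligned on the original index.
unpacked = pd.DataFrame(result.values.tolist(), index=df.index,
                        columns=['comment', 'selection_status'])
df[['comment', 'selection_status']] = unpacked

print(df)
```

This still runs a Python function for every row, so on large dataframes a vectorised alternative is usually the faster choice.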