Question

I am trying to write a function using if-else logic that modifies two columns in a dataframe, but it is not working. Here is my function:

def get_comment_status(df):
    if df['address'] == 'NY':
        df['comment'] = 'call tomorrow'
        df['selection_status'] = 'interview scheduled'
        return df['comment']
        return df['selection_status']
    else:
        df['comment'] = 'Dont call'
        df['selection_status'] = 'application rejected'
        return df['comment']
        return df['selection_status']

I then execute the function with:

df[['comment', 'selection_status']] = df.apply(get_comment_status, axis = 1)

But I get an error. What am I doing wrong? My guess is that my df.apply() syntax is wrong.

Error message:

TypeError: 'str' object cannot be interpreted as an integer
KeyError: ('address', 'occurred at index 0')

Sample dataframe: (not included)

I also thought about using a lambda function, but that did not work because I was trying to assign values to the 'comment' and 'selection_status' columns with '='.

Note: I have already checked this question, which has a similar title, but it did not solve my problem.

Answer 1

What you are trying to do does not fit well with the pandas philosophy. Also, apply is a very inefficient function. You should probably use NumPy where:

import numpy as np
df['comment'] = np.where(df['address'] == 'NY',
                  'call tomorrow', 'Dont call')
df['selection_status'] = np.where(df['address'] == 'NY',
                           'interview scheduled', 'application rejected')
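
For anyone who wants to run the snippet above end to end, here is a minimal self-contained sketch. The sample dataframe is hypothetical (the question's real data is not shown), and the helper name is_ny is only introduced here so the condition is computed once.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'address' column matters here.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"],
                   "address": ["NY", "CA", "NY"]})

# Evaluate the condition once and reuse it for both columns.
is_ny = df["address"] == "NY"

# np.where(condition, value_if_true, value_if_false) works element-wise.
df["comment"] = np.where(is_ny, "call tomorrow", "Dont call")
df["selection_status"] = np.where(is_ny, "interview scheduled",
                                  "application rejected")

print(df)
```

Because the comparison and both assignments operate on whole columns, no Python-level loop over rows is involved.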

Or boolean indexing:

df.loc[df['address'] == 'NY', ['comment', 'selection_status']] \
         = 'call tomorrow', 'interview scheduled'
df.loc[df['address'] != 'NY', ['comment', 'selection_status']] \
         = 'Dont call', 'application rejected'
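
A more cautious variant of the same boolean-indexing idea is sketched below. It assigns one column at a time and reuses a single mask, so there is no question of how a tuple on the right-hand side is matched to a list of columns. The sample dataframe and the name mask are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; the question's real dataframe is not shown.
df = pd.DataFrame({"address": ["NY", "CA", "NY"]})

# Boolean mask: True for rows whose address is 'NY'.
mask = df["address"] == "NY"

# Assigning through .loc creates the column on first use;
# rows not selected by the mask are filled in by the second assignment.
df.loc[mask, "comment"] = "call tomorrow"
df.loc[~mask, "comment"] = "Dont call"
df.loc[mask, "selection_status"] = "interview scheduled"
df.loc[~mask, "selection_status"] = "application rejected"

print(df)
```

Using ~mask for the second branch guarantees that the two selections are complementary, so every row receives a value.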

Answer 2

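The main advantage of pandas is vectorised computation. However, below I show how you can do this with pd.DataFrame.apply. Points to note:

Row data is passed to the function one row at a time, not the whole dataframe at once. You should therefore name the argument accordingly.

Two return statements in a function will not work. A function stops as soon as it reaches return.

Instead, you need to return a list of results and then unpack it using pd.Series.values.tolist.

Here is a working example.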
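
A minimal sketch of the approach described in the notes above, which should not be read as the answerer's original code. The sample dataframe is hypothetical, and the parameter is named row to reflect that apply with axis=1 passes one row at a time.

```python
import pandas as pd

# Hypothetical sample data; the question's real dataframe is not shown.
df = pd.DataFrame({"address": ["NY", "CA", "NY"]})

def get_comment_status(row):
    # 'row' is a single row (a Series), not the whole dataframe.
    # Return both values together in one list; a second 'return'
    # statement would never be reached.
    if row["address"] == "NY":
        return ["call tomorrow", "interview scheduled"]
    return ["Dont call", "application rejected"]

# With axis=1, apply yields a Series whose elements are 2-item lists.
result = df.apply(get_comment_status, axis=1)

# values.tolist() converts that Series into a list of lists,
# which can then be assigned to the two target columns at once.
df[["comment", "selection_status"]] = result.values.tolist()

print(df)
```

This still runs a Python function once per row, so the vectorised alternatives are likely to be faster on large dataframes.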

Apply function to modify multiple columns using if-else logic
