Question

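In the function below, myfun first checks whether a particular condition is met and only then continues.

This check happens inside the function.

Is it possible to perform the check before applying the function?

For example, something like: if [column] == xyx, .apply(myfun)

Some code below: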

import pandas as pd

x = pd.DataFrame({'col1':['hi','hello','hi','hello'],
                 'col2':['random', 'words', 'in', 'here']})
print(x)

    col1    col2
0     hi  random
1  hello   words
2     hi      in
3  hello    here

My function checks whether row['col1'] == 'hi' and returns the string 'success'; otherwise it returns np.nan.

import numpy as np

def myfun(row):
    # if col1 of this row is the string 'hi'
    if row['col1'] == 'hi':
        return 'success'
    # otherwise return NaN
    return np.nan

# applying the function to every row
x['result'] = x.apply(myfun, axis=1)


# result

    col1    col2   result
0     hi  random  success
1  hello   words      NaN
2     hi      in  success
3  hello    here      NaN

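Is it possible to apply the function only to the rows where col1 == 'hi', instead of doing the check inside the function passed to apply()?

Note: I would prefer a solution that uses apply(). I know there are other options, such as np.where.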

Answer 1

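Yes, and there is a better way than apply.

apply loops over every row, whereas loc is a vectorized method. Even though apply is really powerful, I try to avoid it whenever I can: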

x.loc[x['col1']=='hi', 'result'] = 'success'
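As a self-contained sketch of the line above (the frame is rebuilt from the question so the snippet runs on its own): the 'result' column does not need to exist beforehand, and rows that do not match the mask are left as NaN.

```python
import pandas as pd

x = pd.DataFrame({'col1': ['hi', 'hello', 'hi', 'hello'],
                  'col2': ['random', 'words', 'in', 'here']})

# The boolean mask picks the rows; 'result' is created by the assignment,
# and rows outside the mask are filled with NaN.
x.loc[x['col1'] == 'hi', 'result'] = 'success'

print(x)
```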

Answer 2

Here is how to use apply() based on a condition. I can now remove the condition check from the function:

def myfun(row):

    return 'success'

# applying the function based on condition
x['result'] = x[x['col1']=='hi'].apply(myfun,axis=1)

I could also create a mask first.

mask = (x['col1']=='hi')

# applying the function based on condition
x['result'] = x[mask].apply(myfun,axis=1)
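Why the unmatched rows end up as NaN: apply() on the filtered frame returns a Series that keeps only the index labels of the selected rows, and assigning it back to x aligns on that index. A minimal sketch, assuming a hypothetical variant of myfun that actually uses the row (it is not the function from the question):

```python
import pandas as pd

x = pd.DataFrame({'col1': ['hi', 'hello', 'hi', 'hello'],
                  'col2': ['random', 'words', 'in', 'here']})

def myfun(row):
    # hypothetical per-row work, only for illustration
    return row['col2'].upper()

mask = x['col1'] == 'hi'

# The function runs only on the selected rows; the result is a Series
# whose index contains just the labels of those rows (0 and 2).
partial = x[mask].apply(myfun, axis=1)

# Assignment aligns on the index, so rows 1 and 3 become NaN.
x['result'] = partial

print(x)
```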

Apply a function to a DataFrame based on a condition
