Question

How do I correctly reference the values of other columns when using a lambda on a pandas DataFrame?

dfresult_tmp2['Retention_Rolling_temp'] = dfresult_tmp2['Retention_tmp'].apply(lambda x: x if x['Count Billings']/4 < 0.20 else '')

The code above gives me this error:

TypeError: 'float' object is not subscriptable

Answer 1

dfresult_tmp2['Retention_tmp'].apply(
    lambda x: x if x['Count Billings'] / 4 < 0.20 else ''
)

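You are using pd.Series.apply, which is not the same as pd.DataFrame.apply. Here the lambda is called once per element and receives a scalar each time, so some_scalar_x['Count Billings'] is meaningless. That is what raises the TypeError.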
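To make the difference concrete, here is a minimal sketch on made-up data (the values are hypothetical; only the column names come from the question). It checks what kind of object the function receives in each case.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Retention_tmp": [0.9, 0.5, 0.7],
    "Count Billings": [0.4, 2.0, 0.6],
})

# Series.apply calls the function once per element, passing a scalar.
got_scalars = df["Retention_tmp"].apply(lambda x: isinstance(x, float))

# DataFrame.apply with axis=1 calls the function once per row, passing a
# Series whose labels are the column names.
got_rows = df.apply(lambda row: isinstance(row, pd.Series), axis=1)

print(got_scalars.all())  # every argument was a float
print(got_rows.all())     # every argument was a row Series
```

A float has no notion of columns, so indexing it with a column label fails before the comparison is ever evaluated.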

Rather than showing you how to force your logic into apply, I'll show you vectorized versions.

Option 1
pd.Series.where

dfresult_tmp2['Retention_tmp'] = \
    dfresult_tmp2['Retention_tmp'].where(
        dfresult_tmp2['Count Billings'] / 4 < .2, '')
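As a quick check of how where behaves, here is a small sketch on hypothetical data (the frame `df` and its values are made up for illustration). Values are kept where the condition is True and replaced with '' elsewhere.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Retention_tmp": [0.9, 0.5, 0.7],
    "Count Billings": [0.4, 2.0, 0.6],
})

# Boolean mask: True for rows whose value should be kept.
keep = df["Count Billings"] / 4 < 0.20

# where() keeps the original value where the mask is True and
# substitutes the second argument everywhere else.
result = df["Retention_tmp"].where(keep, "")

print(result.tolist())
```

Because the replacement is a string, the resulting column holds a mix of numbers and strings. If the column needs to stay numeric, omitting the second argument leaves NaN in the replaced positions instead.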

Option 2
np.where

import numpy as np

r = dfresult_tmp2['Retention_tmp'].values
b = dfresult_tmp2['Count Billings'].values
dfresult_tmp2['Retention_tmp'] = np.where(b / 4 < .2, r, '')
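One thing worth checking with np.where: when a float array is combined with '', NumPy's type promotion typically produces a string array, so the kept numbers come back as text. The sketch below, on hypothetical data, shows that and one possible workaround (casting to object first). Treat it as an illustration under those assumptions, not as part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Retention_tmp": [0.9, 0.5, 0.7],
    "Count Billings": [0.4, 2.0, 0.6],
})

r = df["Retention_tmp"].values
b = df["Count Billings"].values

# Float array combined with '': the result is promoted to a string dtype.
as_text = np.where(b / 4 < 0.2, r, "")

# Casting to object first keeps the surviving values as numbers.
as_object = np.where(b / 4 < 0.2, r.astype(object), "")

print(as_text.dtype, as_object.dtype)
```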

Option 3
apply
What you asked for, but not what I recommend.

dfresult_tmp2['Retention_tmp'] = dfresult_tmp2.apply(
    lambda x: x['Retention_tmp'] if x['Count Billings'] / 4 < .2 else '',
    axis=1
)
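Note that a row-wise DataFrame.apply needs axis=1. With the default axis=0 the lambda receives each column rather than each row, and looking up a column label inside it fails. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Retention_tmp": [0.9, 0.5, 0.7],
    "Count Billings": [0.4, 2.0, 0.6],
})

def pick(row):
    # Keep the retention value only when the scaled billing count is small.
    return row["Retention_tmp"] if row["Count Billings"] / 4 < 0.2 else ""

# axis=1: the function is called once per row.
result = df.apply(pick, axis=1)
print(result.tolist())

# Default axis=0: the function is called once per column, so the
# column labels are not available as keys and the lookup fails.
try:
    df.apply(pick)
    failed_without_axis = False
except KeyError:
    failed_without_axis = True
print(failed_without_axis)
```

This gives the same values as the vectorized options, but it runs a Python function for every row, which is why it is the slowest choice on large frames.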