Question

I have the following df:


operator_id	total_records	avg_wait_time	is_missed_call	out_calls_cnt
0	879896.0	117	17.958253	47
1	879898.0	227	17.239858	89
2	880020.0	20	6.815000	6
3	880022.0	70	16.172996	29

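I'm trying to create a new column called "test" that shows out_calls_cnt as a percentage of total_records, provided that out_calls_cnt is greater than 10; otherwise the function should return 0.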

I think looping over rows with a row-wise function is inefficient.

My code:

dataset_operators['test'] = dataset_operators[['out_calls_cnt', 'total_records']].apply(lambda x:  dataset_operators['out_calls_cnt'] / dataset_operators['total_rows'] if dataset_operators['out_calls_cnt'] > 10 else 0, axis = 1)

The error I get: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
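The error comes from the condition inside the lambda: it compares the whole column dataset_operators['out_calls_cnt'], not the current row, so the result is a boolean Series rather than a single True/False, and Python's conditional expression cannot turn a Series into one bool. A minimal sketch reproducing this with a small hypothetical Series (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical values, standing in for the out_calls_cnt column
s = pd.Series([47, 89, 6])

# Comparing a Series is element-wise and returns a boolean Series
mask = s > 10
print(mask.tolist())  # [True, True, False]

# Using that Series where Python expects a single bool (here, in a
# conditional expression, as in the lambda) raises the ambiguity ValueError
msg = None
try:
    result = 1 if mask else 0
except ValueError as err:
    msg = str(err)
print(msg)
```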

I'd like to solve it with a lambda, even though I managed to solve it using where:

dataset_operators['test'] = (dataset_operators['out_calls_cnt'] / dataset_operators['total_records']).where(dataset_operators['out_calls_cnt'] > 10, 0)
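For reference, here is the vectorized where version run end to end on a small hypothetical frame containing only the two columns the formula uses (the numbers are invented, not taken from the df above). Series.where keeps each value where the condition is True and substitutes 0 elsewhere, so no row-by-row Python loop is involved:

```python
import pandas as pd

# Hypothetical sample data: only the two columns used by the formula
dataset_operators = pd.DataFrame({
    "total_records": [100, 200, 50],
    "out_calls_cnt": [40, 5, 20],
})

# Element-wise ratio for every row at once
ratio = dataset_operators["out_calls_cnt"] / dataset_operators["total_records"]

# Keep the ratio where out_calls_cnt > 10, otherwise put 0
dataset_operators["test"] = ratio.where(dataset_operators["out_calls_cnt"] > 10, 0)
print(dataset_operators)
```

This yields a fraction (0.4 rather than 40); multiply the ratio by 100 if an actual percentage is wanted.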

Answer 1

Here is an alternative using np.where. Based on the sample you showed, try the following. It creates a new column named test in df; you can rename it as needed.

import numpy as np
import pandas as pd
df['test'] = np.where(df['out_calls_cnt'] > 10, df['out_calls_cnt'] / df['total_records'], 0)
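A self-contained run of the np.where approach, using a small hypothetical df (values invented for illustration). np.where takes a boolean condition, a value for True positions and a value for False positions, and returns a NumPy array that pandas assigns as the new column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "total_records": [100, 200, 50],
    "out_calls_cnt": [40, 5, 20],
})

# Where out_calls_cnt > 10 take the ratio, otherwise take 0
df["test"] = np.where(
    df["out_calls_cnt"] > 10,
    df["out_calls_cnt"] / df["total_records"],
    0,
)
print(df)
```

Note that np.where evaluates both branches for every row before choosing, so the division is computed even for rows that end up as 0; with pandas this is harmless, since division by zero yields inf or NaN instead of raising.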

Answer 2

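I'd recommend not using apply and sticking with your second solution using where, but since you specifically asked for it, you can do the following: in the lambda call, replace dataset_operators with {{1} }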
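The replacement identifier is not preserved in the answer above; the sketch below assumes the lambda should refer to its own argument, here called x, which with axis=1 is the current row as a Series, so x['out_calls_cnt'] is a single number and the if/else works. It also assumes the divisor column is total_records, since total_rows from the question's code is not among the df's columns. The data is hypothetical:

```python
import pandas as pd

# Hypothetical sample data
dataset_operators = pd.DataFrame({
    "total_records": [100, 200, 50],
    "out_calls_cnt": [40, 5, 20],
})

# With axis=1, apply calls the lambda once per row; x is that row,
# so x["out_calls_cnt"] is a scalar and the conditional is unambiguous
dataset_operators["test"] = dataset_operators[["out_calls_cnt", "total_records"]].apply(
    lambda x: x["out_calls_cnt"] / x["total_records"] if x["out_calls_cnt"] > 10 else 0,
    axis=1,
)
print(dataset_operators)
```

This gives the same column as the where version, but it runs a Python function per row, which is why the vectorized where or np.where forms are usually preferred on large frames.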

Create a new column based on a condition on another column using an apply lambda function
