Question

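The following for loop works, but it takes a very long time. The data frame df_customers has about 1.5 million entries, and dict_customers has about 500,000 entries.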

for i in range(len(df_customers)):
    df_customers.iloc[i, j] = dict_customers[df_customers.iloc[i,k]]

My question is: how can I speed up the loop?

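The data frame df_customers contains customer features, among them the CustomerID. One customer can have several rows (so the CustomerID is not unique per row).

The dictionary dict_customers contains the unique CustomerIDs (keys) and the number of visits per customer (values).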

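I want to add a new column j to the data frame df_customers that holds the number of visits retrieved from the dictionary.

I solved this with a for loop over df_customers, where i is the row, j is the new column with the number of visits, and k is the existing column with the CustomerID.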

Note: CustomerIDs start at 100 000.
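
For reference, the setup described above can be reproduced on a small scale. This is only a sketch with made-up data: the column names "CustomerID" and "visits" and the sample values are assumptions for illustration, not taken from the real data.

```python
import pandas as pd

# Hypothetical toy data; the real frame has about 1.5 million rows.
df_customers = pd.DataFrame({
    "CustomerID": [100000, 100001, 100000, 100002],
    "visits": [0, 0, 0, 0],  # new column, initialized to 0
})
dict_customers = {100000: 2, 100001: 1, 100002: 1}

k = df_customers.columns.get_loc("CustomerID")  # position of the existing ID column
j = df_customers.columns.get_loc("visits")      # position of the new column

# Row-by-row lookup: correct, but one Python-level iteration per row.
for i in range(len(df_customers)):
    df_customers.iloc[i, j] = dict_customers[df_customers.iloc[i, k]]
```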

I tried the following comprehension:

df_customers['j-column'] = [dict_customers[df_customers['k-column'][i]] for i in range(len(df_customers))]

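The comprehension as I wrote it does not work. It leaves all values at 0 (the initialized value). The expected output is the number of visits, looked up in the dictionary by CustomerID, stored in the new column j of df_customers.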

Answer 1

I found a workaround:

Create a list of the dictionary values (in CustomerID order):
list_values = [v for v in dict_customers.values()]
Create an array from this list (this also speeds things up):
array_values = np.array(list_values)
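The comprehension then fills the df_customers j column with the array values, using the CustomerID as the index (shifted by 100 000, because CustomerIDs start at 100 000 while the array index starts at 0):
df_customers['j-column'] = [array_values[df_customers.iloc[i, k] - 100000] for i in range(len(df_customers))]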
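
The comprehension above still iterates over the rows in Python. As a possible alternative, not part of the workaround itself, the same lookup can probably be done without an explicit loop. The sketch below shows two variants on made-up data: NumPy integer-array indexing, which keeps the offset trick, and `Series.map` with the dictionary. The column names "CustomerID", "visits" and "visits_map" are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real df_customers / dict_customers.
df_customers = pd.DataFrame({"CustomerID": [100000, 100002, 100000, 100001]})
dict_customers = {100000: 5, 100001: 3, 100002: 7}

# Variant 1: offset lookup, vectorized with integer-array indexing.
# Assumes the IDs are contiguous from 100000 and the dict is in ID order.
array_values = np.array(list(dict_customers.values()))
ids = df_customers["CustomerID"].to_numpy()
df_customers["visits"] = array_values[ids - 100000]

# Variant 2: Series.map with the dictionary (no ordering assumption).
# IDs missing from the dictionary become NaN instead of raising KeyError.
df_customers["visits_map"] = df_customers["CustomerID"].map(dict_customers)
```

Variant 1 depends on the dictionary values being in CustomerID order with no gaps; if that is not guaranteed, variant 2 is likely the safer choice.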