Question

I would like help improving a join whose input keeps growing. We have two datasets, temp1 as:

Muid  advertiserid  content
1     100           1
1     100           2
1     100           56
1     101           1
1     101           34

and temp2 as:

Muid  advertiserid  content  approved
1     100           1        1
1     101           1        0
1     100           56       0
1     200           1        1
1     100           2        1

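The goal is to mark the users in table 1 as approved or not approved by matching on muid, content, and advertiserid. Currently I join the two DataFrames like this: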

recos = pd.merge(temp1, temp2, how='left', on=['muid', 'content', 'advertiserid'])
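To make the join reproducible, here is a minimal sketch that builds both tables from the sample rows above and runs the same left merge. The lowercase column names are an assumption taken from the merge keys (the table headers show `Muid`).

```python
import pandas as pd

# Sample rows copied from the two tables in the question.
# Column names are lowercased to match the merge keys (an assumption).
temp1 = pd.DataFrame({
    "muid": [1, 1, 1, 1, 1],
    "advertiserid": [100, 100, 100, 101, 101],
    "content": [1, 2, 56, 1, 34],
})
temp2 = pd.DataFrame({
    "muid": [1, 1, 1, 1, 1],
    "advertiserid": [100, 101, 100, 200, 100],
    "content": [1, 1, 56, 1, 2],
    "approved": [1, 0, 0, 1, 1],
})

keys = ["muid", "content", "advertiserid"]

# Left merge: every temp1 row is kept; approved comes from temp2 when the key matches.
recos = pd.merge(temp1, temp2, how="left", on=keys)
print(recos)
```

The last temp1 row (advertiserid 101, content 34) has no match in temp2, so its approved value is NaN, and the approved column is therefore stored as float rather than int.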

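This join used to run perfectly, but as the input has grown, temp1 in particular, which now has several million rows, it fails with a memory error at execution time.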

Can anyone suggest a better way to accomplish this task?
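For context, one direction that might be explored is sketched below: shrink the integer columns to smaller dtypes, then merge temp1 in slices so that only one slice's intermediate result is built at a time. This is a sketch under assumptions, not a confirmed fix. The helper names `downcast_ints` and `merge_in_chunks` and the `chunk_rows` parameter are hypothetical, the sample frames are tiny stand-ins, and temp2 is assumed to hold at most one row per key (checked with `validate`).

```python
import pandas as pd

KEYS = ["muid", "content", "advertiserid"]

def downcast_ints(df):
    """Return a copy whose integer columns use the smallest dtype that fits."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
    return out

def merge_in_chunks(left, right, keys, chunk_rows=1_000_000):
    """Left-merge `left` with `right` one slice of `left` at a time."""
    parts = []
    for start in range(0, len(left), chunk_rows):
        chunk = left.iloc[start:start + chunk_rows]
        # validate raises MergeError if `right` has duplicate keys,
        # which would otherwise multiply the number of output rows.
        parts.append(chunk.merge(right, how="left", on=keys,
                                 validate="many_to_one"))
    return pd.concat(parts, ignore_index=True)

# Tiny stand-ins for the real frames (rows taken from the question).
temp1 = pd.DataFrame({
    "muid": [1, 1, 1, 1, 1],
    "advertiserid": [100, 100, 100, 101, 101],
    "content": [1, 2, 56, 1, 34],
})
temp2 = pd.DataFrame({
    "muid": [1, 1, 1, 1, 1],
    "advertiserid": [100, 101, 100, 200, 100],
    "content": [1, 1, 56, 1, 2],
    "approved": [1, 0, 0, 1, 1],
})

# chunk_rows=2 is only for demonstration; real data would use a far larger value.
recos = merge_in_chunks(downcast_ints(temp1), downcast_ints(temp2), KEYS,
                        chunk_rows=2)
print(recos)
```

The final `pd.concat` still has to hold the whole result, so if the result itself does not fit in memory, each part would need to be written to disk instead of being collected in a list.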

Optimizing a pandas join / looking for a more memory-efficient approach

0 answers: