Question

I have the following statement in R:

library(plyr)
filteredData <- ddply(data, .(ID1, ID2), businessrule)

I am trying to replicate this in Python with Pandas. I tried:

data['judge'] = data.groupby(['ID1','ID2']).apply(lambda x: businessrule(x))

But this raises the error:

 incompatible index of inserted column with frame index

Answer 1

The error message can be reproduced with:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4,3), columns=['ID1', 'ID2', 'val'])
df['new'] = df.groupby(['ID1', 'ID2']).apply(lambda x: x.values.sum())
# TypeError: incompatible index of inserted column with frame index

Your code probably raises the error for the same reason this toy example does. The right-hand side is a Series with a 2-level MultiIndex:

ID1  ID2
0    1       3
3    4      12
6    7      21
9    10     30
dtype: int64

df['new'] = ... tells Pandas to assign this Series to a column of df. But df has a single-level index:

   ID1  ID2  val
0    0    1    2
1    3    4    5
2    6    7    8
3    9   10   11

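Because a single-level index is incompatible with a 2-level MultiIndex, the assignment fails. In general, it is never correct to assign the result of a groupby/apply to a column of df unless the columns or levels you group by also happen to be valid index keys in the original DataFrame df.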
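The mismatch can be checked directly by comparing the number of index levels on each side. This is a minimal sketch on the same toy frame; it aggregates only the val column (an assumption made to keep the example short), which still produces a Series keyed by the group columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['ID1', 'ID2', 'val'])

# Aggregating per group yields a Series keyed by (ID1, ID2).
grouped = df.groupby(['ID1', 'ID2'])['val'].sum()

print(grouped.index.nlevels)  # number of levels in the grouped result's index
print(df.index.nlevels)       # number of levels in the original frame's index
```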

Instead, assign the Series to a new variable, just as the R code does:

filteredData = data.groupby(['ID1','ID2']).apply(businessrule)

Note that lambda x: businessrule(x) can be replaced with businessrule.
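As a rough sketch of how the result might be shaped afterwards: the sample data and the body of businessrule below are hypothetical stand-ins, since the question does not show the real rule. The first part flattens the grouped result into one row per group, which is close to the shape ddply returns. The second shows transform, which may be a better fit if a per-row column on the original frame is what is actually wanted.

```python
import pandas as pd

# Hypothetical sample data; the real `data` is not shown in the question.
data = pd.DataFrame({
    'ID1': [1, 1, 2, 2],
    'ID2': ['a', 'a', 'b', 'c'],
    'val': [10, 20, 30, 40],
})

def businessrule(values):
    # Hypothetical stand-in for the real rule: total of the group's values.
    return values.sum()

# One row per (ID1, ID2) group, with the keys restored as ordinary columns.
filteredData = (
    data.groupby(['ID1', 'ID2'])['val']
        .apply(businessrule)
        .reset_index(name='judge')
)

# transform keeps the original row index, so this assignment aligns with
# the frame. 'sum' mirrors the stand-in rule above.
data['judge'] = data.groupby(['ID1', 'ID2'])['val'].transform('sum')
```

The transform route only applies when the rule yields one value per group that should be repeated on every row of that group; otherwise keep the result in its own variable as above.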
