Question

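I am trying to group and sum by specific row types. For example, 3 companies sell shoes, coats and slippers; I want to group by company and add up the prices for the specific sell types shoe + coat.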

Input -

  company  selltype  price
0       a      shoe     34
1       a      coat     23
2       a  slippers     12
3       b      shoe     55
4       b      coat     34
5       b  slippers     23
6       c      shoe     65
7       c      coat     34
8       c  slippers     12
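
To reproduce the answers, the input table can be rebuilt as a DataFrame. This is a minimal sketch, assuming the frame is called df as in the answers and that price holds integers:

```python
import pandas as pd

# Rebuild the question's input table; the name `df` matches the answers.
df = pd.DataFrame({
    "company": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "selltype": ["shoe", "coat", "slippers"] * 3,
    "price": [34, 23, 12, 55, 34, 23, 65, 34, 12],
})
print(df)
```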

Answer 1

Use groupby + agg -

i = df.selltype.isin(['shoe', 'coat'])
j = i.ne(i.shift()).cumsum()

f = {'selltype' : '+'.join, 'price' : 'sum'}
df.groupby(['company', j], as_index=False).agg(f)

  company   selltype  price
0       a  shoe+coat     57
1       a   slippers     12
2       b  shoe+coat     89
3       b   slippers     23
4       c  shoe+coat     99
5       c   slippers     12

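Details

We need to group on two predicates -

the company column, and
the item being sold

Since shoes and coats are considered together, we need a custom series that reflects this, computed using i and j -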

i = df.selltype.isin(['shoe', 'coat'])
i

0     True
1     True
2    False
3     True
4     True
5    False
6     True
7     True
8    False
Name: selltype, dtype: bool

j = i.ne(i.shift()).cumsum()
j

0    1
1    1
2    2
3    3
4    3
5    4
6    5
7    5
8    6
Name: selltype, dtype: int64
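
The ne/shift/cumsum chain labels consecutive runs of equal values: a row starts a new run whenever its flag differs from the previous row's flag. A small sketch on a standalone boolean series (the names flags, changed and run_id are illustrative, not from the answer):

```python
import pandas as pd

flags = pd.Series([True, True, False, True, True, False])

# True wherever the value differs from the previous row.
# The first row compares against NaN, so it always starts a run.
changed = flags.ne(flags.shift())

# A running count of the changes gives each run its own integer label.
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 1, 2, 3, 3, 4]
```

Because the labels follow row order, this approach assumes the shoe and coat rows sit next to each other within each company, as they do in the input.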

Now all that remains is the groupby operation -

df = df.groupby(['company', j], as_index=False).agg(f)

To get your exact output, you can go one step further here, using pd.Series.where -

df.company = df.company.where(df.company.ne(df.company.shift()), '')
df

  company   selltype  price
0       a  shoe+coat     57
1           slippers     12
2       b  shoe+coat     89
3           slippers     23
4       c  shoe+coat     99
5           slippers     12
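
The where step only changes how the table looks: it keeps a company name when it differs from the row above and replaces it with an empty string otherwise. A sketch on a hand-built copy of the aggregated result (res and shown are illustrative names), leaving the original values intact:

```python
import pandas as pd

# The aggregated result from above, typed in by hand for this sketch.
res = pd.DataFrame({
    "company": ["a", "a", "b", "b", "c", "c"],
    "selltype": ["shoe+coat", "slippers"] * 3,
    "price": [57, 12, 89, 23, 99, 12],
})

# Work on a copy so `res` keeps the real company values for later use.
shown = res.copy()
is_new_company = shown["company"].ne(shown["company"].shift())
shown["company"] = shown["company"].where(is_new_company, "")
print(shown)
```

Blanking the names discards information, so it is probably best applied last, just before printing or exporting.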

Answer 2

treatsame={'shoe':'coat'}
df.groupby([df.company,df.selltype.replace(treatsame)]).\
    agg(lambda x :x.sum() if x.dtype=='int64' else '+'.join(x)).\
        reset_index('selltype',drop=True)
Out[40]: 
          selltype  price
company                  
a        shoe+coat     57
a         slippers     12
b        shoe+coat     89
b         slippers     23
c        shoe+coat     99
c         slippers     12

Answer 3

This takes a few more steps and is not as concise as the other answers, but it breaks the process down step by step.
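
One way such a step-by-step version might look is sketched below. It is not necessarily this answerer's code; the bucket column name is an assumption made for illustration. Unlike the shift/cumsum approach, it does not depend on row order.

```python
import pandas as pd

df = pd.DataFrame({
    "company": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "selltype": ["shoe", "coat", "slippers"] * 3,
    "price": [34, 23, 12, 55, 34, 23, 65, 34, 12],
})

# Step 1: label each row with the bucket it should be summed into.
# `bucket` is an illustrative column name, not from the original post.
combined = df["selltype"].isin(["shoe", "coat"])
df["bucket"] = df["selltype"].where(~combined, "shoe+coat")

# Step 2: sum the prices per company and bucket.
result = df.groupby(["company", "bucket"], as_index=False)["price"].sum()

# Step 3: rename the bucket column to match the desired output.
result = result.rename(columns={"bucket": "selltype"})
print(result)
```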


groupby and sum by specific row types in pandas

3 answers: