Get the count of level-1 groups after grouping by two columns

Time: 2019-06-24 06:24:36

Tags: pandas pandas-groupby

I am grouping by two columns and need to count the values at level 1.

I tried the following:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': [1, 2, 0, 4, 3, 4], 'C': [3,3,3,3,4,8]})
>>> print(df)
       A  B  C
0    one  1  3
1    one  2  3
2    two  0  3
3  three  4  3
4  three  3  4
5    one  4  8
>>> aggregator = {'C': {'sC' : 'sum','cC':'count'}}
>>> df.groupby(["A", "B"]).agg(aggregator)
/envs/pandas/lib/python3.7/site-packages/pandas/core/groupby/generic.py:1315: FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
  return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
         C   
        sC cC
A     B      
one   1  3  1
      2  3  1
      4  8  1
three 3  4  1
      4  3  1
two   0  3  1

I want output like this, where the last column tC gives the count for each of the groups one, two, and three.

         C   
        sC cC tC
A     B      
one   1  3  1 3
      2  3  1
      4  8  1
three 3  4  1 2
      4  3  1
two   0  3  1 1

1 Answer:

Answer 0: (score: 2)

If you are aggregating only one column, pass a list of tuples:

aggregator = [('sC' , 'sum'),('cC', 'count')]
df = df.groupby(["A", "B"])['C'].agg(aggregator)
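For reference, here is a self-contained sketch of the step above using the data from the question. The second call uses named aggregation, which is an alternative added here for illustration and assumes pandas 0.25 or later; it is not part of the original answer. The names by_tuples and by_named are placeholders.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three', 'three', 'one'],
    'B': [1, 2, 0, 4, 3, 4],
    'C': [3, 3, 3, 3, 4, 8],
})

# List of (output name, function) tuples, as in the answer.
by_tuples = df.groupby(['A', 'B'])['C'].agg([('sC', 'sum'), ('cC', 'count')])

# Named aggregation (assumption: pandas >= 0.25).
# Each keyword is an output column name, each value the function to apply.
by_named = df.groupby(['A', 'B'])['C'].agg(sC='sum', cC='count')

print(by_named)
```

Both calls should produce the same sC and cC columns without the FutureWarning shown in the question.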

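For the last column, convert the first level of the MultiIndex to a Series with to_series, get the per-group counts with GroupBy.transform using GroupBy.size, and keep the count only on the first row of each group with numpy.where:

import numpy as np

s = df.index.get_level_values(0).to_series()
df['tC'] = np.where(s.duplicated(), np.nan, s.groupby(s).transform('size'))
print(df)
         sC  cC   tC
A     B             
one   1   3   1  3.0
      2   3   1  NaN
      4   8   1  NaN
three 3   4   1  2.0
      4   3   1  NaN
two   0   3   1  1.0

You can also set the duplicated values in the tC column to empty strings, but then every numeric operation on that column fails later, because the column mixes numbers with strings.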
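The whole approach can be sketched end to end as below, using the data from the question. The names out, group_size, and tC_str are placeholders introduced for illustration, and the tC_str column is only one assumed way of building the empty-string variant, not necessarily the answer's exact code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three', 'three', 'one'],
    'B': [1, 2, 0, 4, 3, 4],
    'C': [3, 3, 3, 3, 4, 8],
})

# Sum and count of C per (A, B) pair (named aggregation, assumes pandas >= 0.25).
out = df.groupby(['A', 'B'])['C'].agg(sC='sum', cC='count')

# First index level (A) as a Series, one entry per row of `out`.
s = out.index.get_level_values(0).to_series()

# Number of rows in each A group, repeated on every row of that group.
group_size = s.groupby(s).transform('size')

# Keep the size on the first row of each group and NaN on the others.
out['tC'] = np.where(s.duplicated(), np.nan, group_size)

# Hypothetical empty-string variant: the column is no longer numeric.
out['tC_str'] = np.where(s.duplicated(), '', group_size.astype(str))

print(out)
print(out['tC'].sum())  # NaN is skipped, so the numeric column still sums
print(pd.api.types.is_numeric_dtype(out['tC_str']))
```

Keeping NaN as the filler leaves tC as a float column that works with sum, mean, and similar operations, which is the reason to prefer it over empty strings.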