I am grouping on two columns and need to count the number of values in each first-level group.
I tried the following:
>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': [1, 2, 0, 4, 3, 4], 'C': [3,3,3,3,4,8]})
>>> print(df)
       A  B  C
0    one  1  3
1    one  2  3
2    two  0  3
3  three  4  3
4  three  3  4
5    one  4  8
>>> aggregator = {'C': {'sC' : 'sum','cC':'count'}}
>>> df.groupby(["A", "B"]).agg(aggregator)
/envs/pandas/lib/python3.7/site-packages/pandas/core/groupby/generic.py:1315: FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
         C
        sC cC
A     B
one   1  3  1
      2  3  1
      4  8  1
three 3  4  1
      4  3  1
two   0  3  1
I want output like this, where the last column, tC, gives me the counts for the groups one, two and three.
         C
        sC cC tC
A     B
one   1  3  1  3
      2  3  1
      4  8  1
three 3  4  1  2
      4  3  1
two   0  3  1  1
Answer 0 (score: 2)
If only one column is aggregated, pass a list of tuples:
aggregator = [('sC' , 'sum'),('cC', 'count')]
df = df.groupby(["A", "B"])['C'].agg(aggregator)
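As a possible alternative to the list of tuples, pandas 0.25 and later also accept named aggregation, where each keyword names an output column. The following is a minimal sketch under that version assumption; the variable name `out` is only illustrative and does not come from the answer:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["one", "one", "two", "three", "three", "one"],
    "B": [1, 2, 0, 4, 3, 4],
    "C": [3, 3, 3, 3, 4, 8],
})

# Named aggregation: keyword = (source column, aggregation function).
# This avoids the deprecated dict-with-renaming form from the question.
out = df.groupby(["A", "B"]).agg(
    sC=("C", "sum"),
    cC=("C", "count"),
)
print(out)
```

The result has flat column names sC and cC, the same shape the list-of-tuples form produces.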
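For the last column, convert the first level of the MultiIndex to a Series, get the group counts with GroupBy.transform and GroupBy.size, and use numpy.where to keep the count only on the first row of each group:
import numpy as np
s = df.index.get_level_values(0).to_series()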
df['tC'] = np.where(s.duplicated(), np.nan, s.groupby(s).transform('size'))
print(df)
         sC  cC   tC
A     B
one   1   3   1  3.0
      2   3   1  NaN
      4   8   1  NaN
three 3   4   1  2.0
      4   3   1  NaN
two   0   3   1  1.0
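You can also set the duplicated values in the tC column to an empty string, but then all numeric operations on this column fail, because the column mixes numbers with strings.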
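If the goal is only to avoid float counts while keeping the column numeric, one option that may help is pandas' nullable integer dtype `Int64`, which stores missing entries as `<NA>` instead of strings. This is a sketch of a swapped-in technique, not part of the original answer; the names `out`, `sizes` and `first` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["one", "one", "two", "three", "three", "one"],
    "B": [1, 2, 0, 4, 3, 4],
    "C": [3, 3, 3, 3, 4, 8],
})
out = df.groupby(["A", "B"]).agg(sC=("C", "sum"), cC=("C", "count"))

# Size of each first-level group, repeated on every row of that group.
sizes = out.groupby(level="A")["cC"].transform("size")

# True only on the first row of each first-level group.
first = ~out.index.get_level_values("A").duplicated()

# Keep the size on first rows, missing elsewhere, then store the
# result as nullable integers so the column stays numeric.
out["tC"] = sizes.where(first).astype("Int64")
print(out)
```

Numeric operations such as `out["tC"].sum()` still work on this column because missing entries are skipped rather than treated as strings.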