Question

I have a pandas DataFrame that looks like this:

ID     country   month   revenue  profit   ebit
234    USA       201409   10        5       3
344    USA       201409    9        7       2
532    UK        201410    20       10      5
129    Canada    201411    15       10      5

I would like to group by country and month, count the IDs for each month and country, and sum revenue, profit, and ebit. The output for the data above would be:

 country   month    revenue   profit  ebit   count
   USA     201409     19        12      5      2
   UK      201410     20        10      5      1
   Canada  201411     15        10      5      1

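I have tried different variations of pandas' groupby, sum, and count functions, but I cannot figure out how to apply a groupby that both sums and counts to get the result shown. Please share any ideas you may have. Thanks!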

Answer 1

You can do the groupby and then map the counts for each country onto a new column.

g = df.groupby(['country', 'month'])[['revenue', 'profit', 'ebit']].sum().reset_index()
g['count'] = g['country'].map(df['country'].value_counts())
g

Out[3]:


    country  month   revenue  profit  ebit  count
0   Canada   201411  15       10      5     1
1   UK       201410  20       10      5     1
2   USA      201409  19       12      5     2
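Note that value_counts on the country column counts rows per country, so the mapped column only matches the per-month count as long as each country appears in a single month. A minimal sketch of the difference, assuming one extra UK row for 201411 (its ID, 999, and the column names count_by_country and count_by_group are made up for illustration):

```python
import pandas as pd

# Sample data from the question plus one hypothetical UK row for 201411
# (the ID 999 is invented for this illustration).
df = pd.DataFrame({
    "ID": [234, 344, 532, 129, 999],
    "country": ["USA", "USA", "UK", "Canada", "UK"],
    "month": [201409, 201409, 201410, 201411, 201411],
    "revenue": [10, 9, 20, 15, 10],
    "profit": [5, 7, 10, 10, 5],
    "ebit": [3, 2, 5, 5, 2],
})

keys = ["country", "month"]
g = df.groupby(keys)[["revenue", "profit", "ebit"]].sum().reset_index()

# Rows per country: both UK rows get 2, although each month has one record.
g["count_by_country"] = g["country"].map(df["country"].value_counts())

# Rows per (country, month): groupby sorts the keys the same way as above,
# so the sizes line up positionally with the rows of g.
g["count_by_group"] = df.groupby(keys).size().to_numpy()

print(g)
```

Once a country spans more than one month, the two count columns disagree for that country's rows.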

Edit

To get the counts per country and month, you can do another groupby and then join the two DataFrames together.

g = df.groupby(['country', 'month'])[['revenue', 'profit', 'ebit']].sum()
j = df.groupby(['country', 'month']).size().to_frame('count')
pd.merge(g, j, left_index=True, right_index=True).reset_index()

Out[6]:

    country  month   revenue  profit  ebit  count
0   Canada   201411  15       10      5     1
1   UK       201410  20       10      5     1
2   UK       201411  10       5       2     1
3   USA      201409  19       12      5     2

I added another record for the UK with a different date. Notice that the merged DataFrame now has two UK entries, each with its own count.

Answer 2

This can be done with pivot_table:

>>> df1 = pd.pivot_table(df, index=['country', 'month'], values=['revenue', 'profit', 'ebit'], aggfunc='sum')
>>> df1 
                ebit  profit  revenue
country month                        
Canada  201411     5      10       15
UK      201410     5      10       20
USA     201409     5      12       19

>>> df2 = pd.pivot_table(df, index=['country', 'month'], values='ID', aggfunc=len)['ID'].rename('count')
>>> df2

country  month 
Canada   201411    1
UK       201410    1
USA      201409    2

>>> pd.concat([df1,df2],axis=1)

                ebit  profit  revenue  count
country month                               
Canada  201411     5      10       15      1
UK      201410     5      10       20      1
USA     201409     5      12       19      2
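The two pivot tables can probably be collapsed into one call, since pivot_table accepts a dict for aggfunc that maps each column to its own aggregation. A minimal sketch on the question's sample data; the names table and result are only for illustration:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

# One aggregation per column: sum the money columns, count the IDs.
table = pd.pivot_table(
    df,
    index=["country", "month"],
    values=["revenue", "profit", "ebit", "ID"],
    aggfunc={"revenue": "sum", "profit": "sum", "ebit": "sum", "ID": "count"},
).rename(columns={"ID": "count"})

# Select the columns explicitly so the order does not depend on pivot_table.
result = table[["revenue", "profit", "ebit", "count"]].reset_index()
print(result)
```

Selecting the columns by name at the end avoids relying on whatever column order pivot_table happens to return.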

Answer 3

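The following solution seems to be the simplest.

Group by country and month:

grouped_df = df.groupby(['country', 'month'])

Apply sum to the columns of interest (revenue, profit, ebit):

final = grouped_df[['revenue', 'profit', 'ebit']].agg('sum')

Assign the size of grouped_df to a new column in 'final':

final['count'] = grouped_df.size()

All done!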
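For what it's worth, on pandas 0.25 or later the sums and the count can likely be written as a single agg call using named aggregation. A minimal sketch on the question's sample data; the variable name result is only for illustration:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

# Named aggregation: output_column=(input_column, aggregation).
# "count" counts non-null IDs; use "size" instead to count rows regardless of nulls.
result = df.groupby(["country", "month"], as_index=False).agg(
    revenue=("revenue", "sum"),
    profit=("profit", "sum"),
    ebit=("ebit", "sum"),
    count=("ID", "count"),
)
print(result)
```

With as_index=False the group keys come back as ordinary columns, so no reset_index is needed afterwards.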

Groupby sum and count in Python