Pandas equivalent of dplyr's n() function when grouping

Time: 2018-03-27 11:02:56

Tags: python r pandas

Is there any function in pandas that can 100% replicate the n() function used in dplyr when grouping? For example, given the following code:

require(dplyr)
gb <- mtcars %>%
      group_by(gear, cyl) %>% 
      summarise(Disp    = sum(disp),
                QSec    = sum(qsec),
                Counter = n())

How can I replicate this (including the n() function!) in pandas? The final output of the query should look like this:

   gear   cyl  Disp  QSec Counter
  <dbl> <dbl> <dbl> <dbl>   <int>
1  3.00  4.00   120  20.0       1
2  3.00  6.00   483  39.7       2
3  3.00  8.00  4291 206        12
4  4.00  4.00   821 157         8
5  4.00  6.00   655  70.7       4
6  5.00  4.00   215  33.6       2
7  5.00  6.00   145  15.5       1
8  5.00  8.00   652  29.1       2

3 answers:

Answer 0: (score: 0)

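I think you need groupby with agg, aggregating by a list of tuples: the first value of each tuple is the new column name and the second is the aggregation function: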

df = df.groupby('gear')['qsec'].agg([('QSec','sum'),('Counter','size')]).reset_index()

To group by multiple columns, pass a list:

df = df.groupby(['gear', 'vs'])['qsec'].agg([('QSec','sum'),('Counter','size')]).reset_index()
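Why 'size' rather than 'count' for the Counter column: n() in dplyr counts rows in the group, and in pandas 'size' does the same, while 'count' only counts non-missing values. A minimal sketch of the difference, using a small made-up frame (the `toy` data is hypothetical and is not mtcars):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two qsec values are missing on purpose.
toy = pd.DataFrame({
    "gear": [3, 3, 4, 4, 4],
    "qsec": [20.0, np.nan, 18.5, 19.0, np.nan],
})

grouped = toy.groupby("gear")["qsec"]
sizes = grouped.size()    # rows per group, NaN included (behaves like n())
counts = grouped.count()  # non-missing values per group only

comparison = pd.DataFrame({"size": sizes, "count": counts}).reset_index()
print(comparison)
```

If the column has no missing values, as in mtcars, both give the same numbers, so the distinction only matters for data with NaN.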

Edit:

df = (df.groupby(['gear', 'cyl'])
        .agg({'qsec':'sum', 'disp':'sum', 'gear':'size'})
        .rename(columns={'gear':'Counter', 'disp':'Disp', 'qsec':'QSec'})
        .reset_index())
print (df)
   gear  cyl    QSec  Counter    Disp
0     3    4   20.01        1   120.1
1     3    6   39.66        2   483.0
2     3    8  205.71       12  4291.4
3     4    4  156.90        8   821.0
4     4    6   70.68        4   655.2
5     5    4   33.60        2   215.4
6     5    6   15.50        1   145.0
7     5    8   29.10        2   652.0
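As an alternative to the dict plus rename approach, pandas 0.25 and later support named aggregation, which sets the output column names directly and avoids aggregating on a grouping key such as 'gear' (which may raise an error in newer pandas versions). The sketch below uses a few made-up rows with the same column names as mtcars; the `cars` frame and its values are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical rows with mtcars-style column names (values are made up).
cars = pd.DataFrame({
    "gear": [3, 3, 3, 4, 4],
    "cyl":  [6, 8, 8, 6, 6],
    "disp": [200.0, 300.0, 350.0, 150.0, 160.0],
    "qsec": [20.0, 16.0, 17.0, 16.5, 17.5],
})

summary = (
    cars.groupby(["gear", "cyl"])
        .agg(
            Disp=("disp", "sum"),      # like sum(disp)
            QSec=("qsec", "sum"),      # like sum(qsec)
            Counter=("disp", "size"),  # like n(): rows per group
        )
        .reset_index()
)
print(summary)
```

Any non-key column can be used for the Counter entry, because 'size' counts rows regardless of the values in that column.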

Answer 1: (score: 0)

We can use dplython to mimic some of dplyr's functionality:

out = (mtcars >>
 group_by(X.gear, X.cyl) >>
 mutate(n = 1) >>
 summarize(Disp = X.disp.sum(), QSec = X.qsec.sum(), Counter = X.n.count()))

print(out)
#  gear  cyl  Counter    Disp    QSec
#0     3    4        1   120.1   20.01
#1     3    6        2   483.0   39.66
#2     3    8       12  4291.4  205.71
#3     4    4        8   821.0  156.90
#4     4    6        4   655.2   70.68
#5     5    4        2   215.4   33.60
#6     5    6        1   145.0   15.50
#7     5    8        2   652.0   29.10

Data

import pandas as pd
from dplython import (DplyFrame, X, diamonds, select, sift, sample_n, 
    sample_frac, head, arrange, mutate, group_by, summarize, DelayFunction) 

file = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
mtcars = DplyFrame(pd.read_csv(file))

Answer 2: (score: 0)

Here is the equivalent in Python using datar:

In [2]: from datar import f
   ...: from datar.datasets import mtcars
   ...: from datar.dplyr import group_by, summarise, n
   ...: from datar.base import sum
   ...: 
   ...: gb = (
   ...:   mtcars >>
   ...:   group_by(f.gear, f.cyl) >>
   ...:   summarise(Disp=sum(f.disp),
   ...:             QSec=sum(f.qsec),
   ...:             Counter=n())
   ...: )
   ...: gb
[2021-04-30 16:54:15][datar][   INFO] `summarise()` has grouped output by ['gear'] (override with `_groups` argument)
Out[2]: 
   gear  cyl    Disp    QSec  Counter
0     3    4   120.1   20.01        1
1     3    6   483.0   39.66        2
2     3    8  4291.4  205.71       12
3     4    4   821.0  156.90        8
4     4    6   655.2   70.68        4
5     5    4   215.4   33.60        2
6     5    6   145.0   15.50        1
7     5    8   652.0   29.10        2
[Groups: ['gear'] (n=3)]

I am the author of the package. Feel free to submit issues or ask me questions about using it.