Is there any function in pandas that fully replicates dplyr's n() function when grouping? For example, given the following code:
require(dplyr)
gb <- mtcars %>%
  group_by(gear, cyl) %>%
  summarise(Disp = sum(disp),
            QSec = sum(qsec),
            Counter = n())
How can I replicate this (including the n() function!) in pandas? The final output of the query should look like this:
gear cyl Disp QSec Counter
<dbl> <dbl> <dbl> <dbl> <int>
1 3.00 4.00 120 20.0 1
2 3.00 6.00 483 39.7 2
3 3.00 8.00 4291 206 12
4 4.00 4.00 821 157 8
5 4.00 6.00 655 70.7 4
6 5.00 4.00 215 33.6 2
7 5.00 6.00 145 15.5 1
8 5.00 8.00 652 29.1 2
Answer 0 (score: 0)
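I think you need groupby and agg, aggregating with a list of tuples, where the first value of each tuple is the new column name and the second is the aggregation function: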
df = df.groupby('gear')['qsec'].agg([('QSec','sum'),('Counter','size')]).reset_index()
To group by multiple columns, pass a list to groupby:
df = df.groupby(['gear', 'vs'])['qsec'].agg([('QSec','sum'),('Counter','size')]).reset_index()
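A note on why 'size' rather than 'count' is the closer match for n(): 'size' counts every row in the group, while 'count' skips missing values in the selected column. The following is a minimal sketch on a made-up frame (the `toy` data and the `NonNull` column name are assumptions for illustration, not part of mtcars):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one qsec value is deliberately missing.
toy = pd.DataFrame({
    "gear": [3, 3, 4, 4, 4],
    "qsec": [20.0, np.nan, 18.5, 19.0, 17.0],
})

# 'size' counts all rows per group (like n()); 'count' ignores NaN values.
counts = (toy.groupby("gear")["qsec"]
             .agg([("Counter", "size"), ("NonNull", "count")])
             .reset_index())
print(counts)
```

For gear 3 the two columns differ because of the missing value; on data without missing values, such as mtcars, both give the same numbers.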
Edit:
df = (df.groupby(['gear', 'cyl'])
        .agg({'qsec':'sum', 'disp':'sum', 'gear':'size'})
        .rename(columns={'gear':'Counter', 'disp':'Disp', 'qsec':'QSec'})
        .reset_index())
print(df)
gear cyl QSec Counter Disp
0 3 4 20.01 1 120.1
1 3 6 39.66 2 483.0
2 3 8 205.71 12 4291.4
3 4 4 156.90 8 821.0
4 4 6 70.68 4 655.2
5 5 4 33.60 2 215.4
6 5 6 15.50 1 145.0
7 5 8 29.10 2 652.0
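As an alternative to the agg-then-rename pattern, pandas 0.25 and later support named aggregation, which sets the output column names and their order in one step. A minimal sketch, assuming a small made-up frame (`toy` is hypothetical; with the real data, the mtcars DataFrame would take its place):

```python
import pandas as pd

# Hypothetical toy data standing in for mtcars.
toy = pd.DataFrame({
    "gear": [3, 3, 4, 4, 4],
    "cyl":  [4, 4, 4, 6, 6],
    "disp": [120.0, 100.0, 80.0, 160.0, 150.0],
    "qsec": [20.0, 19.0, 18.5, 17.0, 16.5],
})

# Each keyword is an output column: name=(input column, aggregation).
# 'size' counts rows per group, so any input column works for Counter.
out = (toy.groupby(["gear", "cyl"])
          .agg(Disp=("disp", "sum"),
               QSec=("qsec", "sum"),
               Counter=("disp", "size"))
          .reset_index())
print(out)
```

The columns come out in the order the keywords are written (gear, cyl, Disp, QSec, Counter), which matches the layout of the dplyr result.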
Answer 1 (score: 0)
We can use dplython to mimic dplyr:
# Setup: import dplython and load mtcars into a DplyFrame
import pandas as pd
from dplython import (DplyFrame, X, diamonds, select, sift, sample_n,
                      sample_frac, head, arrange, mutate, group_by, summarize, DelayFunction)
file = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
mtcars = DplyFrame(pd.read_csv(file))

# Add a helper column n, then count it per group to emulate n()
out = (mtcars >>
       group_by(X.gear, X.cyl) >>
       mutate(n = 1) >>
       summarize(Disp = X.disp.sum(), QSec = X.qsec.sum(), Counter = X.n.count()))
print(out)
#   gear  cyl  Counter    Disp    QSec
#0     3    4        1   120.1   20.01
#1     3    6        2   483.0   39.66
#2     3    8       12  4291.4  205.71
#3     4    4        8   821.0  156.90
#4     4    6        4   655.2   70.68
#5     5    4        2   215.4   33.60
#6     5    6        1   145.0   15.50
#7     5    8        2   652.0   29.10
Answer 2 (score: 0)
Here is the equivalent in Python using datar:
In [2]: from datar import f
   ...: from datar.datasets import mtcars
   ...: from datar.dplyr import group_by, summarise, n
   ...: from datar.base import sum
   ...:
   ...: gb = (
   ...:     mtcars >>
   ...:     group_by(f.gear, f.cyl) >>
   ...:     summarise(Disp=sum(f.disp),
   ...:               QSec=sum(f.qsec),
   ...:               Counter=n())
   ...: )
   ...: gb
[2021-04-30 16:54:15][datar][ INFO] `summarise()` has grouped output by ['gear'] (override with `_groups` argument)
Out[2]:
gear cyl Disp QSec Counter
0 3 4 120.1 20.01 1
1 3 6 483.0 39.66 2
2 3 8 4291.4 205.71 12
3 4 4 821.0 156.90 8
4 4 6 655.2 70.68 4
5 5 4 215.4 33.60 2
6 5 6 145.0 15.50 1
7 5 8 652.0 29.10 2
[Groups: ['gear'] (n=3)]
I am the author of the package. Feel free to submit issues or ask me questions about using it.