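Let's say I have a dataframe with a column called values, and for each group I want to compute the number of non-null observations, the number of null observations, and the mean and median of this column.
If I use groupby and agg, I get the following output:
mydf.groupby(['date_ym','category']).agg(['count', 'mean', 'median']).reset_index()
Out[135]:
  date_ym category values
                    count  mean median
0 2018-01        A     2  4.55   4.55
1 2018-01        B     0   NaN    NaN
2 2018-02        A     1  6.20   6.20
3 2018-02        B     0   NaN    NaN
4 2018-03        B     0   NaN    NaN
But what I really want is the following output, with an extra column that counts the nulls in each group:
  date_ym category values
                    count countNAs  mean median
0 2018-01        A     2        1  4.55   4.55
1 2018-01        B     0        1   NaN    NaN
2 2018-02        A     1        0  6.20   6.20
3 2018-02        B     0        1   NaN    NaN
4 2018-03        B     0        1   NaN    NaN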
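The question does not show the underlying rows, so the sketch below uses hypothetical sample data, chosen only so that the groups reproduce the tables above. It illustrates why 'count' alone cannot report the nulls: it counts non-null values only, while groupby(...).size() counts every row in the group, so size minus count gives the number of nulls per group.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (the question does not show its rows), chosen so the
# group sizes, counts, means and medians match the tables in the question.
mydf = pd.DataFrame({
    "date_ym":  ["2018-01", "2018-01", "2018-01", "2018-01", "2018-02", "2018-02", "2018-03"],
    "category": ["A", "A", "A", "B", "A", "B", "B"],
    "values":   [4.1, 5.0, np.nan, np.nan, 6.2, np.nan, np.nan],
})

# 'count' skips NaN, so an all-NaN group reports 0 and the nulls are invisible.
summary = mydf.groupby(["date_ym", "category"]).agg(["count", "mean", "median"]).reset_index()
print(summary)

# size() counts every row, NaN included, so size - count = number of NaNs per group.
sizes = mydf.groupby(["date_ym", "category"]).size()
print(sizes - summary[("values", "count")].values)
```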
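Answer 0 (score: 1)
You can write a small function that counts the nulls and pass it to agg together with the built-in aggregation names: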
# custom aggregation: number of missing (NaN) values in each group
def countNAs(x):
    return x.isnull().sum()

mydf.groupby(['date_ym','category']).agg(['count', countNAs, 'mean', 'median']).reset_index()
Out[647]:
  date_ym category values
                    count countNAs  mean median
0 2018-01        A     2      1.0  4.55   4.55
1 2018-01        B     0      1.0   NaN    NaN
2 2018-02        A     1      0.0  6.20   6.20
3 2018-02        B     0      1.0   NaN    NaN
4 2018-03        B     0      1.0   NaN    NaN
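A self-contained version of this answer, again on hypothetical sample data (an assumption, since the question shows no rows). It also sketches named aggregation on the values column (available in pandas 0.25 and later), an alternative not used in the answer that produces flat column names instead of the values/count two-level header.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen to reproduce the question's groups.
mydf = pd.DataFrame({
    "date_ym":  ["2018-01", "2018-01", "2018-01", "2018-01", "2018-02", "2018-02", "2018-03"],
    "category": ["A", "A", "A", "B", "A", "B", "B"],
    "values":   [4.1, 5.0, np.nan, np.nan, 6.2, np.nan, np.nan],
})

def countNAs(x):
    """Number of missing (NaN/None) values in one group's Series."""
    return x.isnull().sum()

# The answer's approach: mix built-in names and a custom function in one list.
# The function's __name__ ('countNAs') becomes the column label under 'values'.
listed = mydf.groupby(["date_ym", "category"]).agg(["count", countNAs, "mean", "median"]).reset_index()

# Alternative: named aggregation on the selected column; each keyword
# becomes one flat output column, in the order given.
named = (
    mydf.groupby(["date_ym", "category"])["values"]
    .agg(count="count", countNAs=countNAs, mean="mean", median="median")
    .reset_index()
)
print(listed)
print(named)
```

With named aggregation the result columns are date_ym, category, count, countNAs, mean and median, which is often easier to work with downstream than the two-level header.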
Answer 1 (score: 0)
It's not the most straightforward approach, but it can be done.
# note: data.frame() rewrites non-syntactic names by default
# ('population by age' becomes population.by.age, '2008' becomes X2008);
# pass check.names = FALSE to keep them exactly as written
data2 <- data.frame('population by age' = seq(5, 11, by = 1),
                    '2008' = c(145391, 140621, 136150, 131944, 198933, 182182, 159103),
                    '2009' = c(148566, 143943, 139367, 135083, 212196, 196398, 155033),
                    '2010' = c(152330, 147261, 142555, 138172, 218701, 161330, 142190),
                    '2011' = c(156630, 151387, 146491, 141905, 119397, 116093, 112666),
                    '2012' = c(133545, 129737, 126124, 122678, 120213, 116826, 113381),
                    '2013' = c(119397, 116093, 112666, 109174, 106871, 103659, 100398))
data1 <- data.frame('2008' = c(7, 8, 9, 10),
                    '2009' = c(7, 8, 9, 10),
                    '2010' = c(7, 8, 9, 10),
                    '2011' = c(6, 7, 8, 9),
                    '2012' = c(6, 7, 8, 9),
                    '2013' = c(6, 7, 8, 9))