First of all, this post was very helpful: How to pivot a dataframe
Now I have the following goal:
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
# String aggregation names ('mean', 'max') compute the same values as np.mean / max
# and avoid the FutureWarning that recent pandas versions emit for those callables.
table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
                       aggfunc={'D': 'mean',
                                'E': ['count', 'max', 'mean']})
# Flatten the MultiIndex result into a plain DataFrame
flattened = pd.DataFrame(table.to_records())
This gives the following result, which is what I am after:
     A      C  ('D', 'mean')  ('E', 'count')  ('E', 'max')  ('E', 'mean')
0  bar  large       5.500000             2.0           9.0       7.500000
1  bar  small       5.500000             2.0           9.0       8.500000
2  foo  large       2.000000             2.0           5.0       4.500000
3  foo  small       2.333333             3.0           6.0       4.333333
Is there an equivalent command using groupby? Something like: df.groupby(['row', 'col'])['val0'].agg(['size', 'mean']).unstack(fill_value=0)
Or is this already the most efficient approach?
Answer 0 (score: 2)
An alternative using groupby is:
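# String names ('mean', 'max') give the same values as np.mean / max without the FutureWarning in recent pandas
df = df.groupby(['A', 'C']).agg({'D': 'mean', 'E': ['count', 'max', 'mean']})
print(df)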
                  D     E
               mean count max      mean
A   C
bar large  5.500000     2   9  7.500000
    small  5.500000     2   9  8.500000
foo large  2.000000     2   5  4.500000
    small  2.333333     3   6  4.333333
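To check that this groupby call computes the same numbers as the pivot_table call in the question, the two results can be compared directly. The following is only a minimal sketch: the variable names `aggs`, `pivoted`, `grouped` and `same` are made up for illustration, and string aggregation names are used in place of `np.mean` and `max`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

# The same aggregation spec is passed to both APIs (hypothetical helper name)
aggs = {"D": "mean", "E": ["count", "max", "mean"]}

pivoted = pd.pivot_table(df, values=["D", "E"], index=["A", "C"], aggfunc=aggs)
grouped = df.groupby(["A", "C"]).agg(aggs)

# Align the column order first; dtypes may differ between the two
# results (e.g. int vs float counts), so compare the values as floats.
pivoted = pivoted.sort_index(axis=1)
grouped = grouped.sort_index(axis=1)
same = (
    pivoted.index.equals(grouped.index)
    and pivoted.columns.equals(grouped.columns)
    and np.allclose(pivoted.to_numpy(dtype=float), grouped.to_numpy(dtype=float))
)
print(same)
```

If `same` is true, the two approaches differ only in how the result is presented, not in the values they compute for this data.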
# Join the two column levels into single names such as 'D_mean'
df.columns = df.columns.map('_'.join)
print(df)
             D_mean  E_count  E_max    E_mean
A   C
bar large  5.500000        2      9  7.500000
    small  5.500000        2      9  8.500000
foo large  2.000000        2      5  4.500000
    small  2.333333        3      6  4.333333
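If the goal is the fully flattened frame from the question (with A and C as ordinary columns), pandas' named aggregation may save the separate column-joining step. This is a sketch rather than the answerer's method: the output names (`D_mean`, `E_count`, and so on) and the variable `flat` are chosen here for illustration, and it assumes pandas 0.25 or later, where named aggregation is available.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

flat = (
    df.groupby(["A", "C"])
      .agg(
          # output_name=(source_column, aggregation)
          D_mean=("D", "mean"),
          E_count=("E", "count"),
          E_max=("E", "max"),
          E_mean=("E", "mean"),
      )
      .reset_index()  # turn the A/C index levels back into ordinary columns
)
print(flat)
```

Because the output names are given up front, the result has a single level of columns and no call to `df.columns.map('_'.join)` or `to_records()` is needed.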