I'm trying to add new columns with subtotals, plus a final column with the grand total. For example,
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
That is:
A B C D E
0 foo one small 1 2
1 foo one large 2 4
2 foo one large 2 5
3 foo two small 3 5
4 foo two small 3 6
5 bar one large 4 6
6 bar one small 5 8
7 bar two small 6 9
8 bar two large 7 9
Now I pivot:
table = pd.pivot_table(df, values=['D',"E"], index=['A'],columns=['C'])
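Because no aggfunc is passed, pivot_table averages the values (its default aggfunc is "mean"), which is why the pivoted table shows numbers like 5.5 and 2.333333 rather than sums. A small check of that, rebuilding the same df so the snippet runs on its own; the name by_hand is just illustrative:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

# No aggfunc given, so pivot_table uses the mean of each (A, C) cell
table = pd.pivot_table(df, values=["D", "E"], index=["A"], columns=["C"])

# The same numbers by hand: mean per (A, C) pair, then move C into the columns
by_hand = df.groupby(["A", "C"])[["D", "E"]].mean().unstack("C")

print(table.loc["bar", ("D", "large")])  # 5.5, the mean of 4 and 7
print(table.equals(by_hand))
```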
and add the totals:
table['total'] = table.sum(axis=1)
for t in ["D", "E"]:
    table[t, "partial_total"] = table[t].sum(axis=1)
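Although this works numerically, it is visually annoying. I would like all the D columns (including partial_total) first, followed by E, and then total. This is the DataFrame I get: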
     D                E                total      D              E
C    large     small  large     small             partial_total  partial_total
A
bar    5.5  5.500000    7.5  8.500000  27.000000      11.000000      16.000000
foo    2.0  2.333333    4.5  4.333333  13.166667       4.333333       8.833333
So, how do I group the columns that share the same top-level label together?
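The column order above follows from how pandas adds columns: assigning a new key such as table['total'] or table['D', 'partial_total'] appends it at the end of the column MultiIndex, and a single-level key like 'total' is padded to ('total', ''). A sketch that reproduces the question's steps and prints the resulting order:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

table = pd.pivot_table(df, values=["D", "E"], index=["A"], columns=["C"])
table["total"] = table.sum(axis=1)          # stored under the key ('total', '')
for t in ["D", "E"]:
    table[t, "partial_total"] = table[t].sum(axis=1)

# Every new column went to the end, in the order it was created
print(list(table.columns))
```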
Answer 0 (score: 1)
You can pivot with margins:
new_df = (df.pivot_table(index='A', columns='C',
                         values=['D', 'E'], aggfunc='sum',
                         margins=True, margins_name='partial_total')
            .assign(total=lambda x: x.loc[:, (slice(None), 'partial_total')].sum(axis=1))
          )
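A self-contained version of this answer, rebuilding df, with checks on a few of the totals. Note that it uses aggfunc='sum', so the cells are sums rather than the means from the question; with margins=True pandas adds a partial_total column after each value group and also a partial_total row, which here acts as a grand-total row.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

new_df = (df.pivot_table(index="A", columns="C", values=["D", "E"], aggfunc="sum",
                         margins=True, margins_name="partial_total")
            # add up the per-value margin columns ('D', 'partial_total') and ('E', 'partial_total')
            .assign(total=lambda x: x.loc[:, (slice(None), "partial_total")].sum(axis=1)))

print(new_df)
```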
Output:
               D                             E                            total
C              large  small  partial_total  large  small  partial_total
A
bar               11     11             22     15     17             32     54
foo                4      7             11      9     13             22     33
partial_total     15     18             33     24     30             54     87
Answer 1 (score: 1)
Try doing the operations before pivot_table:
g = df.groupby(['A', 'C'])[['D', 'E']]
d = (g.sum()/g.count()).reset_index()
m = d.groupby('A', as_index=False)[['D', 'E']].sum().assign(C='partial')
final = pd.concat([m, d]).pivot_table(index='A', columns='C')
Output:
     D                            E
C    large     small    partial  large     small    partial
A
bar    5.5  5.500000  11.000000    7.5  8.500000  16.000000
foo    2.0  2.333333   4.333333    4.5  4.333333   8.833333
To answer your last question specifically, "How do I group the values of the same (top-level) column together?", you may just need sort_index:
table.sort_index(axis=1)
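A sketch comparing two ways to regroup the question's table. sort_index(axis=1) sorts the column tuples lexicographically, so D, E and total end up together, but each partial_total lands between large and small. Selecting the columns in an explicit order instead keeps large, small, partial_total in the order they were created; the order list is a hypothetical helper built here, not a pandas API.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

table = pd.pivot_table(df, values=["D", "E"], index=["A"], columns=["C"])
table["total"] = table.sum(axis=1)
for t in ["D", "E"]:
    table[t, "partial_total"] = table[t].sum(axis=1)

# Option 1: lexicographic sort of the column tuples
grouped = table.sort_index(axis=1)
print(list(grouped.columns))

# Option 2 (hypothetical helper list): all columns under D, then E, then total,
# keeping the creation order inside each top-level group
order = [col for top in ["D", "E", "total"] for col in table.columns if col[0] == top]
explicit = table[order]
print(list(explicit.columns))
```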
Answer 2 (score: 1)
Try using pd.concat:
table = pd.pivot_table(df, values=['D', 'E'], index=['A'], columns=['C'])
# Sum the sub-columns under each top-level label (D, E) before flattening the names;
# DataFrame.sum(level=...) was removed in pandas 2.0, so group the transposed frame instead
partial = table.T.groupby(level=0).sum().T.add_suffix('_partial_total')
table.columns = [f'{i}_{j}' for i, j in table.columns]
pd.concat([table,
           partial,
           table.sum(axis=1).to_frame(name='total')], axis=1)
Output:
     D_large   D_small  E_large   E_small  D_partial_total  E_partial_total      total
A
bar      5.5  5.500000      7.5  8.500000        11.000000        16.000000  27.000000
foo      2.0  2.333333      4.5  4.333333         4.333333         8.833333  13.166667
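A variant of the pd.concat idea that keeps the two-level header instead of flattening it, so each partial_total sits right after its own group. This is a sketch assuming the question's default mean aggregation is wanted; pieces is an illustrative name. Passing a dict to pd.concat with axis=1 turns the dict keys into the outer column level.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

table = pd.pivot_table(df, values=["D", "E"], index=["A"], columns=["C"])

pieces = {}
for top in ["D", "E"]:
    sub = table[top].copy()                 # columns: large, small
    sub["partial_total"] = sub.sum(axis=1)  # subtotal of this group only
    pieces[top] = sub

# Dict keys (D, E) become the outer column level, in insertion order
result = pd.concat(pieces, axis=1)
result["total"] = table.sum(axis=1)         # grand total over the four original columns
print(result)
```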