假设我有一个多索引列的数据框,
TSLA MSFT
Year revenues other_revenues expenses revenues other_revenues expenses
2019 851 10 110 200 13 213
2018 725 11 111 150 14 214
如何添加内列来获取
TSLA MSFT
Year revenues other_revenues expenses sum revenues other_revenues expenses sum
2019 851 10 110 971 200 13 213 426
2018 725 11 111 847 150 14 214 378
其次,一般来说,在使用多索引列时,我应该注意哪些常用功能?使用多索引列时有什么思路吗?我很舒服地思考正常(单级索引)但不习惯多索引。谢谢!
答案 0 :(得分:2)
首先创建由 sum
和 MultiIndex
填充到 df1
的新 DataFrame:
sub = ['revenues', 'other_revenues', 'expenses']
df1 = df.sum(level=0, axis=1)
df1.columns = pd.MultiIndex.from_product([df1.columns, ['sum']])
然后使用 concat
连接在一起:
df = pd.concat([df, df1], axis=1)
添加自定义订单的 lasr reindex
:
mux = pd.MultiIndex.from_product([df.columns.levels[0], sub + ['sum']])
df = df.reindex(mux, axis=1)
print (df)
MSFT TSLA \
revenues other_revenues expenses sum revenues other_revenues expenses
Year
2019 200 13 213 426 851 10 110
2018 150 14 214 378 725 11 111
sum
Year
2019 971
2018 847
编辑:您可以使用 slicers 进行查看(但我认为这里有必要排序 MultiIndex
):
idx = pd.IndexSlice
print (df.loc[:, idx[:, ['revenues','other_revenues']]])
TSLA MSFT TSLA MSFT
revenues revenues other_revenues other_revenues
2019 851 200 10 13
2018 725 150 11 14
# df.index.name = 'Year'
sub = ['revenues', 'other_revenues', 'expenses']
df1 = df.loc[:, idx[:, ['revenues','other_revenues']]].sum(level=0, axis=1)
df1.columns = pd.MultiIndex.from_product([df1.columns, ['sum']])
df = pd.concat([df, df1], axis=1)
mux = pd.MultiIndex.from_product([df.columns.levels[0], sub + ['sum']])
df = df.reindex(mux, axis=1)
print (df)
MSFT TSLA \
revenues other_revenues expenses sum revenues other_revenues expenses
2019 200 13 213 213 851 10 110
2018 150 14 214 164 725 11 111
sum
2019 861
2018 736
答案 1 :(得分:1)
您可以使用 stack
和 unstack
:
>>> df2 = df.stack(1).unstack(0)
>>> df2.loc['sum', :] = df2.sum()
>>> df2.stack(1).unstack(0).reindex(df.index).reindex(
columns=df.columns.levels[0], level=0
)
TSLA MSFT
expenses other_revenues revenues sum expenses other_revenues revenues sum
Year
2019 110.0 10.0 851.0 971.0 213.0 13.0 200.0 426.0
2018 111.0 11.0 725.0 847.0 214.0 14.0 150.0 378.0
对特定列求和:
>>> df2 = df.stack(1).unstack(0)
>>> df2.loc['sum', :] = df2.loc[['revenues', 'other_revenues'], :].sum()
>>> df2.stack(1).unstack(0).reindex(df.index).reindex(
columns=df.columns.levels[0], level=0
)
TSLA MSFT
expenses other_revenues revenues sum expenses other_revenues revenues sum
Year
2019 110.0 10.0 851.0 861.0 213.0 13.0 200.0 213.0
2018 111.0 11.0 725.0 736.0 214.0 14.0 150.0 164.0
或者沿join
使用sum
和axis=1, level=0
:
>>> cols = pd.MultiIndex.from_product([df.columns.levels[0], ['sum']])
>>> df.join(
df.sum(axis=1, level=0).set_axis(
cols,
axis=1
)
).reindex(columns=df.columns.levels[0], level=0)
TSLA MSFT
revenues other_revenues expenses sum revenues other_revenues expenses sum
Year
2019 851 10 110 971 200 13 213 426
2018 725 11 111 847 150 14 214 378
对于自定义列:
>>> df.join(
df.loc[:, (slice(None), ['revenues', 'other_revenues'])]
.sum(axis=1, level=0).set_axis(
cols,
axis=1
)
).reindex(columns=df.columns.levels[0], level=0)
TSLA MSFT
revenues other_revenues expenses sum revenues other_revenues expenses sum
Year
2019 851 10 110 861 200 13 213 213
2018 725 11 111 736 150 14 214 164