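I have the following MultiIndex pandas DataFrame. I am trying to create:
But I'm not sure how to go about it. Any pointers would be helpful.
The original df_matrix, before: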
print(df_matrix.head(10))
   lob             project_rank          duration_in_status
0  Commodities     CM LOB                               2.0
1  Commodities     Index Book Migration                25.0
2  Cross Platform  CM LOB                               0.0
3  Cross Platform  CSAVA                               16.0
4  Cross Platform  Calypso Migration                    0.0
5  Cross Platform  EMD / Delta One                      0.0
6  Cross Platform  FRTB                                68.0
7  Cross Platform  Index Book Migration                 1.0
8  Cross Platform  Instruments                          3.0
9  Cross Platform  KOJAK                                0.0
The MultiIndex, before:
                                      duration_in_status
lob             project_rank
Commodities     CM LOB                               2.0
                Index Book Migration                25.0
Cross Platform  CM LOB                               0.0
                CSAVA                               16.0
                Calypso Migration                    0.0
                EMD / Delta One                      0.0
                FRTB                                68.0
                Index Book Migration                 1.0
                Instruments                          3.0
                KOJAK                                0.0
                LOB BOW                            324.0
                Non-Trading                          0.0
                Notes Workflow                      23.0
                PROD                                 0.0
                Result Service                      53.0
                Tech Debt                           96.0
Interest Rates  LOB BOW                              0.0
Other           Notes Workflow                       0.0
Treasury        2B2                                  1.0
Expected result:
Answer 0 (score: 1)
It looks like you want
df['proj_num'] = df.groupby('lob').project_rank.cumcount() + 1
df['depth'] = df.groupby('lob').project_rank.transform(len)
before applying the MultiIndex :)
   lob             project_rank        duration_in_status  proj_num  depth
0  Commodities     CMLOB                              2.0         1      2
1  Commodities     IndexBookMigration                25.0         2      2
2  Cross_Platform  CMLOB                              0.0         1      8
3  Cross_Platform  CSAVA                             16.0         2      8
4  Cross_Platform  CalypsoMigration                   0.0         3      8
5  Cross_Platform  EMD/DeltaOne                       0.0         4      8
6  Cross_Platform  FRTB                              68.0         5      8
7  Cross_Platform  IndexBookMigration                 1.0         6      8
8  Cross_Platform  Instruments                        3.0         7      8
9  Cross_Platform  KOJAK                              0.0         8      8
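For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the ten rows printed in the question by hand (the real df_matrix presumably comes from elsewhere), and it uses transform("size") in place of transform(len), which should give the same per-group row count. The variable name indexed is just an illustrative choice.

```python
import pandas as pd

# Rows copied by hand from the df_matrix.head(10) output in the question.
df = pd.DataFrame({
    "lob": ["Commodities"] * 2 + ["Cross Platform"] * 8,
    "project_rank": [
        "CM LOB", "Index Book Migration",
        "CM LOB", "CSAVA", "Calypso Migration", "EMD / Delta One",
        "FRTB", "Index Book Migration", "Instruments", "KOJAK",
    ],
    "duration_in_status": [2.0, 25.0, 0.0, 16.0, 0.0, 0.0, 68.0, 1.0, 3.0, 0.0],
})

# Position of each row within its lob group, counted from 1.
df["proj_num"] = df.groupby("lob").cumcount() + 1

# Number of rows in the group, broadcast back onto every row of that group.
df["depth"] = df.groupby("lob")["project_rank"].transform("size")

# Only now build the MultiIndex ("indexed" is an illustrative name).
indexed = df.set_index(["lob", "project_rank"])
print(indexed)
```

Note that depth is 8 for Cross Platform here only because the sample is limited to the first ten rows; on the full data it would reflect the full group size.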
Answer 1 (score: 0)
I would just use groupby and apply.
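# assuming "df" is the variable containing the data as you showed in the question...
import numpy as np

def group_function(sub_dataframe):
    # Work on a copy so the group handed over by pandas is not mutated.
    sub_dataframe = sub_dataframe.copy()
    # Use the row count of this group, not of the whole frame.
    n_rows = sub_dataframe.shape[0]
    sub_dataframe["proj_num"] = np.arange(n_rows) + 1
    sub_dataframe["depth"] = n_rows
    return sub_dataframe

# group_keys=False keeps "lob" from also being added as an index level
df = df.reset_index().groupby("lob", group_keys=False).apply(group_function)
df = df.set_index(["lob", "project_rank"])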
If you'd rather do this before you create the MultiIndex, you can. That way you don't need reset_index, and the index only has to be set once.
# in that case, something like this should work.
df = df.groupby("lob", group_keys=False).apply(group_function).set_index(["lob", "project_rank"])
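If the MultiIndex already exists, another option (not from this answer, just a sketch) is to group on the index level directly, which avoids both reset_index and apply. The sample rows below are copied from the question's MultiIndex output; the names indexed, by_lob and result are illustrative.

```python
import pandas as pd

# A few rows copied from the MultiIndex output in the question.
indexed = pd.DataFrame(
    {"duration_in_status": [2.0, 25.0, 0.0, 0.0, 1.0]},
    index=pd.MultiIndex.from_tuples(
        [
            ("Commodities", "CM LOB"),
            ("Commodities", "Index Book Migration"),
            ("Interest Rates", "LOB BOW"),
            ("Other", "Notes Workflow"),
            ("Treasury", "2B2"),
        ],
        names=["lob", "project_rank"],
    ),
)

# Group on the "lob" index level; the MultiIndex stays in place.
by_lob = indexed.groupby(level="lob")

result = indexed.assign(
    # Position of each row within its lob group, counted from 1.
    proj_num=by_lob.cumcount() + 1,
    # Row count of the group, repeated on every row of that group.
    depth=by_lob["duration_in_status"].transform("size"),
)
print(result)
```

Both new columns come back aligned on the original MultiIndex, so no index has to be rebuilt afterwards.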