I have two data frames: one holds the value of each subportfolio, and the other lists the higher-level portfolios that each subportfolio rolls up into.
table1
subportfolio value
top-alpha-1 1
top-alpha-2 2
top-alpha-3 3
top-beta-1 4
top-beta-2 5
top-beta-3 6
top-gamma-1 7
top-gamma-2 8
top-gamma-3 9
table2
portfolio parent level
top-alpha-1 top-alpha 1
top-alpha-2 top-alpha 1
top-alpha-3 top-alpha 1
top-beta-1 top-beta 1
top-beta-2 top-beta 1
top-beta-3 top-beta 1
top-gamma-1 top-gamma 1
top-gamma-2 top-gamma 1
top-gamma-3 top-gamma 1
top-alpha top 2
top-beta top 2
top-gamma top 2
top self 3
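My goal is to merge these two tables so that not only the subportfolios are populated with values, but every higher level also gets a value aggregated from the portfolios beneath it.
My first thought was some kind of iteration, but with this much data that could be very time-consuming. The desired result is: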
table2
portfolio value parent level
top-alpha-1 1 top-alpha 1
top-alpha-2 2 top-alpha 1
top-alpha-3 3 top-alpha 1
top-beta-1 4 top-beta 1
top-beta-2 5 top-beta 1
top-beta-3 6 top-beta 1
top-gamma-1 7 top-gamma 1
top-gamma-2 8 top-gamma 1
top-gamma-3 9 top-gamma 1
top-alpha 6 top 2
top-beta 15 top 2
top-gamma 24 top 2
top 45 self 3
Answer 0 (score: 3)
New answer
Note: I have renamed the column 'subportfolio' to 'portfolio'.
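import pandas as pd

def agg_lvl(t1, t2):
    lcol = ['level', 'portfolio']
    rcol = ['parent', 'portfolio']
    kwargs = dict(
        left_on='portfolio', right_on='parent',
        suffixes=('_', '')
    )
    # pair every portfolio with its children, then attach the values known so far
    lvl = t2[lcol].merge(t2[rcol], **kwargs).drop(columns='portfolio_').merge(t1)
    # total the children's values per parent; the parent becomes the new 'portfolio'
    lvl = lvl.groupby('parent').value.sum().rename_axis('portfolio').reset_index()
    return pd.concat([t1, lvl], ignore_index=True).drop_duplicates(), t2

# one call per level above the leaves (two here)
o1, o2 = agg_lvl(*agg_lvl(table1, table2))
o2.merge(o1)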
level parent portfolio value
0 1 top-alpha top-alpha-1 1
1 1 top-alpha top-alpha-2 2
2 1 top-alpha top-alpha-3 3
3 1 top-beta top-beta-1 4
4 1 top-beta top-beta-2 5
5 1 top-beta top-beta-3 6
6 1 top-gamma top-gamma-1 7
7 1 top-gamma top-gamma-2 8
8 1 top-gamma top-gamma-3 9
9 2 top top-alpha 6
10 2 top top-beta 15
11 2 top top-gamma 24
12 3 self top 45
Setup
table2 = pd.DataFrame({
    'level': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3],
    'parent': [
        'top-alpha', 'top-alpha', 'top-alpha',
        'top-beta', 'top-beta', 'top-beta',
        'top-gamma', 'top-gamma', 'top-gamma',
        'top', 'top', 'top',
        'self'],
    'portfolio': [
        'top-alpha-1', 'top-alpha-2', 'top-alpha-3',
        'top-beta-1', 'top-beta-2', 'top-beta-3',
        'top-gamma-1', 'top-gamma-2', 'top-gamma-3',
        'top-alpha', 'top-beta', 'top-gamma',
        'top']})
table1 = pd.DataFrame({
    'portfolio': ['top-alpha-1', 'top-alpha-2', 'top-alpha-3',
                  'top-beta-1', 'top-beta-2', 'top-beta-3',
                  'top-gamma-1', 'top-gamma-2', 'top-gamma-3'],
    'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]
})
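The nested call `agg_lvl(*agg_lvl(table1, table2))` hard-codes two levels above the leaves. As a rough sketch of how the same roll-up could run to any depth, the loop below keeps aggregating until no new parent can be filled. The helper name `roll_up` and the variable `edges` are illustrative and not part of the answer; the sketch assumes portfolio names are unique and that a parent should only receive a value once all of its children have one.

```python
import pandas as pd

# Small stand-in for the question's tables (same names and values).
groups = ['alpha', 'beta', 'gamma']
leaves = [f'top-{g}-{i}' for g in groups for i in (1, 2, 3)]
table1 = pd.DataFrame({'portfolio': leaves, 'value': list(range(1, 10))})
table2 = pd.DataFrame({
    'portfolio': leaves + [f'top-{g}' for g in groups] + ['top'],
    'parent': [f'top-{g}' for g in groups for _ in range(3)] + ['top'] * 3 + ['self'],
    'level': [1] * 9 + [2] * 3 + [3],
})


def roll_up(values, hierarchy):
    """Hypothetical helper: propagate values upward until nothing new is found."""
    known = values.set_index('portfolio')['value']
    # Ignore edges whose parent is not itself a portfolio (e.g. 'self').
    edges = hierarchy[hierarchy['parent'].isin(hierarchy['portfolio'])]
    while True:
        child_vals = edges['portfolio'].map(known)
        totals = child_vals.groupby(edges['parent']).sum()
        # A parent is ready only when every one of its children has a value.
        ready = child_vals.notna().groupby(edges['parent']).all()
        new = totals[ready & ~totals.index.isin(known.index)]
        if new.empty:
            break
        known = pd.concat([known, new])
    return hierarchy.assign(value=hierarchy['portfolio'].map(known))


result = roll_up(table1, table2)
print(result)
```

Each pass fills one more layer of the hierarchy, so the number of merges grows with the depth of the tree rather than with the number of rows.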
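Old answer
This solution builds on another solution of mine and may not be exactly what you need... but then again, you did not say explicitly what you need, so I took some liberties.
First, I create another data frame df in which the subportfolio column is split on '-'.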
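# table1 here still has the original 'subportfolio' column
col = 'subportfolio'
# name the split pieces '3', '2', '1' (top level first)
rnm_dict = dict(enumerate(list('321')))
df = table1.drop(columns=col).join(table1[col].str.split('-', expand=True).rename(columns=rnm_dict))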
print(df)
value 3 2 1
0 1 top alpha 1
1 2 top alpha 2
2 3 top alpha 3
3 4 top beta 1
4 5 top beta 2
5 6 top beta 3
6 7 top gamma 1
7 8 top gamma 2
8 9 top gamma 3
Now run the aggregation
agged = pd.concat([
    df.assign(
        **{x: '' for x in '321'[i:]}
    ).groupby(list('321')).sum() for i in range(1, 4)
]).sort_index()
table2.join(agged.set_index(agged.index.to_series().str.join('-').str.strip('-').values), on='portfolio')
level parent portfolio value
0 1 top-alpha top-alpha-1 1
1 1 top-alpha top-alpha-2 2
2 1 top-alpha top-alpha-3 3
3 1 top-beta top-beta-1 4
4 1 top-beta top-beta-2 5
5 1 top-beta top-beta-3 6
6 1 top-gamma top-gamma-1 7
7 1 top-gamma top-gamma-2 8
8 1 top-gamma top-gamma-3 9
9 2 top top-alpha 6
10 2 top top-beta 15
11 2 top top-gamma 24
12 3 self top 45
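The old answer works only because every portfolio name encodes its full path, separated by '-'. Under that same assumption, the parent and level columns of table2 can themselves be derived from the names. The sketch below shows this on three sample names; the variable names are illustrative and 'self' is used for the root to match the question's table.

```python
import pandas as pd

names = pd.Series(['top-alpha-1', 'top-alpha', 'top'], name='portfolio')

# Everything before the last '-' is the parent; names without '-' are the root.
has_parent = names.str.contains('-', regex=False)
parent = names.str.rsplit('-', n=1).str[0].where(has_parent, 'self')

# The deepest names are level 1; each missing '-' moves one level up.
depth = names.str.count('-')
level = depth.max() - depth + 1

derived = pd.DataFrame({'portfolio': names, 'parent': parent, 'level': level})
print(derived)
```

If the names do not follow this convention, the explicit parent column in table2 is the safer source of the hierarchy.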
Answer 1 (score: 2)
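# attach the leaf values; higher levels start out as NaN
table3 = table2.merge(table1,
                      left_on="portfolio",
                      right_on="subportfolio",
                      how="left").drop('subportfolio', axis=1)
# second name component ('alpha', 'beta', 'gamma'); NaN for 'top'
table3['letter'] = table3.portfolio.str.split('-').str[1]
# positional assignment: relies on the level-2 rows being sorted by letter
table3.loc[table3.level==2, 'value'] = table3.groupby('letter').value.sum().values
table3.loc[table3.level==3, 'value'] = table3.loc[table3.level==2, 'value'].sum()
table3.drop('letter', axis=1, inplace=True)
# output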
portfolio parent level value
0 top-alpha-1 top-alpha 1 1.0
1 top-alpha-2 top-alpha 1 2.0
2 top-alpha-3 top-alpha 1 3.0
3 top-beta-1 top-beta 1 4.0
4 top-beta-2 top-beta 1 5.0
5 top-beta-3 top-beta 1 6.0
6 top-gamma-1 top-gamma 1 7.0
7 top-gamma-2 top-gamma 1 8.0
8 top-gamma-3 top-gamma 1 9.0
9 top-alpha top 2 6.0
10 top-beta top 2 15.0
11 top-gamma top 2 24.0
12 top self 3 45.0
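The level-2 assignment above is positional: `.values` discards the group labels, so it is correct only while the level-2 rows appear in the same sorted order as the groups. A possible label-aligned variant, sketched here on a small sample whose level-2 rows are deliberately out of order, groups by parent and maps the totals back by portfolio name. The sample frame and variable names are illustrative.

```python
import pandas as pd

nan = float('nan')
table3 = pd.DataFrame({
    'portfolio': ['top-alpha-1', 'top-alpha-2', 'top-beta-1', 'top-gamma-1',
                  'top-gamma', 'top-alpha', 'top-beta', 'top'],
    'parent': ['top-alpha', 'top-alpha', 'top-beta', 'top-gamma',
               'top', 'top', 'top', 'self'],
    'level': [1, 1, 1, 1, 2, 2, 2, 3],
    'value': [1.0, 2.0, 4.0, 7.0, nan, nan, nan, nan],
})

# Level 2: total the level-1 values per parent, then look each total up by name.
level1_totals = table3.loc[table3.level == 1].groupby('parent')['value'].sum()
mask2 = table3.level == 2
table3.loc[mask2, 'value'] = table3.loc[mask2, 'portfolio'].map(level1_totals)

# Level 3: same idea, one level up.
level2_totals = table3.loc[mask2].groupby('parent')['value'].sum()
mask3 = table3.level == 3
table3.loc[mask3, 'value'] = table3.loc[mask3, 'portfolio'].map(level2_totals)
print(table3)
```

Because the lookup is by name, reordering the rows of the input does not change which total lands on which portfolio.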
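Answer 2 (score: 0)
Thanks for all the answers. I have borrowed the ideas you gave me and tried to build something as dynamic as possible (any number of levels, any portfolio naming format, etc.).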
df = table2.merge(table1, on="portfolio", how="left")
for i in range(2, df.level.max() + 1):
    # total the values of level i-1 per parent; the parents are the level-i portfolios
    df1 = (df.loc[df.level == i - 1]
             .groupby('parent')[['value']].sum()
             .rename_axis('portfolio'))
    # fill in the parents' missing values without touching values already known
    df = df.set_index('portfolio').combine_first(df1).reset_index()
I used the 'Setup' provided by piRSquared in his answer. Result:
portfolio level parent value
0 top 3 self 45.0
1 top-alpha 2 top 6.0
2 top-alpha-1 1 top-alpha 1.0
3 top-alpha-2 1 top-alpha 2.0
4 top-alpha-3 1 top-alpha 3.0
5 top-beta 2 top 15.0
6 top-beta-1 1 top-beta 4.0
7 top-beta-2 1 top-beta 5.0
8 top-beta-3 1 top-beta 6.0
9 top-gamma 2 top 24.0
10 top-gamma-1 1 top-gamma 7.0
11 top-gamma-2 1 top-gamma 8.0
12 top-gamma-3 1 top-gamma 9.0
If you want the portfolios ordered by level again, as in the original table, you can use
df = df.sort_values('level')
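Whichever approach is used, the result can be sanity-checked by confirming that every parent's value equals the total of its direct children. The following is a minimal sketch of such a check; `find_mismatches` and its `tol` parameter are hypothetical names, the sample frame covers only one branch of the hierarchy, and portfolio names are assumed to be unique.

```python
import pandas as pd


def find_mismatches(df, tol=1e-9):
    """Hypothetical check: parents whose value differs from their children's total."""
    child_sums = df.groupby('parent')['value'].sum()
    own = df.set_index('portfolio')['value']
    # Only portfolios that actually appear as someone's parent can be checked.
    parents = own.index.intersection(child_sums.index)
    diff = own.loc[parents] - child_sums.loc[parents]
    return diff[diff.abs() > tol]


df = pd.DataFrame({
    'portfolio': ['top', 'top-alpha', 'top-alpha-1', 'top-alpha-2', 'top-alpha-3'],
    'parent': ['self', 'top', 'top-alpha', 'top-alpha', 'top-alpha'],
    'level': [3, 2, 1, 1, 1],
    'value': [6.0, 6.0, 1.0, 2.0, 3.0],
})
mismatches = find_mismatches(df)
print(mismatches)
```

An empty result means every aggregated value is consistent with the rows beneath it.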