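I am using JupyterLab to print some data from a spreadsheet in a specific format.
I have two different files:
For each original_id == id, I want to group by country, list the brands, and sum and list the holding of each brand.
The result my code produces is: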
FundID Domicile (brand, AUM)
0 A1 IT (BBB, 10.0), UK (BBB, 7.0),
1 B2 CH (AAA, 12.0),
2 C3 DE (CCC, 5.0),
3 D4 CH (EEE, 9.0), UK (EEE, 11.0),
However, my goal is to get something like this:
The code is:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', 500)
df_fofs = pd.read_excel('SampleDF.xlsx')
df_extract = pd.read_excel('SampleID_ex.xlsx')
df_extract
original_id
0 A1
1 B2
2 C3
3 D4
df_fofs
brand country id holding
0 AAA UK A1 2000000
1 AAA CH B2 4000000
2 BBB UK A1 7000000
3 CCC DE C3 5000000
4 BBB IT A1 10000000
5 EEE UK D4 11000000
6 EEE CH D4 3000000
7 EEE CH D4 6000000
8 AAA CH B2 8000000
fund_ids = list(df_extract['original_id'])
result = {}
for fund in fund_ids:
    temp = []
    df_funds = df_fofs[(df_fofs['id'] == fund )][['country', 'brand', 'holding']]
    domicile_fof = df_fofs[df_fofs['id'] == fund ][['country', 'holding']]
    df_funds = df_funds.groupby(['country', 'brand'])["holding"].sum()
    domicile_fof = domicile_fof.groupby('country')["holding"].sum()
    s = ''
    for i in range(len(df_funds)):
        row = df_funds.reset_index().iloc[i]
        if row['holding'] >= 5000000:
            s += row['country'] + ' (' + str(row['brand']) + ', ' + str(round(((row['holding'])/1000000), 2)) + '), '
    result[fund] = [s]
df_result = pd.DataFrame.from_dict(result, orient = 'index')
df_result.reset_index(inplace = True)
df_result.columns = ['FundID', 'Domicile (brand, AUM)']
df_result
FundID Domicile (brand, AUM)
0 A1 IT (BBB, 10.0), UK (BBB, 7.0),
1 B2 CH (AAA, 12.0),
2 C3 DE (CCC, 5.0),
3 D4 CH (EEE, 9.0), UK (EEE, 11.0),
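Answer 0 (score: 1)
You can merge the ID table with the data, group by ID and country to build the inner items, and then group by ID alone to build the outer level.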
def f(x):
    # Inner items: one "brand (holding in millions)" entry per row of the group
    n = x.apply(lambda r: '{} ({})'.format(r['brand'],int(r['holding']/1000000)), axis=1)
    # Wrap the items in "country [...]"
    return '{} [{}]'.format(x.iloc[0]['country'],', '.join(n))

df_extract.merge(df_fofs, left_on='original_id', right_on='id') \
    .groupby(['original_id','country']).apply(f) \
    .groupby(level=0).apply(', '.join)
original_id
A1 IT [BBB (10)], UK [AAA (2), BBB (7)]
B2 CH [AAA (4), AAA (8)]
C3 DE [CCC (5)]
D4 CH [EEE (3), EEE (6)], UK [EEE (11)]
dtype: object
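Note that the output above lists each row separately (for example `AAA (4), AAA (8)` for B2) and keeps holdings below 5 million, whereas the question sums holdings per brand and drops totals under 5000000. A minimal sketch of one way to combine both ideas, assuming the sample data from the question is rebuilt inline; the names `THRESHOLD`, `totals`, `per_country` and `summary` are illustrative and not from the original posts:

```python
import pandas as pd

# Sample data copied from the question.
df_fofs = pd.DataFrame(
    [
        ("AAA", "UK", "A1", 2000000),
        ("AAA", "CH", "B2", 4000000),
        ("BBB", "UK", "A1", 7000000),
        ("CCC", "DE", "C3", 5000000),
        ("BBB", "IT", "A1", 10000000),
        ("EEE", "UK", "D4", 11000000),
        ("EEE", "CH", "D4", 3000000),
        ("EEE", "CH", "D4", 6000000),
        ("AAA", "CH", "B2", 8000000),
    ],
    columns=["brand", "country", "id", "holding"],
)
df_extract = pd.DataFrame({"original_id": ["A1", "B2", "C3", "D4"]})

THRESHOLD = 5_000_000  # same cut-off as in the question's loop

merged = df_extract.merge(df_fofs, left_on="original_id", right_on="id")

# Sum holdings per (fund, country, brand), then drop small totals.
totals = merged.groupby(
    ["original_id", "country", "brand"], as_index=False
)["holding"].sum()
totals = totals[totals["holding"] >= THRESHOLD].copy()

# Inner items: "brand (holding in millions)".
millions = (totals["holding"] / 1_000_000).round(2).astype(str)
totals["item"] = totals["brand"] + " (" + millions + ")"

# Middle level: "country [item, item, ...]".
per_country = (
    totals.groupby(["original_id", "country"])["item"]
    .agg(", ".join)
    .reset_index()
)
per_country["entry"] = per_country["country"] + " [" + per_country["item"] + "]"

# Outer level: one string per fund.
summary = per_country.groupby("original_id")["entry"].agg(", ".join)
print(summary)
```

With the sample data, A1 becomes `IT [BBB (10.0)], UK [BBB (7.0)]` and D4 becomes `CH [EEE (9.0)], UK [EEE (11.0)]`, which matches the totals in the question's own output.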
Answer 1 (score: 0)
Try this:
npm test --code-coverage
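Answer 2 (score: 0)
What you are looking for is the function pandas.DataFrame.pivot_table.
See the pandas documentation for pivot_table.
This code solves your example (except that I use a MultiIndex instead of a list):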
import pandas as pd
df = pd.DataFrame([
('AAA','UK','A1',2000000),
('AAA','CH','B2',4000000),
('BBB','UK','A1',7000000),
('CCC','DE','C3',5000000),
('BBB','IT','A1',10000000),
('EEE','UK','D4',11000000),
('EEE','CH','D4',3000000),
('EEE','CH','D4',6000000),
('AAA','CH','B2',8000000)],
columns=['brand', 'country', 'id', 'holding'])
# aggfunc='sum' totals the holdings; the default aggregation would average them
df.pivot_table(values='holding', index=['id','country','brand'], aggfunc='sum')
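One detail matters when applying `pivot_table` to this question: its default aggregation is the mean, while the question asks for the total holding per brand. A minimal sketch comparing the two on the same sample data; the names `index_cols`, `by_mean`, `by_sum` and `key` are illustrative and not from the original answer:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [
        ("AAA", "UK", "A1", 2000000),
        ("AAA", "CH", "B2", 4000000),
        ("BBB", "UK", "A1", 7000000),
        ("CCC", "DE", "C3", 5000000),
        ("BBB", "IT", "A1", 10000000),
        ("EEE", "UK", "D4", 11000000),
        ("EEE", "CH", "D4", 3000000),
        ("EEE", "CH", "D4", 6000000),
        ("AAA", "CH", "B2", 8000000),
    ],
    columns=["brand", "country", "id", "holding"],
)

index_cols = ["id", "country", "brand"]

# Default aggregation: mean of the holdings in each group.
by_mean = df.pivot_table(values="holding", index=index_cols)

# Explicit sum: total holding in each group, as the question asks.
by_sum = df.pivot_table(values="holding", index=index_cols, aggfunc="sum")

# D4 / CH / EEE has two rows (3000000 and 6000000), so the two results differ.
key = ("D4", "CH", "EEE")
print(by_mean.loc[key, "holding"])  # 4500000.0 (mean)
print(by_sum.loc[key, "holding"])   # 9000000 (sum)
```

Groups with a single row give the same value either way, so the difference only shows up for D4/CH/EEE and B2/CH/AAA in this sample.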