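I have two columns, C1 and C2, in the dataframes df1 and df2 respectively. Each cell holds a dictionary. I want to merge C1 and C2 into df1 (or df2) so that when a key appears in both dictionaries the values are added together, and when a key is missing it is added to the dictionary along with its value.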
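To make the merge rule concrete, here is a minimal sketch of what I mean for a single pair of dictionaries. The helper name `merge_sum` and the sample values are made up for illustration, and I am assuming the dictionaries are nested (counterparty, then product, then currency) with values at the leaves that support `+`.

```python
def merge_sum(d1, d2):
    """Return a new dict combining d1 and d2.

    Keys present in both are merged: nested dicts recursively,
    leaf values with +. Keys present in only one are copied over.
    """
    merged = dict(d1)  # shallow copy so d1 itself is not modified
    for key, value in d2.items():
        if key not in merged:
            merged[key] = value  # key is new: add it
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_sum(merged[key], value)  # go one level down
        else:
            merged[key] = merged[key] + value  # same key: add the values
    return merged


# Hypothetical cell values, not taken from the real data
c1 = {201: {'U': {'USD': 100, 'YEN': 50}}}
c2 = {201: {'U': {'USD': 25}, 'V': {'SGD': 7}}, 202: {'U': {'SGD': 3}}}
result = merge_sum(c1, c2)
print(result)
```

In my real data the leaf values are NumPy arrays rather than plain integers, but `+` should behave the same way there for arrays of equal length.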
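The output I want: aggregate everything at the le_id level, one row per le_id, with run_seq = 'latest', and merge the two dictionaries, i.e. if a key exists, add the values, otherwise add the key to the existing dictionary. The output should be a dataframe. Something like this, though not exactly the same
Edit: The code to create this dataframe is: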
import pandas as pd
data = {'le_id' : [101]*4 + [102]*4 + [103]*3 + [104]*5 + [101],
        'run_seq' : [31]*11 + [32]*6,
        'cp_id' : [201, 201, 201, 201, 203, 204, 205, 205, 206, 208, 209, 202, 201, 204, 205, 208, 208],
        'cp_name' : ['A', 'A', 'A', 'A', 'B', 'C', 'E', 'E', 'F', 'G', 'H', 'B', 'A', 'D', 'E', 'H', 'H'],
        'products' : ['U', 'U', 'U', 'W', 'X', 'U', 'U', 'V', 'W', 'X','U', 'U', 'V', 'W', 'X', 'Z', 'U'],
        'tran_amnt' : [10203, 13789, 74378, 47833, 40237, 93732, 63738, 42563, 92822, 11276, 63633, 99292, 27892, 82727, 32442, 55622, 43535],
        'currency' : ['USD', 'YEN', 'USD', 'SGD', 'USD', 'INR', 'INR', 'SGD', 'USD', 'INR', 'SGD', 'SGD', 'SGD', 'SGD', 'INR', 'INR', 'INR']}
# data is a plain dict, so wrap it in a DataFrame before calling groupby
data_gb = pd.DataFrame(data).groupby(['le_id', 'cp_id', 'run_seq', 'products', 'currency']).sum()
data_gb.reset_index(inplace = True)
data_gb
Then I run the following, which produces the two dataframes above:
pd.set_option('display.max_columns', None)
from pprint import pprint

def add_current_counterparty_interaction(df):
    df_this = df.copy()
    current_interaction_dict = {}
    for cp_id in df_this.cp_id.values:
        prod_curr_dict = {}
        rows_to_filter = df_this.cp_id==cp_id
        for prod in df_this.loc[rows_to_filter, 'products'].values: #product repeating
            curr_amount_dict = {}
            rows_to_filter_prod = rows_to_filter & (df_this.products == prod)
            for curr in df_this.loc[rows_to_filter,'currency'].values:
                rows_to_filter_curr = rows_to_filter_prod & (df_this.currency == curr)
                if rows_to_filter_curr.any():
                    #print(df_this[rows_to_filter_curr].tran_amnt)
                    this_tx_amt = df_this[rows_to_filter_curr].tran_amnt.values
                    curr_amount_dict[curr] = this_tx_amt
            if len(curr_amount_dict):
                prod_curr_dict[prod] = curr_amount_dict
        if len(prod_curr_dict):
            current_interaction_dict[cp_id] = prod_curr_dict
    pprint(current_interaction_dict)
    return current_interaction_dict

def process_counterparty_journey(df, rs):
    df_this = df[df["run_seq"]==rs].copy()
    #add_current_counterparty_interaction(df_this, 101)
    df_op = df_this.groupby(["le_id", "run_seq"]).apply(lambda x: add_current_counterparty_interaction(x)).reset_index()
    #print(df.head())
    #print(df_this)
    #pprint(df_op[0].values[0])
    pprint(df_op)
    return df_op

a = process_counterparty_journey(data_gb, 31)
b = process_counterparty_journey(data_gb, 32)
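The resulting dataframes a and b are the two dataframes printed above. rs is the run_seq, i.e. 31 and 32.
The function I want should take a and b (both dataframes) as arguments and return a single dataframe. See the other question I posted here. Each le_id should appear only once in the le_id column.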
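For reference, this is roughly the shape of function I have in mind, as a sketch rather than a working solution. The names `merge_sum`, `combine_journeys` and the output column `interactions` are made up, the small frames below are hypothetical stand-ins for a and b, and I am assuming that 'latest' means the largest run_seq seen for each le_id and that the dictionaries sit in the column labelled 0 that reset_index() produces.

```python
from functools import reduce

import pandas as pd


def merge_sum(d1, d2):
    """Merge two nested dicts, adding leaf values that share a key."""
    merged = dict(d1)
    for key, value in d2.items():
        if key not in merged:
            merged[key] = value
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_sum(merged[key], value)
        else:
            merged[key] = merged[key] + value
    return merged


def combine_journeys(a, b, dict_col=0):
    """Return one row per le_id with the dictionaries of a and b merged."""
    stacked = pd.concat([a, b], ignore_index=True)
    rows = []
    for le_id, grp in stacked.groupby("le_id"):
        rows.append({
            "le_id": le_id,
            # assumption: 'latest' is the largest run_seq for this le_id
            "run_seq": grp["run_seq"].max(),
            # fold every dictionary for this le_id into a single one
            "interactions": reduce(merge_sum, grp[dict_col], {}),
        })
    return pd.DataFrame(rows, columns=["le_id", "run_seq", "interactions"])


# Hypothetical stand-ins for a and b (plain ints instead of arrays)
a_demo = pd.DataFrame({"le_id": [101, 102], "run_seq": [31, 31],
                       0: [{201: {"U": {"USD": 10}}}, {203: {"X": {"USD": 5}}}]})
b_demo = pd.DataFrame({"le_id": [101], "run_seq": [32],
                       0: [{201: {"U": {"USD": 1, "SGD": 2}}}]})
out = combine_journeys(a_demo, b_demo)
print(out)
```

An le_id that appears in only one of the two frames keeps its dictionary unchanged, which is the behaviour I would expect.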