How can I merge column c1 of df1 and column c2 of df2 into column c1 of df1, when c1 and c2 contain dictionaries?

Time: 2019-12-29 05:22:23

Tags: python pandas dataframe

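I have two columns, C1 and C2, in the DataFrames df1 and df2 respectively. I want to merge C1 and C2 into df1 (or df2) so that, when a key appears in both dictionaries, the values are added together, and when a key is missing, the key and its value are added to the dictionary.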
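For a single pair of flat dictionaries, the rule described above might be sketched as follows. The helper name `merge_sum` and the sample values are made up for illustration and are not part of the original code.

```python
def merge_sum(d1, d2):
    """Return a new dict: values of shared keys are added, other keys are copied over."""
    merged = dict(d1)  # shallow copy, so d1 itself is not modified
    for key, value in d2.items():
        if key in merged:
            merged[key] = merged[key] + value  # key exists in both: add the values
        else:
            merged[key] = value  # key only in d2: append it
    return merged


# Hypothetical sample values
c1 = {'USD': 100, 'INR': 50}
c2 = {'USD': 25, 'SGD': 10}
print(merge_sum(c1, c2))  # {'USD': 125, 'INR': 50, 'SGD': 10}
```

For purely numeric, positive values `collections.Counter` addition would give the same result, but it silently drops keys whose total is zero or negative, so an explicit loop seems safer here.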

two dataframes

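The output I want: aggregate everything at the le_id level so that each le_id has one row, with run_seq = 'latest', and merge the two dictionaries, i.e. add the values if a key already exists, otherwise add the key to the existing dictionary. The output should be a DataFrame. Something like this, though not exactly the same.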

Edit: the code that creates this DataFrame is:

import pandas as pd
data = {'le_id' : [101]*4 + [102]*4 + [103]*3 + [104]*5 + [101],
       'run_seq' : [31]*11 + [32]*6,
       'cp_id' : [201, 201, 201, 201, 203, 204, 205, 205, 206, 208, 209, 202, 201, 204, 205, 208, 208],
       'cp_name' : ['A', 'A', 'A', 'A', 'B', 'C', 'E', 'E', 'F', 'G', 'H', 'B', 'A', 'D', 'E', 'H', 'H'],
       'products' : ['U', 'U', 'U', 'W', 'X', 'U', 'U', 'V', 'W', 'X','U', 'U', 'V', 'W', 'X', 'Z', 'U'],
       'tran_amnt' : [10203, 13789, 74378, 47833, 40237, 93732, 63738, 42563, 92822, 11276, 63633, 99292, 27892, 82727, 32442, 55622, 43535],
       'currency' : ['USD', 'YEN', 'USD', 'SGD', 'USD', 'INR', 'INR', 'SGD', 'USD', 'INR', 'SGD', 'SGD', 'SGD', 'SGD', 'INR', 'INR', 'INR']}
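df = pd.DataFrame(data)  # build a DataFrame first; a plain dict has no groupby
# Sum tran_amnt per group; as_index=False keeps the group keys as ordinary columns
data_gb = df.groupby(['le_id', 'cp_id', 'run_seq', 'products', 'currency'], as_index=False)['tran_amnt'].sum()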
data_gb

I then run the following, which produces the two DataFrames above:

pd.set_option('display.max_columns', None)
from pprint import pprint
def add_current_counterparty_interaction(df):  
    df_this = df.copy()    
    current_interaction_dict = {}

    for cp_id in df_this.cp_id.values:
        prod_curr_dict = {}
        rows_to_filter = df_this.cp_id==cp_id
        for prod in df_this.loc[rows_to_filter, 'products'].values:  # NOTE: products repeat here, so the same product may be processed more than once
            curr_amount_dict = {}
            rows_to_filter_prod = rows_to_filter & (df_this.products == prod)
            for curr in df_this.loc[rows_to_filter_prod, 'currency'].values:  # only currencies seen for this cp_id and product
                rows_to_filter_curr = rows_to_filter_prod & (df_this.currency == curr)
                if rows_to_filter_curr.any():
                    #print(df_this[rows_to_filter_curr].tran_amnt)
                    this_tx_amt = df_this[rows_to_filter_curr].tran_amnt.values
                    curr_amount_dict[curr] = this_tx_amt
            if len(curr_amount_dict):
                prod_curr_dict[prod] = curr_amount_dict
        if len(prod_curr_dict):
            current_interaction_dict[cp_id] = prod_curr_dict
    pprint(current_interaction_dict)
    return current_interaction_dict


def process_counterparty_journey(df, rs):
    df_this = df[df["run_seq"]==rs].copy()
    #add_current_counterparty_interaction(df_this, 101)
    df_op = df_this.groupby(["le_id", "run_seq"]).apply(lambda x: add_current_counterparty_interaction(x)).reset_index()
    #print(df.head())
    #print(df_this)
    #pprint(df_op[0].values[0])
    pprint(df_op)
    return df_op
a = process_counterparty_journey(data_gb, 31)
b = process_counterparty_journey(data_gb, 32)

The resulting DataFrames a and b are the two DataFrames printed above. rs is the run_seq, i.e. 31 and 32.

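The function I want should take a and b (DataFrames) as arguments and return a single DataFrame. See the other question I posted here. The le_id column should contain unique le_id values.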
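One possible shape for such a function is sketched below; it is not a tested answer against the full data. It assumes the dictionaries sit in the column labelled `0` (the default name `reset_index()` gives the result of `groupby(...).apply(...)`), that they are nested as cp_id -> product -> currency -> amount, and that the leaf amounts can be added with `+` (plain numbers here; equal-length NumPy arrays, as produced by `.values`, would add element-wise). The names `merge_nested` and `combine_journeys` and the small sample frames are hypothetical.

```python
from functools import reduce

import pandas as pd


def merge_nested(d1, d2):
    """Recursively merge two nested dicts, adding leaf values that share a key path."""
    merged = dict(d1)
    for key, value in d2.items():
        if key not in merged:
            merged[key] = value  # new key: append it
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_nested(merged[key], value)  # descend one level
        else:
            merged[key] = merged[key] + value  # same leaf: add the amounts
    return merged


def combine_journeys(a, b, col=0):
    """Stack a and b, then keep one row per le_id with all its dicts merged."""
    stacked = pd.concat([a, b], ignore_index=True)
    rows = []
    for le_id, group in stacked.groupby('le_id'):
        rows.append({
            'le_id': le_id,
            'run_seq': 'latest',
            col: reduce(merge_nested, group[col], {}),
        })
    return pd.DataFrame(rows)


# Small hand-made stand-ins for a and b (hypothetical values)
a = pd.DataFrame({'le_id': [101, 102], 'run_seq': [31, 31],
                  0: [{201: {'U': {'USD': 100}}}, {203: {'X': {'USD': 40}}}]})
b = pd.DataFrame({'le_id': [101, 104], 'run_seq': [32, 32],
                  0: [{201: {'U': {'USD': 5}, 'V': {'SGD': 1}}}, {208: {'Z': {'INR': 9}}}]})

result = combine_journeys(a, b)
print(result)
```

The explicit loop over groups is a deliberate choice: returning dictionaries from `groupby(...).apply(...)` on a single column can make pandas try to expand them into an index, whereas building the rows by hand keeps each merged dictionary as one cell.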

0 answers:

No answers