I have some timestamped financial data, as shown below.
Sample data:
transaction_type   transaction_announced_date   transaction_size_USDmm   target_company_name
------------------ ---------------------------- ------------------------ ---------------------
B                  11/12/2017                   8000                     Company A
A                  4/19/2017                    NULL                     Company A
A                  2/12/2016                    200                      Company A
A                  5/24/2016                    NULL                     Company A
A                  6/1/2016                     3500                     Company A
B                  7/7/2016                     NULL                     Company A
A                  9/22/2016                    30                       Company A
A                  12/4/2014                    2800                     Company A
A                  1/16/2015                    1691                     Company B
A                  3/22/2015                    NULL                     Company B
B                  7/31/2015                    1000                     Company C
A                  8/19/2015                    NULL                     Company C
A                  8/25/2015                    NULL                     Company C
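For each company that has a transaction B, I want to find the sum of that company's prior transactions A (based on the announced date) and add that value to a new column named "sum_prior_trans_A".
Expected result: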
transaction_type   transaction_announced_date   transaction_size_USDmm   target_company_name   sum_prior_trans_A
------------------ ---------------------------- ------------------------ --------------------- -------------------
B                  11/12/2017                   8000                     Company A             6530
B                  7/7/2016                     NULL                     Company A             2830
B                  7/31/2015                    1000                     Company C             NaN
Current approach:
#input dataframe
trans_data
#add a new column that is the sum of all prior transactions A.
#Will later drop all transactions A rows to be only left with transactions B as desired.
trans_data['sum_previous_private_placements'] = (
    trans_data
    .groupby(['target_company_name', 'transaction_type', 'transaction_announced_date'])
    .filter(lambda row: (trans_data['target_company_name'] == row['target_company_name'])
                        & (trans_data['transaction_announced_date'] == row['transaction_announced_date'])
                        & (trans_data['transaction_type'] == 'A'))
    ['transaction_size_USDmm']
    .sum()
)
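I get the following error:
ValueError: Can only compare identically-labeled Series objects
How can I find the sum of prior transactions A for each row (company) and add that value to a new column named "sum_prior_trans_A", without running into the misaligned Series error?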
Answer 0 (score: 0)
Figured out one way to do it. I'm sure there are more efficient approaches.
#df of companies that have had transaction B
#('B' matches the sample data; .copy() gives an independent frame, so adding a column below does not raise SettingWithCopyWarning)
companies_with_trans_B = trans_data[trans_data['transaction_type'] == 'B'].copy()
companies_with_trans_B.reset_index(drop=True, inplace=True)

#method for adding transaction A amounts for a given company and till a given date
#(assumes transaction_announced_date has been parsed with pd.to_datetime; strings would compare lexicographically)
def sum_previous_private_placements(df1, company_name, announced_date):
    #.sum() skips NaN and returns 0 when nothing matches; pass min_count=1 to get NaN instead
    return df1[(df1['target_company_name'] == company_name)
               & (df1['transaction_type'] == 'A')
               & (df1['transaction_announced_date'] <= announced_date)]['transaction_size_USDmm'].sum()

#loop through companies_with_trans_B and call sum_previous_private_placements()
for i in companies_with_trans_B.index:
    companies_with_trans_B.loc[i, 'sum_previous_private_placements'] = sum_previous_private_placements(
        trans_data,
        companies_with_trans_B.loc[i, 'target_company_name'],
        companies_with_trans_B.loc[i, 'transaction_announced_date'])
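As for a more efficient way, here is a minimal sketch of a loop-free variant. It sorts by company and date and takes a per-company running total of the A amounts. The function name `sum_prior_trans_a` is made up for illustration, the frame is rebuilt by hand from the sample rows in the question, and the dates are assumed to be month/day/year. Same-day ties between an A and a B are not handled specially.

```python
import numpy as np
import pandas as pd

# Sample rows rebuilt from the question; NULL sizes become NaN.
trans_data = pd.DataFrame({
    'transaction_type': list('BAAAABAAAABAA'),
    'transaction_announced_date': [
        '11/12/2017', '4/19/2017', '2/12/2016', '5/24/2016', '6/1/2016',
        '7/7/2016', '9/22/2016', '12/4/2014', '1/16/2015', '3/22/2015',
        '7/31/2015', '8/19/2015', '8/25/2015'],
    'transaction_size_USDmm': [
        8000, np.nan, 200, np.nan, 3500, np.nan, 30, 2800,
        1691, np.nan, 1000, np.nan, np.nan],
    'target_company_name': ['Company A'] * 8 + ['Company B'] * 2 + ['Company C'] * 3,
})

# Parse the dates so they sort chronologically (assumed month/day/year).
trans_data['transaction_announced_date'] = pd.to_datetime(
    trans_data['transaction_announced_date'], format='%m/%d/%Y')


def sum_prior_trans_a(df):
    """Hypothetical helper: B rows plus the running total of earlier A amounts."""
    df = df.sort_values(['target_company_name', 'transaction_announced_date']).copy()
    company = df['target_company_name']

    # Only A rows with a known amount contribute; everything else counts as 0.
    has_amount = (df['transaction_type'] == 'A') & df['transaction_size_USDmm'].notna()
    a_amount = df['transaction_size_USDmm'].where(has_amount, 0)

    # Running total and running count of contributing A rows within each company.
    running_sum = a_amount.groupby(company).cumsum()
    running_count = has_amount.astype(int).groupby(company).cumsum()

    # Leave NaN where no A amount has been seen yet for that company.
    df['sum_prior_trans_A'] = running_sum.where(running_count > 0)
    return df[df['transaction_type'] == 'B']


result = sum_prior_trans_a(trans_data)
print(result)
```

On the sample rows this gives 6530 for the 11/12/2017 row and NaN for Company C. For the 7/7/2016 row it gives 6500 (200 + 3500 + 2800), since those are the A amounts announced before that date in the data as posted, which differs from the 2830 in the expected result.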