比较未对齐的系列对象

时间:2018-09-10 19:32:45

标签: python pandas

我有一些带有时间戳的财务数据,如下所示:

样本数据:

  transaction_type   transaction_announced_date   transaction_size_USDmm   target_company_name  
 ------------------ ---------------------------- ------------------------ --------------------- 
  B                  11/12/2017                   8000                     Company A            
  A                  4/19/2017                    NULL                     Company A            
  A                  2/12/2016                    200                      Company A            
  A                  5/24/2016                    NULL                     Company A            
  A                  6/1/2016                     3500                     Company A            
  B                  7/7/2016                     NULL                     Company A            
  A                  9/22/2016                    30                       Company A            
  A                  12/4/2014                    2800                     Company A            
  A                  1/16/2015                    1691                     Company B            
  A                  3/22/2015                    NULL                     Company B            
  B                  7/31/2015                    1000                     Company C            
  A                  8/19/2015                    NULL                     Company C            
  A                  8/25/2015                    NULL                     Company C            

对于拥有交易B的公司,我想查找该公司先前交易A的总和(基于宣布的日期),并将该值添加到名为“ sum_prior_trans_A”的新列中。

预期结果:

  transaction_type   transaction_announced_date   transaction_size_USDmm   target_company_name   sum_prior_trans_A  
 ------------------ ---------------------------- ------------------------ --------------------- ------------------- 
  B                  11/12/2017                   8000                     Company A             6530               
  B                  7/7/2016                     NULL                     Company A             2830               
  B                  7/31/2015                    1000                     Company C             NaN                

当前方法:

#input dataframe
trans_data

#add a new column that is the sum of all prior transactions A. 
#Will later drop all transactions A rows to be only left with transactions B as desired.  
trans_data['sum_previous_private_placements'] = trans_data.groupby(['target_company_name', 'transaction_type', 'transaction_announced_date']).filter(lambda row: (trans_data['target_company_name'] == row['target_company_name']) & (trans_data['transaction_announced_date'] == row['transaction_announced_date']) & (trans_data['transaction_type'] == 'A'))['transaction_size_USDmm'].sum()

我收到以下错误:
ValueError:只能比较标记相同的Series对象

如何找到每行(公司)的先前交易A的总和,然后将该值添加到名为“ sum_prior_trans_A”的新列中,而不会遇到未对齐的Series对象错误?

1 个答案:

答案 0 :(得分:0)

想出了一种方法。我相信还有更有效的方法。

#df of companies that have had transaction B
companies_with_trans_B = trans_data[trans_data['transaction_type'] == 'Merger/Acquisition']
companies_with_trans_B.reset_index(drop=True, inplace=True)  

#method for adding transaction A amounts for a given company and till a given date
def sum_previous_private_placements(df1, company_name, announced_date):
    return df1[(df1['target_company_name'] == company_name) & (df1['transaction_type'] == 'A') & (df1['transaction_announced_date'] <= announced_date)]['transaction_size_USDmm'].sum()  

#loop through companies_with_trans_B and call sum_previous_private_placements()
for i in companies_with_trans_B.index:
    companies_with_trans_B.loc[i, 'sum_previous_private_placements'] = sum_previous_private_placements(trans_data,companies_with_trans_B.loc[i,'target_company_name'], companies_with_trans_B.loc[i, 'transaction_announced_date'])