Comparing times across two separate DataFrames

Time: 2018-11-07 23:06:25

Tags: python pandas

I have two separate DataFrames that contain date data. I want to sum the payments that belong to each ad date. Here is a sample dataset:

Name        Ad Date     Ad Number      
---------------------------------
Michael    4/08/2018        1    
Tony       4/08/2018        1
Alex       4/08/2018        1
Alex       6/08/2018        2
Vanessa    9/08/2018        1

Name        Date         Payments  
--------------------------------------
Michael    4/08/2018      100
Tony       4/08/2018      200
Alex       4/06/2018      300
Alex       6/06/2018      400
Alex       6/07/2018      400
Vanessa    9/08/2018      500

This is the desired output:


Name        Ad Number     Payments    
------------------------------------
Michael        1           100
Tony           1           200 
Alex           1           300
Alex           2           800
Vanessa        1           500

So if you look at Alex, three payments were made in total across two ads. I want to use the date ranges defined by the ads to sum the payments.
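One reading of that rule, sketched for Alex's rows only: each payment counts toward the first ad dated on or after the payment. The month/day/year date format is an assumption; it is the interpretation that reproduces the desired totals of 300 and 800. The helper `parse` and the variable names are illustrative only.

```python
from bisect import bisect_left
from datetime import datetime


def parse(s):
    # Assumption: dates are month/day/year.
    return datetime.strptime(s, "%m/%d/%Y")


# Alex's rows from the sample data above.
ad_dates = [parse("4/08/2018"), parse("6/08/2018")]  # ad 1, ad 2 (sorted)
payments = [("4/06/2018", 300), ("6/06/2018", 400), ("6/07/2018", 400)]

totals = [0] * len(ad_dates)
for date_str, amount in payments:
    # Index of the first ad dated on or after this payment.
    i = bisect_left(ad_dates, parse(date_str))
    if i < len(ad_dates):  # a payment after the last ad is left unassigned
        totals[i] += amount

print(totals)  # [300, 800]
```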

1 Answer:

Answer 0: (score: 0)

This may not be the most robust solution, but here is what I came up with...

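import numpy as np
import pandas as pd

# Assumes df holds the ads and df1 the payments, and that both name their date
# column 'Ad Date' (the merge then yields 'Ad Date_x' for ads, 'Ad Date_y' for payments).
# Parse the dates so they compare chronologically; month/day/year is inferred from the sample data.
df['Ad Date'] = pd.to_datetime(df['Ad Date'], format='%m/%d/%Y')
df1['Ad Date'] = pd.to_datetime(df1['Ad Date'], format='%m/%d/%Y')

# Merge the two DataFrames: each ad is paired with every payment made under the same name
df_new = df.merge(df1, how='outer', on='Name')

# Attribute each payment to the earliest ad dated on or after it, then sum per ad
# (assumes at most one payment per name per date)
count = 0
temp_list = []
while count != len(df_new):
  names = df_new.Name[count]
  find = df_new.loc[df_new['Name'] == names, ['Name','Ad Number','Ad Date_x','Ad Date_y','Payments']].copy()
  find['new_col'] = np.where(find['Ad Date_y'] <= find['Ad Date_x'], 'yes', 'no')
  # Secondary sort on the ad date so drop_duplicates keeps the earliest qualifying ad
  x = find.sort_values(['Ad Date_y', 'Ad Date_x'])
  x = x[x.new_col == 'yes']
  x = x.drop_duplicates('Ad Date_y')
  x = x.groupby(['Name','Ad Number','Ad Date_x'],as_index=False)['Payments'].sum()
  intermediate_list = x.values.tolist()
  temp_list.append(intermediate_list)
  count += 1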

#Traversing through list of lists and appending to final list to make another df
final_list = []
for i in temp_list:
  for j in i:
    final_list.append(j)

#create final df and drop duplicates and drop the Ad_Date Columns
final_df = pd.DataFrame(final_list, columns = ['Name','Ad_Number','Ad_Date','Payments'])
final_df = final_df.drop_duplicates(['Name','Ad_Number','Ad_Date','Payments'])
final_df = final_df.drop('Ad_Date',axis=1)

print(final_df)

#RESULT

Name      Ad_Number  Payments
Michael           1       100
Tony              1       200
Alex              1       300
Alex              2       800
Vanessa           1       500
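
For comparison, here is a shorter sketch of the same idea using `pd.merge_asof`, which is a different technique from the loop above. It rests on two assumptions: the dates are month/day/year, and each payment belongs to the first ad (for the same name) dated on or after the payment. The frame names `ads` and `payments` are illustrative, and the column names follow the tables in the question.

```python
import pandas as pd

# Sample data from the question.
ads = pd.DataFrame({
    "Name": ["Michael", "Tony", "Alex", "Alex", "Vanessa"],
    "Ad Date": ["4/08/2018", "4/08/2018", "4/08/2018", "6/08/2018", "9/08/2018"],
    "Ad Number": [1, 1, 1, 2, 1],
})
payments = pd.DataFrame({
    "Name": ["Michael", "Tony", "Alex", "Alex", "Alex", "Vanessa"],
    "Date": ["4/08/2018", "4/08/2018", "4/06/2018", "6/06/2018", "6/07/2018", "9/08/2018"],
    "Payments": [100, 200, 300, 400, 400, 500],
})

# Assumption: month/day/year dates.
ads["Ad Date"] = pd.to_datetime(ads["Ad Date"], format="%m/%d/%Y")
payments["Date"] = pd.to_datetime(payments["Date"], format="%m/%d/%Y")

# merge_asof needs both frames sorted by their date keys.
# direction="forward" picks, per name, the first ad dated on or after the payment.
matched = pd.merge_asof(
    payments.sort_values("Date"),
    ads.sort_values("Ad Date"),
    left_on="Date",
    right_on="Ad Date",
    by="Name",
    direction="forward",
)

# Payments with no later ad come back as NaN; drop them before summing.
result = (
    matched.dropna(subset=["Ad Number"])
    .astype({"Ad Number": int})
    .groupby(["Name", "Ad Number"], as_index=False)["Payments"]
    .sum()
)
print(result)
```

The `groupby` sorts by name, so the rows come out in alphabetical order rather than in the order shown in the question, but the totals per name and ad number are the same.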