I have two separate DataFrames containing date data. I want to sum the payments for each ad date. Here is a sample dataset:
Name      Ad Date      Ad Number
---------------------------------
Michael   4/08/2018    1
Tony      4/08/2018    1
Alex      4/08/2018    1
Alex      6/08/2018    2
Vanessa   9/08/2018    1

Name      Date         Payments
--------------------------------------
Michael   4/08/2018    100
Tony      4/08/2018    200
Alex      4/06/2018    300
Alex      6/06/2018    400
Alex      6/07/2018    400
Vanessa   9/08/2018    500
This is the desired output:
Name      Ad Number    Payments
------------------------------------
Michael   1            100
Tony      1            200
Alex      1            300
Alex      2            800
Vanessa   1            500
So if you look at Alex, there are 3 payments in total across 2 ads. I want to sum the payments using the date range of each ad.
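To make "date range" concrete: one reading that is consistent with the desired output (an assumption, as is the month/day/year date order) is that each ad covers payments dated after the previous ad for the same name, up to and including its own date. A minimal sketch for Alex's two ads, where `Window Start` is a hypothetical column name:

```python
import pandas as pd

# Alex's ads from the sample data; dates are assumed to be month/day/year.
alex_ads = pd.DataFrame({
    "Name": ["Alex", "Alex"],
    "Ad Date": pd.to_datetime(["4/08/2018", "6/08/2018"], format="%m/%d/%Y"),
    "Ad Number": [1, 2],
})

# Each ad's window starts just after the previous ad's date for the same name
# (NaT means there is no lower bound) and ends on its own date, inclusive.
alex_ads = alex_ads.sort_values(["Name", "Ad Date"])
alex_ads["Window Start"] = alex_ads.groupby("Name")["Ad Date"].shift()
print(alex_ads)
```

Under this reading, the 4/06 payment falls in ad 1's window and the 6/06 and 6/07 payments fall in ad 2's window.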
Answer 0 (score: 0)
This may not be the most robust solution, but here is what I came up with...
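import numpy as np
import pandas as pd

# Merge the ads frame (df) with the payments frame (df1) on Name.
# Assumes the date column is named 'Ad Date' in both frames, so the merge
# produces 'Ad Date_x' (ad date) and 'Ad Date_y' (payment date), and that
# both columns hold datetimes so the comparison below is chronological.
df_new = df.merge(df1, how='outer', on='Name')

# For each row, sum the payments dated on or before the ad date and after the previous ad date
count = 0
temp_list = []
while count != len(df_new):
    names = df_new.Name[count]
    find = df_new.loc[df_new['Name'] == names,
                      ['Name', 'Ad Number', 'Ad Date_x', 'Ad Date_y', 'Payments']].copy()
    find['new_col'] = np.where(find['Ad Date_y'] <= find['Ad Date_x'], 'yes', 'no')
    # Sort by payment date, then ad date, so the earliest covering ad comes first
    x = find.sort_values(['Ad Date_y', 'Ad Date_x'])
    x = x[x.new_col == 'yes']
    # Keep each payment date once, attached to the earliest ad that covers it
    x = x.drop_duplicates('Ad Date_y')
    x = x.groupby(['Name', 'Ad Number', 'Ad Date_x'], as_index=False)['Payments'].sum()
    intermediate_list = x.values.tolist()
    temp_list.append(intermediate_list)
    count += 1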
# Flatten the list of lists into a single list of rows for the final DataFrame
final_list = []
for i in temp_list:
    for j in i:
        final_list.append(j)
#create final df and drop duplicates and drop the Ad_Date Columns
final_df = pd.DataFrame(final_list, columns = ['Name','Ad_Number','Ad_Date','Payments'])
final_df = final_df.drop_duplicates(['Name','Ad_Number','Ad_Date','Payments'])
final_df = final_df.drop('Ad_Date',axis=1)
print(final_df)
#RESULT
Name Ad_Number Payments
Michael 1 100
Tony 1 200
Alex 1 300
Alex 2 800
Vanessa 1 500
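
As a possible alternative to the loop, the same totals can be sketched with `pandas.merge_asof`, which matches each payment to the first ad for the same name dated on or after the payment. This assumes the dates are month/day/year; the frame names `ads` and `payments` are illustrative and not from the original post.

```python
import pandas as pd

# Sample data from the question; dates are read as month/day/year (an assumption).
ads = pd.DataFrame({
    "Name": ["Michael", "Tony", "Alex", "Alex", "Vanessa"],
    "Ad Date": ["4/08/2018", "4/08/2018", "4/08/2018", "6/08/2018", "9/08/2018"],
    "Ad Number": [1, 1, 1, 2, 1],
})
payments = pd.DataFrame({
    "Name": ["Michael", "Tony", "Alex", "Alex", "Alex", "Vanessa"],
    "Date": ["4/08/2018", "4/08/2018", "4/06/2018", "6/06/2018", "6/07/2018", "9/08/2018"],
    "Payments": [100, 200, 300, 400, 400, 500],
})

ads["Ad Date"] = pd.to_datetime(ads["Ad Date"], format="%m/%d/%Y")
payments["Date"] = pd.to_datetime(payments["Date"], format="%m/%d/%Y")

# merge_asof requires both frames to be sorted by their date keys.
# direction="forward" picks, per Name, the first ad dated on or after the payment.
matched = pd.merge_asof(
    payments.sort_values("Date"),
    ads.sort_values("Ad Date"),
    left_on="Date",
    right_on="Ad Date",
    by="Name",
    direction="forward",
)

# Total the payments attributed to each ad.
result = matched.groupby(["Name", "Ad Number"], as_index=False)["Payments"].sum()
print(result)
```

On the sample data this gives 300 for Alex's ad 1 and 800 for ad 2. A payment dated after a person's last ad gets no match and is left out of the grouped totals.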