I have a DataFrame (df1) that looks like this, where each row has a store and a start date/end date:
import pandas as pd

df1 = pd.DataFrame(data={'store': ['X','Y','Z'], 'startdate': ['2020-02-03', '2020-03-05', '2020-04-01'], 'enddate': ['2020-03-05', '2020-05-02', '2020-06-07']})
df1
I also have a second DataFrame (df2) that looks like this and holds invoice records for the different stores:
df2 = pd.DataFrame(data={'store': ['X','X','X','Y','Y'], 'invoicedate': ['2020-01-03','2020-02-05','2020-03-04', '2020-05-01', '2020-04-04'], 'sales': [153, 156, 12, 42, 48],})
df2
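I want to add a column to the first DataFrame (df1) that sums df2['sales']; we can call it df1['totalsales']. A row of df2 should count toward a row of df1 only where:
df1['store'] == df2['store']
(df2['invoicedate'] >= df1['startdate']) & (df2['invoicedate'] <= df1['enddate'])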
Answer 0: (score: 1)
df1.merge(df2, on='store').query('startdate <= invoicedate <= enddate')\
.groupby(['store', 'startdate', 'enddate'])[['sales']].sum()\
.reindex(pd.MultiIndex.from_frame(df1), fill_value=0)\
.reset_index()
Output:
  store   startdate     enddate  sales
0     X  2020-02-03  2020-03-05    168
1     Y  2020-03-05  2020-05-02     90
2     Z  2020-04-01  2020-06-07      0
IIUC, let's use merge and query to filter the results, then groupby and sum to aggregate them:
df1.merge(df2, on='store').query('startdate <= invoicedate <= enddate')\
.groupby(['store', 'startdate', 'enddate'])[['sales']].sum().reset_index()
Output:
  store   startdate     enddate  sales
0     X  2020-02-03  2020-03-05    168
1     Y  2020-03-05  2020-05-02     90
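The question asks for the totals as a new column on df1 rather than as a separate frame. The following is a rough sketch of one way to do that with the same merge-and-filter idea. It makes two assumptions that are not in the original answer: the date columns are converted with pd.to_datetime first (the original compares ISO-formatted strings, which also orders correctly), and each store appears only once in df1. The names merged, in_range and totals are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'store': ['X', 'Y', 'Z'],
    'startdate': ['2020-02-03', '2020-03-05', '2020-04-01'],
    'enddate': ['2020-03-05', '2020-05-02', '2020-06-07'],
})
df2 = pd.DataFrame({
    'store': ['X', 'X', 'X', 'Y', 'Y'],
    'invoicedate': ['2020-01-03', '2020-02-05', '2020-03-04',
                    '2020-05-01', '2020-04-04'],
    'sales': [153, 156, 12, 42, 48],
})

# Assumption: work with real datetimes instead of ISO date strings.
for col in ['startdate', 'enddate']:
    df1[col] = pd.to_datetime(df1[col])
df2['invoicedate'] = pd.to_datetime(df2['invoicedate'])

# Pair every invoice with its store's date window.
merged = df1.merge(df2, on='store')

# Keep invoices dated inside [startdate, enddate]; between() is inclusive
# on both ends by default.
in_range = merged[merged['invoicedate'].between(merged['startdate'],
                                                merged['enddate'])]

# Total sales per store (assumes one date window per store in df1).
totals = in_range.groupby('store')['sales'].sum()

# Map the totals back onto df1; stores with no matching invoice get 0.
df1['totalsales'] = df1['store'].map(totals).fillna(0).astype(int)
print(df1)
```

If a store can have several date windows in df1, grouping by ['store', 'startdate', 'enddate'] as in the answer above is the safer choice, since mapping by store alone would merge the windows together.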
Answer 1: (score: 1)
If you want to keep all of the rows in df1, you can use the following approach:
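def get_total_sales(x):
    # Select the df2 rows for this store whose invoice date lies in [startdate, enddate]
    mask = df2.store == x.store
    mask &= df2.invoicedate >= x.startdate
    mask &= df2.invoicedate <= x.enddate
    # An empty selection sums to 0, so stores without invoices are kept
    x['total_sales'] = df2[mask].sales.sum()
    return x

df1.apply(get_total_sales, axis=1)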
Output:
store   startdate     enddate  total_sales
    X  2020-02-03  2020-03-05          168
    Y  2020-03-05  2020-05-02           90
    Z  2020-04-01  2020-06-07            0
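Note that apply returns a new DataFrame and leaves df1 itself unchanged unless the result is assigned back. A minimal variation on the same row-wise idea, sketched below, has the helper return a single number so the result can be assigned straight to a column. The helper name total_sales_for and its invoices parameter are hypothetical, and the date bounds are treated as inclusive on both ends, as stated in the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    'store': ['X', 'Y', 'Z'],
    'startdate': ['2020-02-03', '2020-03-05', '2020-04-01'],
    'enddate': ['2020-03-05', '2020-05-02', '2020-06-07'],
})
df2 = pd.DataFrame({
    'store': ['X', 'X', 'X', 'Y', 'Y'],
    'invoicedate': ['2020-01-03', '2020-02-05', '2020-03-04',
                    '2020-05-01', '2020-04-04'],
    'sales': [153, 156, 12, 42, 48],
})

def total_sales_for(row, invoices):
    """Sum invoices['sales'] for row's store within [startdate, enddate]."""
    mask = (
        (invoices['store'] == row['store'])
        & (invoices['invoicedate'] >= row['startdate'])
        & (invoices['invoicedate'] <= row['enddate'])
    )
    # An empty selection sums to 0, so stores without invoices get 0.
    return invoices.loc[mask, 'sales'].sum()

# One scalar per df1 row, assigned directly as the new column.
df1['totalsales'] = df1.apply(total_sales_for, axis=1, args=(df2,))
print(df1)
```

This scans df2 once per row of df1, so it is easy to read but is likely to be slower than the merge-based answer when both frames are large.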