Question

I have two dataframes. For each record in the first dataframe, I want to sum an "amount" column from the second one.

So for each row:

df1.yearToDateTotalAmount = sum(df2.amount WHERE df2.Date <= df1.Date AND df2.Date >= df1.yearAgo)

import pandas as pd

df1 = pd.DataFrame({'Date': ['2018-10-31', '2018-10-30', '2018-10-29', '2018-10-28'],
                    'yearAgo': ['2017-10-31', '2017-10-30', '2017-10-29', '2017-10-28']})

df2 = pd.DataFrame({'Date': ['2018-10-30', '2018-7-30', '2018-4-30', '2018-1-30', '2017-10-30'],
                    'amount': [1.0, 1.0, 1.0, 1.0, 0.75]})

Desired result:

df1.Date     yearToDateTotalAmount
2018-10-31        3.0
2018-10-30        4.75
2018-10-29        3.75
2018-10-28        3.75

Answer 1

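IIUC, your expected output should have 4 in the first row: four of the df2 dates (2018-10-30, 2018-7-30, 2018-4-30 and 2018-1-30) fall between 2017-10-31 and 2018-10-31.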
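To double-check the expected totals, here is a minimal brute-force sketch that loops over the rows of df1 and sums the matching amounts. It assumes the window is inclusive on both ends and that the date strings are parsed with pd.to_datetime first; the variable names (start, end, dates, totals) are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['2018-10-31', '2018-10-30', '2018-10-29', '2018-10-28'],
                    'yearAgo': ['2017-10-31', '2017-10-30', '2017-10-29', '2017-10-28']})
df2 = pd.DataFrame({'Date': ['2018-10-30', '2018-7-30', '2018-4-30', '2018-1-30', '2017-10-30'],
                    'amount': [1.0, 1.0, 1.0, 1.0, 0.75]})

# Parse the strings into timestamps; compared as plain strings,
# '2018-7-30' would sort after '2018-10-30'.
start = pd.to_datetime(df1['yearAgo'])
end = pd.to_datetime(df1['Date'])
dates = pd.to_datetime(df2['Date'])

totals = []
for lo, hi in zip(start, end):
    # Rows of df2 whose date lies in the inclusive window [lo, hi]
    in_window = (dates >= lo) & (dates <= hi)
    totals.append(float(df2.loc[in_window, 'amount'].sum()))

print(totals)  # [4.0, 4.75, 3.75, 3.75]
```

This is slow for large frames (one pass over df2 per row of df1), but it is handy as a reference to compare a vectorized version against.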

Since numpy's less_equal and greater_equal are ufuncs, you can use their outer method to do the comparison very efficiently.

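Note that, once the date columns are converted to datetimes (as plain strings, '2018-7-30' would compare greater than '2018-10-30'), and passing the underlying arrays because recent pandas versions may not support ufunc.outer on a Series directly:

>>> import numpy as np
>>> df1[['Date', 'yearAgo']] = df1[['Date', 'yearAgo']].apply(pd.to_datetime)
>>> df2['Date'] = pd.to_datetime(df2['Date'])
>>> np.greater_equal.outer(df1.Date.values, df2.Date.values)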

array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [False,  True,  True,  True,  True],
       [False,  True,  True,  True,  True]])
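In case the outer method is unfamiliar, here is a small sketch on toy integer arrays (a and b are made-up values, not the question's data). Entry [i, j] of the result is a[i] >= b[j], and the same matrix can be obtained with ordinary broadcasting.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3])

# outer[i, j] is True where a[i] >= b[j]
outer = np.greater_equal.outer(a, b)

# Equivalent formulation using broadcasting: (3, 1) against (1, 2)
broadcast = a[:, None] >= b[None, :]

print(outer)
# [[False False]
#  [ True False]
#  [ True  True]]
```

Either form gives a len(a) x len(b) boolean matrix, which is what the date comparison needs.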

So you can build your mask

mask = (np.greater_equal.outer(df1.Date.values, df2.Date.values) &
        np.less_equal.outer(df1.yearAgo.values, df2.Date.values))

and then multiply the mask by the amounts (broadcast across each row) and sum along axis=1

>>> np.sum(np.multiply(mask, df2.amount.values), axis=1)

array([4.  , 4.75, 3.75, 3.75])
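As a side note, multiplying a boolean mask by a vector and summing each row is the same as a matrix-vector product. The sketch below hard-codes the mask implied by the sample data (an assumption made so the snippet stands alone) and compares both forms.

```python
import numpy as np

# Mask for the sample data: rows are df1 records, columns are df2 records.
mask = np.array([[True,  True, True, True, False],
                 [True,  True, True, True, True],
                 [False, True, True, True, True],
                 [False, True, True, True, True]])
amounts = np.array([1.0, 1.0, 1.0, 1.0, 0.75])

# Element-wise multiply (amounts broadcast across rows), then row sums
by_multiply = np.sum(np.multiply(mask, amounts), axis=1)

# Same result as a single matrix-vector product
by_matmul = mask @ amounts

print(by_multiply)  # [4.   4.75 3.75 3.75]
```

The matmul form avoids materializing the intermediate float matrix, which may matter when both frames are large.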

Finally, just assign it back

>>> df1['yearToDateTotalAmount'] = np.sum(np.multiply(mask, df2.amount.values), axis=1)

    Date        yearAgo     yearToDateTotalAmount
0   2018-10-31  2017-10-31  4.00
1   2018-10-30  2017-10-30  4.75
2   2018-10-29  2017-10-29  3.75
3   2018-10-28  2017-10-28  3.75
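Putting the steps together, a minimal end-to-end sketch could look like the following. The helper name sum_in_window is hypothetical, and it assumes the column names from the question and an inclusive window on both ends.

```python
import numpy as np
import pandas as pd


def sum_in_window(df1, df2):
    """Hypothetical helper: for each df1 row, sum df2.amount where
    df1.yearAgo <= df2.Date <= df1.Date (inclusive on both ends)."""
    # Work on plain datetime64 arrays rather than Series
    end = pd.to_datetime(df1['Date']).to_numpy()
    start = pd.to_datetime(df1['yearAgo']).to_numpy()
    dates = pd.to_datetime(df2['Date']).to_numpy()

    # Boolean matrix of shape (len(df1), len(df2))
    mask = np.greater_equal.outer(end, dates) & np.less_equal.outer(start, dates)
    return mask @ df2['amount'].to_numpy()


df1 = pd.DataFrame({'Date': ['2018-10-31', '2018-10-30', '2018-10-29', '2018-10-28'],
                    'yearAgo': ['2017-10-31', '2017-10-30', '2017-10-29', '2017-10-28']})
df2 = pd.DataFrame({'Date': ['2018-10-30', '2018-7-30', '2018-4-30', '2018-1-30', '2017-10-30'],
                    'amount': [1.0, 1.0, 1.0, 1.0, 0.75]})

df1['yearToDateTotalAmount'] = sum_in_window(df1, df2)
print(df1)
```

Keep in mind that the mask has len(df1) * len(df2) entries, so for very large frames this trades memory for speed.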
