Given a pandas DataFrame df:
         date numeric_field
0  2019-06-06             n
1  2019-06-07             n
2  2019-06-08             n
I want to self-join on date and apply a function to numeric_field.
In SQL, I would do something like:
SELECT a.date,
       FUN(b.numeric_field) AS new_field
FROM df a
INNER JOIN df b
    ON b.date < a.date
WHERE a.date > '2019-06-06'
  AND a.date <= '2019-06-08'
  AND b.date >= '2019-06-06'
GROUP BY a.date
The result should be:
         date new_field
0  2019-06-07    fun(n)
1  2019-06-08    fun(n)
Answer 0 (score: 1)
The fastest way I can think of is a cross merge followed by a query:
df = df.assign(dummy=1)  # constant key so the merge below pairs every row with every row
(df.merge(df, on='dummy', suffixes=['', '_r'])
   .query('"2019-06-06" <= date_r < date <= "2019-06-08"')  # filter
   .assign(new_field=lambda x: x['numeric_field'].sum())  # replace the function in `lambda`
   [['date', 'new_field']]
)
Output:
         date new_field
3  2019-06-07       nnn
6  2019-06-08       nnn
7  2019-06-08       nnn
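
The output above has one row per matched pair of dates, and the function is applied once across all filtered rows, whereas the SQL in the question groups by a.date and applies FUN to b.numeric_field. A minimal sketch of a grouped variant follows. It rests on assumptions not in the original post: the numeric values are placeholders (the question only writes n), `fun` is a hypothetical stand-in for FUN, and `how="cross"` requires pandas 1.2 or later (on older versions the dummy-column merge shown above does the same job).

```python
import pandas as pd

# Hypothetical sample data: the question writes `n` for the values,
# so the numbers below are placeholders.
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-06-06", "2019-06-07", "2019-06-08"]),
    "numeric_field": [1.0, 2.0, 4.0],
})


def fun(values: pd.Series) -> float:
    # Stand-in for FUN in the SQL; replace with the real aggregation.
    return values.sum()


start = pd.Timestamp("2019-06-06")
end = pd.Timestamp("2019-06-08")

# Cross join: every row of df paired with every row of df.
# Right-hand columns get the "_r" suffix (date_r, numeric_field_r).
pairs = df.merge(df, how="cross", suffixes=("", "_r"))

# Same conditions as the SQL:
# b.date < a.date, start < a.date <= end, b.date >= start.
pairs = pairs[
    (pairs["date_r"] < pairs["date"])
    & (pairs["date"] > start)
    & (pairs["date"] <= end)
    & (pairs["date_r"] >= start)
]

# GROUP BY a.date, applying the function to b.numeric_field.
result = (
    pairs.groupby("date")["numeric_field_r"]
    .agg(fun)
    .reset_index(name="new_field")
)
print(result)
```

With these placeholder values the result has one row per date: 2019-06-07 aggregates only the 2019-06-06 value, and 2019-06-08 aggregates the values from 2019-06-06 and 2019-06-07. Note that a cross join materializes len(df)² rows before filtering, so this approach may become memory-heavy on large frames.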