I have a DataFrame with the following columns:
DataFrame[timestamp: string, city_id: string, item_id: string, target_value: double, date: date, datestr: string, city_id: string, holiday_name: string, holiday_date: date, reference_date_id: date, hour_of_day: int]
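I want to create a new column named ref_val that takes its value from another row with the same city_id and hexcluster_id, using the date and hour combination from the current row. The reference value should equal the target value of the row with the same city_id and hexcluster_id whose date matches the ref_date combination.
For example: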
+-------------------+-------+--------------------+------------+----------+----------+-------+--------------------+------------+-----------------+-----------+-----------+-------+
|          timestamp|city_id|             item_id|target_value|      date|   datestr|city_id|        holiday_name|holiday_date|reference_date_id|hour_of_day|day_of_week|ref_val|
+-------------------+-------+--------------------+------------+----------+----------+-------+--------------------+------------+-----------------+-----------+-----------+-------+
|2018-10-07 11:00:00|     10|0df9c29d-8776-436...|        92.0|2018-10-07|2018-10-07|     10|Columbus Day(shou...|  2018-10-07|       2017-10-08|         11|        Sun|      2|
|2018-10-07 11:00:00|     10|0df9c29d-8776-436...|        92.0|2018-10-07|2018-10-07|     10|Columbus Day(shou...|  2018-10-07|       2017-10-08|         11|        Sun|     92|
+-------------------+-------+--------------------+------------+----------+----------+-------+--------------------+------------+-----------------+-----------+-----------+-------+
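
To pin down the lookup I have in mind, here is a minimal plain-Python sketch of the semantics (not Spark code). It assumes that item_id plays the role of hexcluster_id, that "ref_date" means reference_date_id, and that (city_id, item_id, date, hour_of_day) identifies at most one row. The function name add_ref_val and the sample rows are hypothetical. In Spark the same idea would presumably be a left self-join on these keys.

```python
from datetime import date


def add_ref_val(rows):
    """Return copies of rows with ref_val looked up at the reference date."""
    # Index target_value by (city_id, item_id, date, hour_of_day).
    # Assumption: this key is unique; a later duplicate would overwrite an earlier one.
    lookup = {
        (r["city_id"], r["item_id"], r["date"], r["hour_of_day"]): r["target_value"]
        for r in rows
    }
    out = []
    for r in rows:
        # Same city, item and hour, but the date is the row's reference date.
        key = (r["city_id"], r["item_id"], r["reference_date_id"], r["hour_of_day"])
        # ref_val is None when no matching row exists (like a left join).
        out.append({**r, "ref_val": lookup.get(key)})
    return out


# Hypothetical sample data: the item_id "a" and the 2017 row are made up.
rows = [
    {"city_id": "10", "item_id": "a", "date": date(2017, 10, 8),
     "hour_of_day": 11, "target_value": 2.0, "reference_date_id": None},
    {"city_id": "10", "item_id": "a", "date": date(2018, 10, 7),
     "hour_of_day": 11, "target_value": 92.0,
     "reference_date_id": date(2017, 10, 8)},
]
result = add_ref_val(rows)
```

Under these assumptions the 2018-10-07 row picks up the target_value of the 2017-10-08 row at the same hour, and a row whose reference date has no match gets no ref_val.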