Merge on the nearest previous timestamp and forward-fill in pandas

Time: 2021-03-16 02:34:55

Tags: python pandas dataframe merge timestamp

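I'm having a hard time getting the hang of pandas' special merge functions such as merge_asof().

I have two dataframes: coords, which holds pings from an EV's GPS, and info, which holds other EV attributes such as the navigation destination and battery level. My goal is to merge them so that the number of rows in the output dataframe equals the sum of the row counts of the two dataframes. For example: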

coords.shape
(10, 3)

coords

ts                          lat       lng
2021-01-02 16:08:24.067971  58.3019 -134.4197
2021-01-06 12:54:18.535681  58.3021 -134.4195
2021-01-08 22:15:35.036423  58.3025 -134.4195
2021-01-16 01:10:39.610540  58.3029 -134.4193
2021-01-27 12:28:45.202376  58.3030 -134.4197
2021-01-30 05:32:09.404525  58.3031 -134.4190
2021-02-08 10:39:19.686159  58.3033 -134.4187
2021-02-15 01:30:16.733921  58.3039 -134.4187
2021-02-16 12:49:55.366025  58.3040 -134.4185
2021-02-19 23:57:57.369978  58.3041 -134.4181


info.shape
(3, 3)

info

ts                          nav_to  battery
2021-01-26 12:47:52.972586  Juneau      90
2021-02-14 23:23:18.186058  Anchorage   50
2021-02-19 07:26:35.357977  Fairbanks   30

info and coords should be merged so that the timestamps ts are sequential, and each info row should be matched with the row in coords that has the nearest "previous" timestamp. Finally, nav_to, battery, lat, and lng should be forward-filled. The output for the example above would be:

output

ts                          lat      lng        nav_to     battery
2021-01-02 16:08:24.067971  58.3019  -134.4197  None       NaN
2021-01-06 12:54:18.535681  58.3021  -134.4195  None       NaN
2021-01-08 22:15:35.036423  58.3025  -134.4195  None       NaN
2021-01-16 01:10:39.610540  58.3029  -134.4193  None       NaN
2021-01-26 12:47:52.972586  58.3029  -134.4193  Juneau     90.0
2021-01-27 12:28:45.202376  58.3030  -134.4197  Juneau     90.0
2021-01-30 05:32:09.404525  58.3031  -134.4190  Juneau     90.0
2021-02-08 10:39:19.686159  58.3033  -134.4187  Juneau     90.0
2021-02-14 23:23:18.186058  58.3033  -134.4187  Anchorage  50.0
2021-02-15 01:30:16.733921  58.3039  -134.4187  Anchorage  50.0
2021-02-16 12:49:55.366025  58.3040  -134.4185  Anchorage  50.0
2021-02-19 07:26:35.357977  58.3040  -134.4185  Fairbanks  30.0
2021-02-19 23:57:57.369978  58.3041  -134.4181  Fairbanks  30.0

I tried pd.merge_asof(coords, info, on="ts", direction="forward"), but this does not produce the correct result: it back-fills and only keeps the records from coords. What is the correct command in pandas to produce the desired result?

1 Answer:

Answer 0 (score: 2)

Try the default direction='backward', then concat the result with the second dataframe (here df1 is coords and df2 is info):

(pd.concat([pd.merge_asof(df1, df2, on='ts'), df2])
   .sort_values('ts')
)

Output:

                          ts      lat       lng     nav_to  battery
0 2021-01-02 16:08:24.067971  58.3019 -134.4197        NaN      NaN
1 2021-01-06 12:54:18.535681  58.3021 -134.4195        NaN      NaN
2 2021-01-08 22:15:35.036423  58.3025 -134.4195        NaN      NaN
3 2021-01-16 01:10:39.610540  58.3029 -134.4193        NaN      NaN
0 2021-01-26 12:47:52.972586      NaN       NaN     Juneau     90.0
4 2021-01-27 12:28:45.202376  58.3030 -134.4197     Juneau     90.0
5 2021-01-30 05:32:09.404525  58.3031 -134.4190     Juneau     90.0
6 2021-02-08 10:39:19.686159  58.3033 -134.4187     Juneau     90.0
1 2021-02-14 23:23:18.186058      NaN       NaN  Anchorage     50.0
7 2021-02-15 01:30:16.733921  58.3039 -134.4187  Anchorage     50.0
8 2021-02-16 12:49:55.366025  58.3040 -134.4185  Anchorage     50.0
2 2021-02-19 07:26:35.357977      NaN       NaN  Fairbanks     30.0
9 2021-02-19 23:57:57.369978  58.3041 -134.4181  Fairbanks     30.0
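The info rows in this output carry no lat/lng of their own. As a minimal sketch of one way to fill them, assuming a trimmed-down copy of the sample data (timestamps rounded to seconds) and the variable name merged, which is not from the original post, the two columns can be forward-filled after sorting:

```python
import pandas as pd

# Trimmed sample (an assumption for brevity): three pings and one info record.
coords = pd.DataFrame({
    "ts": pd.to_datetime([
        "2021-01-16 01:10:39",
        "2021-01-27 12:28:45",
        "2021-01-30 05:32:09",
    ]),
    "lat": [58.3029, 58.3030, 58.3031],
    "lng": [-134.4193, -134.4197, -134.4190],
})
info = pd.DataFrame({
    "ts": pd.to_datetime(["2021-01-26 12:47:52"]),
    "nav_to": ["Juneau"],
    "battery": [90],
})

# Backward as-of merge attaches the latest earlier info record to each ping;
# concat then adds the info rows themselves, and sorting interleaves them.
merged = (
    pd.concat([pd.merge_asof(coords, info, on="ts"), info])
    .sort_values("ts")
    .reset_index(drop=True)  # replace the duplicated index labels
)

# Carry the last known position forward onto the info rows.
merged[["lat", "lng"]] = merged[["lat", "lng"]].ffill()

print(merged)
```

Filling forward rather than backward matches the desired output in the question, where each info row takes the position of the ping just before it.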

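Then you can optionally forward-fill (ffill) the lat and lng columns, so that each info row takes the position of the ping before it. Or you can just use merge_asof twice: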

(pd.concat([pd.merge_asof(df1, df2, on='ts'), 
            pd.merge_asof(df2, df1, on='ts')
           ])
   .sort_values('ts')
)

Output:

                          ts      lat       lng     nav_to  battery
0 2021-01-02 16:08:24.067971  58.3019 -134.4197        NaN      NaN
1 2021-01-06 12:54:18.535681  58.3021 -134.4195        NaN      NaN
2 2021-01-08 22:15:35.036423  58.3025 -134.4195        NaN      NaN
3 2021-01-16 01:10:39.610540  58.3029 -134.4193        NaN      NaN
0 2021-01-26 12:47:52.972586  58.3029 -134.4193     Juneau     90.0
4 2021-01-27 12:28:45.202376  58.3030 -134.4197     Juneau     90.0
5 2021-01-30 05:32:09.404525  58.3031 -134.4190     Juneau     90.0
6 2021-02-08 10:39:19.686159  58.3033 -134.4187     Juneau     90.0
1 2021-02-14 23:23:18.186058  58.3033 -134.4187  Anchorage     50.0
7 2021-02-15 01:30:16.733921  58.3039 -134.4187  Anchorage     50.0
8 2021-02-16 12:49:55.366025  58.3040 -134.4185  Anchorage     50.0
2 2021-02-19 07:26:35.357977  58.3040 -134.4185  Fairbanks     30.0
9 2021-02-19 23:57:57.369978  58.3041 -134.4181  Fairbanks     30.0
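For anyone who wants to reproduce this, here is a self-contained sketch of the double merge_asof approach on the sample data from the question. The frames are rebuilt by hand, and the name result plus the final reset_index(drop=True) are additions, not part of the answer above; the reset only replaces the repeated index labels (0, 1, 2 appear twice in the output shown).

```python
import pandas as pd

# Sample data copied from the question; both frames are already sorted by ts,
# which merge_asof requires.
coords = pd.DataFrame({
    "ts": pd.to_datetime([
        "2021-01-02 16:08:24.067971", "2021-01-06 12:54:18.535681",
        "2021-01-08 22:15:35.036423", "2021-01-16 01:10:39.610540",
        "2021-01-27 12:28:45.202376", "2021-01-30 05:32:09.404525",
        "2021-02-08 10:39:19.686159", "2021-02-15 01:30:16.733921",
        "2021-02-16 12:49:55.366025", "2021-02-19 23:57:57.369978",
    ]),
    "lat": [58.3019, 58.3021, 58.3025, 58.3029, 58.3030,
            58.3031, 58.3033, 58.3039, 58.3040, 58.3041],
    "lng": [-134.4197, -134.4195, -134.4195, -134.4193, -134.4197,
            -134.4190, -134.4187, -134.4187, -134.4185, -134.4181],
})
info = pd.DataFrame({
    "ts": pd.to_datetime([
        "2021-01-26 12:47:52.972586",
        "2021-02-14 23:23:18.186058",
        "2021-02-19 07:26:35.357977",
    ]),
    "nav_to": ["Juneau", "Anchorage", "Fairbanks"],
    "battery": [90, 50, 30],
})

result = (
    pd.concat([
        pd.merge_asof(coords, info, on="ts"),  # pings + latest earlier info
        pd.merge_asof(info, coords, on="ts"),  # info + latest earlier ping
    ])
    .sort_values("ts")
    .reset_index(drop=True)
)

print(result)
```

On this sample the result has 13 rows, the sum of the two inputs. If a ping and an info record ever shared the exact same timestamp, the ordering of those two rows and the exact-match behaviour of merge_asof would need a closer look; the sample data has no such ties.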