Comparing pandas timestamps of different resolutions

Time: 2018-10-16 13:22:42

Tags: python pandas dataframe time-series

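I have two time-series dataframes (about 45,000 rows versus 5 rows). One has timestamps down to the millisecond, the other down to the second. I want to create a new column in the larger dataframe such that: a) a value is attached to the row in the larger dataframe whose timestamp is closest (in seconds) to a timestamp in the smaller dataframe, and b) the value is NaN for all other timestamps.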

larger df = 
            timestamp           price
0       2018-04-24 06:01:02.600 1
1       2018-04-24 06:01:02.600 1
2       2018-04-24 06:01:02.600 2
3       2018-04-24 06:01:02.600 4
4       2018-04-24 06:01:02.775 2
5       2018-04-24 06:01:02.825 3
6       2018-04-24 06:01:03.050 5
7       2018-04-24 06:01:03.125 6
8       2018-04-24 06:01:03.275 7
9       2018-04-24 06:01:03.300 4
10      2018-04-24 06:01:03.300 3
11      2018-04-24 06:01:03.950 5
12      2018-04-24 06:01:04.050 5


smaller df = 
   timestamp           price
0   24/04/2018 06:01:02 2
1   24/04/2018 12:33:37 4   
2   24/04/2018 14:29:34 5   
3   24/04/2018 15:02:50 6   
4   24/04/2018 15:20:04 7   
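For anyone who wants to reproduce this, here is a minimal sketch of how the two frames might be built. The variable names df_large and df_small are assumptions (they match the names used in the answer), only the first five rows of the larger frame are included, and the day-first format string is inferred from the printed values of the smaller frame.

```python
import pandas as pd

# First five rows of the larger frame (millisecond resolution).
df_large = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-04-24 06:01:02.600",
        "2018-04-24 06:01:02.600",
        "2018-04-24 06:01:02.600",
        "2018-04-24 06:01:02.600",
        "2018-04-24 06:01:02.775",
    ]),
    "price": [1, 1, 2, 4, 2],
})

# Smaller frame (second resolution). The strings are day-first, so the
# format is given explicitly instead of relying on inference.
df_small = pd.DataFrame({
    "timestamp": pd.to_datetime(
        [
            "24/04/2018 06:01:02",
            "24/04/2018 12:33:37",
            "24/04/2018 14:29:34",
            "24/04/2018 15:02:50",
            "24/04/2018 15:20:04",
        ],
        format="%d/%m/%Y %H:%M:%S",
    ),
    "price": [2, 4, 5, 6, 7],
})
```

Parsing both columns into real datetimes up front matters, because nearest-timestamp matching compares values rather than strings.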

desired df =

            timestamp       price  newCol
0       2018-04-24 06:01:02.600 1   aValue
1       2018-04-24 06:01:02.600 1   NaN
2       2018-04-24 06:01:02.600 2   NaN
3       2018-04-24 06:01:02.600 4   NaN
4       2018-04-24 06:01:02.775 2   NaN
5       2018-04-24 06:01:02.825 3   NaN
6       2018-04-24 06:01:03.050 5   NaN
7       2018-04-24 06:01:03.125 6   NaN
8       2018-04-24 06:01:03.275 7   NaN
9       2018-04-24 06:01:03.300 4   NaN
10      2018-04-24 06:01:03.300 3   NaN
11      2018-04-24 06:01:03.950 5   NaN
12      2018-04-24 06:01:04.050 5   NaN

Any help would be greatly appreciated. I'm still too new to programming in general to solve this easily.

Many thanks

1 answer:

Answer 0: (score: 1)


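To use each value only once, I had to keep track of the timestamps from the smaller dataframe, so I keep them as a column (drop=False) when setting the index. I then use reindex with method='nearest', and use duplicated inside a mask so that only the first row matched to each small-dataframe timestamp keeps its value.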

df_small_new = df_small.set_index('timestamp', drop=False)
df_small_new = df_small_new.reindex(df_large.timestamp, method='nearest')
df_large.assign(
    newcol=df_small_new.price.mask(df_small_new.timestamp.duplicated()).values)

            timestamp       price  newcol
0       2018-04-24 06:01:02.600 1   2.0
1       2018-04-24 06:01:02.600 1   NaN
2       2018-04-24 06:01:02.600 2   NaN
3       2018-04-24 06:01:02.600 4   NaN
4       2018-04-24 06:01:02.775 2   NaN
5       2018-04-24 06:01:02.825 3   NaN
6       2018-04-24 06:01:03.050 5   NaN
7       2018-04-24 06:01:03.125 6   NaN
8       2018-04-24 06:01:03.275 7   NaN
9       2018-04-24 06:01:03.300 4   NaN
10      2018-04-24 06:01:03.300 3   NaN
11      2018-04-24 06:01:03.950 5   NaN
12      2018-04-24 06:01:04.050 5   NaN

pandas.merge_asof

  • Rename 'price' in the small dataframe
  • Make sure to set direction to 'nearest'
  • This almost answers the question
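The bullets above could be sketched roughly as follows. This is only an illustration under some assumptions: the sample rows are a shortened copy of the question's data, the small_ts helper column and the final masking step are my own additions (borrowed from the reindex approach) to show one way of closing the "almost" gap, and both timestamp columns are cast to the same datetime dtype because merge_asof expects sorted keys of matching type.

```python
import pandas as pd

# Shortened sample data; names and rows are illustrative assumptions.
df_large = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-04-24 06:01:02.600",
        "2018-04-24 06:01:02.600",
        "2018-04-24 06:01:02.775",
        "2018-04-24 06:01:03.050",
        "2018-04-24 06:01:04.050",
    ]),
    "price": [1, 1, 2, 5, 5],
})
df_small = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["24/04/2018 06:01:02", "24/04/2018 12:33:37"],
        format="%d/%m/%Y %H:%M:%S",
    ),
    "price": [2, 4],
})

# merge_asof needs both keys sorted and of the same dtype.
df_large["timestamp"] = df_large["timestamp"].astype("datetime64[ns]")
df_small["timestamp"] = df_small["timestamp"].astype("datetime64[ns]")

# Rename 'price' so it does not clash with the large frame's column, and
# keep a copy of the small timestamp (hypothetical helper column).
small = df_small.rename(columns={"price": "newcol"})
small["small_ts"] = small["timestamp"]

# Every large row receives the value of its nearest small timestamp.
merged = pd.merge_asof(df_large, small, on="timestamp", direction="nearest")

# Keep the value only on the first row matched to each small timestamp.
merged["newcol"] = merged["newcol"].mask(merged["small_ts"].duplicated())
result = merged.drop(columns="small_ts")
print(result)
```

Without the masking step every row in the larger frame would carry a value, which is presumably why merge_asof on its own only almost answers the question.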
