Why is processing timestamps so slow?

Asked: 2019-06-30 14:18:41

Tags: r dplyr lubridate

Consider this simple example:

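library(lubridate)       # ymd_hms()
library(dplyr)           # tibble(), mutate(), %>%
library(purrr)           # map_dfr(), used below to enlarge the data
library(microbenchmark)  # microbenchmark(), used below for timing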

# df1 keeps the timestamps as POSIXct date-time values
df1 <- tibble(timestamp = c(ymd_hms('2019-01-01 10:00:00.123'),
                            ymd_hms('2019-01-01 10:00:00.123'),
                            ymd_hms('2019-01-01 10:00:00.123'),
                            ymd_hms('2019-01-01 10:00:00.123')))

# df2 holds the same timestamps converted to plain numeric seconds
df2 <- tibble(timestamp = c(ymd_hms('2019-01-01 10:00:00.123'),
                            ymd_hms('2019-01-01 10:00:00.123'),
                            ymd_hms('2019-01-01 10:00:00.123'),
                            ymd_hms('2019-01-01 10:00:00.123'))) %>%
  mutate(timestamp = as.numeric(timestamp))

As you can see, the only difference between df1 and df2 is the representation of the timestamp.
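One quick way to confirm that difference in representation is to inspect the class of each vector. The sketch below uses base R only, and the names `ts_posix` and `ts_num` are hypothetical stand-ins for the two columns, not part of the original code.

```r
# Hypothetical stand-ins for the two timestamp columns (base R only).
ts_posix <- as.POSIXct("2019-01-01 10:00:00.123", tz = "UTC")
ts_num <- as.numeric(ts_posix)  # seconds since 1970-01-01 UTC, as a double

class(ts_posix)  # "POSIXct" "POSIXt"
class(ts_num)    # "numeric"
```

The POSIXct vector is a double underneath, but it carries class attributes, so operations on it go through S3 method dispatch.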

Now look at the crazy difference in timing.

# First, let's make them bigger: 400k rows is enough
df1 <- map_dfr(seq_len(100000), ~df1)
df2 <- map_dfr(seq_len(100000), ~df2)

Now a simple computation:

> microbenchmark(
+   df2 %>% mutate(diff = timestamp - min(timestamp)),
+ times = 1000)
Unit: milliseconds
                                              expr      min       lq     mean   median
 df2 %>% mutate(diff = timestamp - min(timestamp)) 1.541533 2.182028 3.961685 2.327694
       uq     max neval
 2.567314 290.823  1000

Meanwhile:

> microbenchmark(
+   df1 %>% mutate(diff = timestamp - min(timestamp)),
+ times = 1000)
Unit: milliseconds
                                              expr      min       lq    mean   median
 df1 %>% mutate(diff = timestamp - min(timestamp)) 4.111016 8.182359 13.1351 8.513956
       uq      max neval
 9.065631 378.1961  1000

Boom! More than 3 times slower. Why is that? Thanks!
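A possible starting point for investigating this, offered as an assumption and not a confirmed diagnosis: subtracting two POSIXct values dispatches to the S3 method `-.POSIXt` and returns a `difftime` object, whereas subtracting two doubles is plain vectorized arithmetic. The sketch below uses base R only, with hypothetical names (`ts_posix`, `ts_num`, `d_posix`, `d_num`), to show that both results hold the same numbers but have different classes. It does not measure dplyr itself.

```r
# Hypothetical stand-ins for the two columns (base R only, no dplyr).
n <- 400000
start <- as.POSIXct("2019-01-01 10:00:00", tz = "UTC")
ts_posix <- start + (seq_len(n) - 1)  # POSIXct vector, one second apart
ts_num <- as.numeric(ts_posix)        # the same instants as plain doubles

# POSIXct - POSIXct goes through the `-.POSIXt` method and yields a difftime.
d_posix <- ts_posix - min(ts_posix)
# double - double is ordinary arithmetic and yields a plain numeric vector.
d_num <- ts_num - min(ts_num)

class(d_posix)  # "difftime"
class(d_num)    # "numeric"

# The underlying values agree once the difftime is expressed in seconds.
all.equal(as.numeric(d_posix, units = "secs"), d_num)

# Rough wall-clock comparison; absolute numbers depend on the machine.
system.time(for (i in 1:20) ts_posix - min(ts_posix))
system.time(for (i in 1:20) ts_num - min(ts_num))
```

If the `difftime` class is not needed downstream, converting with `as.numeric()` before subtracting, as df2 does, sidesteps the method dispatch.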

0 answers:

No answers yet