pivot_wider as a replacement for tidyr's spread, and pivot_longer as a replacement for tidyr's gather

Time: 2019-11-25 10:47:28

Tags: r dplyr

In the latest version of tidyr, spread() and gather() are marked lifecycle: retired and are superseded by pivot_wider() and pivot_longer().

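My problem is that the new functions require more typing and also seem to run more slowly. I would like to know whether I am doing something wrong.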

Example data:

library(tidyverse)

dates <- seq(from = as.Date("1975-01-01"), to = as.Date("2019-10-31"), by = "months")

returndata <- tibble(stock = sort(rep(letters, length(dates))),
                     month = rep(dates, length(letters)),
                     ret   = runif(length(dates) * length(letters)) - 0.5)

Previously, I would spread the data like this:

returndata_spread <- returndata %>% 
  spread(stock, ret)

With pivot_wider, I do this:

returndata_wider <- returndata %>% 
  pivot_wider(names_from = stock, values_from = ret)

The result is exactly the same.

To gather the data back, I previously did this:

returndata_gather <- returndata_wider %>% 
  gather(stock, ret, -month)

Now, with pivot_longer:

returndata_longer <- returndata_wider %>% 
  pivot_longer(-month, names_to = "stock", values_to = "ret") %>% 
  arrange(stock, month)
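
The claim that the old and new functions give the same result can be checked directly. The following is a minimal sketch, assuming tidyr 1.0.0 or later. It rebuilds the example data so that it runs on its own, and it compares plain data frames to sidestep any attribute differences between tibbles. The names wide_old, wide_new, long_old, long_new, same_wide and same_long are introduced here for illustration only.

```r
library(tidyr)
library(dplyr)
library(tibble)

dates <- seq(from = as.Date("1975-01-01"), to = as.Date("2019-10-31"), by = "months")
returndata <- tibble(stock = sort(rep(letters, length(dates))),
                     month = rep(dates, length(letters)),
                     ret   = runif(length(dates) * length(letters)) - 0.5)

# Long -> wide with the old and the new function
wide_old <- returndata %>% spread(stock, ret)
wide_new <- returndata %>% pivot_wider(names_from = stock, values_from = ret)

# Wide -> long. pivot_longer() emits rows month by month, whereas gather()
# stacks whole columns, so sort the pivot_longer() result to match.
long_old <- wide_new %>% gather(stock, ret, -month)
long_new <- wide_new %>%
  pivot_longer(-month, names_to = "stock", values_to = "ret") %>%
  arrange(stock, month)

# Compare as plain data frames
same_wide <- isTRUE(all.equal(as.data.frame(wide_old), as.data.frame(wide_new)))
same_long <- isTRUE(all.equal(as.data.frame(long_old), as.data.frame(long_new)))
```

The arrange() step is needed only to make the row order comparable; it is not required for pivot_longer() itself.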

I measured the execution times and got the following:

> t_spread
Time difference of 0.01287794 secs

> t_wider
Time difference of 0.4083362 secs

> t_gather
Time difference of 0.002280474 secs

> t_longer
Time difference of 0.01168776 secs

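The new functions are much slower: in these single runs, pivot_wider took roughly 30 times as long as spread, and pivot_longer roughly 5 times as long as gather.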
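
The post does not show how t_spread, t_wider, t_gather and t_longer were obtained. The "Time difference of ... secs" output is what R prints for a difftime object, so one plausible way, given here only as an assumption, is to subtract two Sys.time() readings around each call. The helper time_once() below is hypothetical and not part of any package.

```r
library(tidyr)
library(tibble)

dates <- seq(from = as.Date("1975-01-01"), to = as.Date("2019-10-31"), by = "months")
returndata <- tibble(stock = sort(rep(letters, length(dates))),
                     month = rep(dates, length(letters)),
                     ret   = runif(length(dates) * length(letters)) - 0.5)

# Hypothetical helper: evaluate an expression once and return the elapsed
# wall-clock time as a difftime object.
time_once <- function(expr) {
  start <- Sys.time()
  force(expr)          # the lazily passed expression is evaluated here
  Sys.time() - start
}

returndata_wider <- pivot_wider(returndata, names_from = stock, values_from = ret)

t_spread <- time_once(spread(returndata, stock, ret))
t_wider  <- time_once(pivot_wider(returndata, names_from = stock, values_from = ret))
t_gather <- time_once(gather(returndata_wider, stock, ret, -month))
t_longer <- time_once(pivot_longer(returndata_wider, -month,
                                   names_to = "stock", values_to = "ret"))
```

A single run like this is noisy, since it includes one-off costs such as garbage collection. Repeated measurements are more reliable for comparing functions.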

1 answer:

Answer 0 (score: 1)

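This appears to be another instance of an issue reported on GitHub, which should be fixed in the development version of tidyr. After updating tidyr (i.e. devtools::install_github("tidyverse/tidyr")), your example shows comparable performance: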

library(tidyverse)

dates <- seq(from = as.Date("1975-01-01"), to = as.Date("2019-10-31"), by = "months")

returndata <- tibble(stock = sort(rep(letters, length(dates))),
                     month = rep(dates, length(letters)),
                     ret   = runif(length(dates) * length(letters)) - 0.5)

bench::mark(
  spread = returndata %>% spread(stock, ret),
  pivot_wider = returndata %>% pivot_wider(names_from = stock, values_from = ret)
)
#> # A tibble: 2 x 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 spread        8.83ms   9.57ms     100.         0B     6.39
#> 2 pivot_wider  10.96ms  11.37ms      86.1        0B     4.42

Created on 2019-11-25 by the reprex package (v0.3.0)
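
The benchmark above covers only spread versus pivot_wider. A similar comparison for gather versus pivot_longer could be sketched as follows. This is an illustration, not part of the original answer, and no timings are claimed for it. It assumes the bench package is installed and sets check = FALSE because the two functions return rows in a different order, which bench::mark() would otherwise reject as unequal results.

```r
library(tidyr)
library(tibble)

dates <- seq(from = as.Date("1975-01-01"), to = as.Date("2019-10-31"), by = "months")
returndata <- tibble(stock = sort(rep(letters, length(dates))),
                     month = rep(dates, length(letters)),
                     ret   = runif(length(dates) * length(letters)) - 0.5)

# Wide input shared by both expressions
returndata_wider <- pivot_wider(returndata, names_from = stock, values_from = ret)

res <- bench::mark(
  gather = gather(returndata_wider, stock, ret, -month),
  pivot_longer = pivot_longer(returndata_wider, -month,
                              names_to = "stock", values_to = "ret"),
  check = FALSE  # row order differs between the two results
)
res
```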