Question

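I'm currently trying to get a better understanding of dplyr and the tidyverse as a whole, and I've come across several ways of storing the result of a mutate call. I'd like to know whether one particular way of adding an extra column is good or bad practice.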

library(data.table)
library(dplyr)

# Example data, built with data.table() rather than a dput()-style structure() call,
# so the table is over-allocated and := can add columns in place
dt <- data.table(
  obs = c("1953M04", "1953M05", "1953M06", "1953M07", "1953M08", "1953M09", "1953M10", "1953M11", "1953M12", "1954M01"),
  gs1 = c(2.35999989509583, 2.48000001907349, 2.45000004768372, 2.38000011444092, 2.27999997138977, 2.20000004768372, 1.78999996185303,
          1.66999995708466, 1.6599999666214, 1.4099999666214)
)

# data.table approach: := adds the column to dt in place
dt[, Date.Month := as.Date(paste0(obs,"-01"), format = "%YM%m-%d")]

# dplyr way, assigning at the end of the pipe (reads in the same order as the data flows; ->> is super-assignment)
dt %>% mutate(Date.Month = as.Date(paste0(obs, "-01"), format = "%YM%m-%d")) %>% {. ->> dt}

# Direct reassignment, although assigning the output on the right to the name on the left feels illogical, at least in my head ;-)
dt <- dt %>% mutate(Date.Month = as.Date(paste0(obs, "-01"), format = "%YM%m-%d"))
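A minimal sketch, assuming a three-row subset of the data above and a hypothetical helper `to_month()`, that checks two things: the data.table approach and the explicit dplyr reassignment produce the same column, and a bare `mutate()` call without any assignment leaves the table unchanged (which is why some form of assignment is needed in the dplyr variants at all):

```r
library(data.table)
library(dplyr)

# Hypothetical helper: "1953M04" -> first day of that month as a Date
to_month <- function(obs) as.Date(paste0(obs, "-01"), format = "%YM%m-%d")

# Hypothetical helper returning a fresh small example table each time
make_dt <- function() {
  data.table(obs = c("1953M04", "1953M05", "1953M06"),
             gs1 = c(2.36, 2.48, 2.45))
}

# 1. data.table: add the column by reference
dt1 <- make_dt()
dt1[, Date.Month := to_month(obs)]

# 2. dplyr without assignment: the result is printed and then discarded
dt3 <- make_dt()
dt3 %>% mutate(Date.Month = to_month(obs))
unchanged <- !("Date.Month" %in% names(dt3))  # dt3 itself was not modified

# 3. dplyr with explicit reassignment
dt3 <- dt3 %>% mutate(Date.Month = to_month(obs))

# Both approaches that store the result give the same new column
stopifnot(identical(dt1$Date.Month, dt3$Date.Month))
```

In other words, `mutate()` returns a new object rather than modifying its input, so the choice between `->>`, `<-` and `%<>%` is mostly about where you prefer the assignment to appear.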

Is the reassignment in the last variant computationally expensive?
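One way to check this yourself is a rough timing sketch like the one below. It assumes a synthetic one-million-row table shaped like the question's data (the size, seed and variable names are made up for illustration) and parses the dates once beforehand, so the timings measure only the cost of adding the column. It also uses data.table's `address()` to compare the memory address of the untouched `obs` column before and after `mutate()`, to see whether that column was copied:

```r
library(data.table)
library(dplyr)

set.seed(1)
n <- 1e6  # assumed size for illustration; adjust to your data

# Synthetic table shaped like the question's data (obs = "YYYYMmm")
big <- data.table(
  obs = sprintf("%04dM%02d",
                sample(1950:2020, n, replace = TRUE),
                sample(1:12, n, replace = TRUE)),
  gs1 = runif(n)
)

# Parse the dates once, so the timings below measure only adding the column
month_start <- as.Date(paste0(big$obs, "-01"), format = "%YM%m-%d")

# dplyr: mutate() returns a new object and `<-` rebinds the name to it
big_dplyr <- copy(big)
obs_addr_before <- address(big_dplyr$obs)
t_dplyr <- system.time(
  big_dplyr <- big_dplyr %>% mutate(Date.Month = month_start)
)
# TRUE here would mean the untouched obs column was shared, not copied
print(identical(obs_addr_before, address(big_dplyr$obs)))

# data.table: := adds the column in place, no rebinding needed
big_dt <- copy(big)
t_dt <- system.time(big_dt[, Date.Month := month_start])

# Elapsed seconds for each approach
print(c(dplyr = t_dplyr[["elapsed"]], data.table = t_dt[["elapsed"]]))
```

Timings depend on the machine and package versions. Note that `<-` itself only rebinds the name to the object `mutate()` returned; whatever cost there is comes from building that object, not from the assignment.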

Answer 1

One option is the compound assignment operator `%<>%` from magrittr:

library(magrittr)
library(dplyr)
dt %<>% 
    mutate( Date.Month = as.Date(paste0(obs,"-01"), format = "%YM%m-%d"))
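`%<>%` pipes the left-hand side forward and then assigns the result back to it, so it should behave like the explicit `dt <- dt %>% ...` form. A small sketch checking that, using a plain data.frame and a hypothetical helper `to_month()`:

```r
library(magrittr)  # provides %<>%
library(dplyr)

# Hypothetical helper: "1953M04" -> first day of that month as a Date
to_month <- function(obs) as.Date(paste0(obs, "-01"), format = "%YM%m-%d")

df_a <- data.frame(obs = c("1953M04", "1953M05"), gs1 = c(2.36, 2.48))
df_b <- df_a

# Compound assignment: pipe df_a forward, then assign the result back to df_a
df_a %<>% mutate(Date.Month = to_month(obs))

# The explicit equivalent
df_b <- df_b %>% mutate(Date.Month = to_month(obs))

stopifnot(identical(df_a, df_b))
```

So `%<>%` is a shorthand for the reassignment rather than an in-place update: `mutate()` still builds a new object.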

However, the data.table assignment operator `:=` is faster and more memory-efficient, because it modifies `dt` in place.
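A minimal sketch of that in-place behaviour, assuming a two-row table: data.table's `address()` reports where the table lives in memory before and after the `:=` update, and no `dt <- ...` is needed to keep the new column:

```r
library(data.table)

dt <- data.table(obs = c("1953M04", "1953M05"), gs1 = c(2.36, 2.48))

addr_before <- address(dt)

# := adds the column to the existing table by reference
dt[, Date.Month := as.Date(paste0(obs, "-01"), format = "%YM%m-%d")]

addr_after <- address(dt)

# Same address means dt was updated in place rather than replaced by a copy
print(identical(addr_before, addr_after))
```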

Saving the result of `mutate()` without reassignment
