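I'm currently trying to get a better understanding of dplyr and the tidyverse as a whole, and I've stumbled across several ways to store the result of a mutate call. I'd like to know whether one possible way of adding an extra column is good or bad.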
library(data.table)
library(dplyr)
dt <- structure(list(obs = c("1953M04", "1953M05", "1953M06", "1953M07", "1953M08", "1953M09", "1953M10", "1953M11", "1953M12", "1954M01")
, gs1 = c(2.35999989509583, 2.48000001907349, 2.45000004768372, 2.38000011444092, 2.27999997138977, 2.20000004768372, 1.78999996185303,
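1.66999995708466, 1.6599999666214, 1.4099999666214)), row.names = c(NA, -10L), class = c("data.table", "data.frame"))
# A data.table rebuilt via structure() lacks its internal self-reference; setDT() restores it so := can work in place
setDT(dt)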
# Data.Table approach
dt[, Date.Month := as.Date(paste0(obs,"-01"), format = "%YM%m-%d")]
# dplyr way, assigning in left-to-right order at the end of the pipe
dt %>% mutate( Date.Month = as.Date(paste0(obs,"-01"), format = "%YM%m-%d")) %>% {. ->> dt }
# Direct reassignment, but it feels illogical to assign on the left the output from the right, at least in my head ;-)
dt <- dt %>% mutate( Date.Month = as.Date(paste0(obs,"-01"), format = "%YM%m-%d"))
Does the reassignment in the last version cost a lot of computational resources?
Answer 0 (score: 5)
One option is the compound assignment operator (%<>%) from magrittr:
library(magrittr)
library(dplyr)
dt %<>%
  mutate(Date.Month = as.Date(paste0(obs, "-01"), format = "%YM%m-%d"))
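However, the data.table assignment operator (:=) will be faster and more efficient, because it adds the column by reference instead of building a new object that then has to be reassigned.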
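To make the difference concrete, here is a minimal sketch (not a benchmark) that compares object addresses with data.table's address() helper. The shortened data, the gs1_pct column, and the variable names addr_before, same_after_set and same_after_mutate are made up for illustration; the table is built with data.table() so that its internal self-reference is in place.

```r
library(data.table)
library(dplyr)

# Small illustrative table (hypothetical values, same shape as the question's data)
dt <- data.table(obs = c("1953M04", "1953M05", "1953M06"),
                 gs1 = c(2.36, 2.48, 2.45))

# Remember where the table lives before touching it
addr_before <- address(dt)

# := adds the column to the existing object, no reassignment needed
dt[, Date.Month := as.Date(paste0(obs, "-01"), format = "%YM%m-%d")]
same_after_set <- identical(addr_before, address(dt))

# mutate() leaves dt alone and returns a separate object
dt2 <- dt %>% mutate(gs1_pct = gs1 / 100)
same_after_mutate <- identical(address(dt), address(dt2))

print(c(set = same_after_set, mutate = same_after_mutate))
```

If the two flags differ, that shows := working in place while mutate() produces a new object. For a table of this size the cost of that difference is negligible; it only starts to matter for large data or assignments repeated in a loop.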