在data.table中动态创建新列

时间:2017-09-16 20:06:22

标签: r data.table

我在R中有一个data.table,想要创建一个新列。假设我将日期列名称保存为变量,并希望在新列中将_year附加到该名称。我只需指定名称即可完成正常路由,但如何使用date_col变量创建新列名。

这是我尝试过的。我想要的最后两个不起作用。

dat = data.table(one = 1:5, two = 1:5, 
                 order_date = lubridate::ymd("2015-01-01","2015-02-01","2015-03-01",
                           "2015-04-01","2015-05-01"))
dat
date_col = "order_date"
dat[,`:=`(OrderDate_year = substr(get(date_col)[!is.na(get(date_col))],1,4))][]
dat[,`:=`(new = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
dat[,`:=`(paste0(date_col, "_year", sep="") = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]

2 个答案:

答案 0 :(得分:1)

set 函数很适合这样做。比在data.table中设置更快。这就是你追求的吗? http://brooksandrew.github.io/simpleblog/articles/advanced-data-table/#fast-looping-with-set

library(data.table)
dat = data.table(one = 1:5, two = 1:5, 
                 order_date = lubridate::ymd("2015-01-01","2015-02-01","2015-03-01",
                           "2015-04-01","2015-05-01"))
dat
date_col = "order_date"

year_col <- paste0(date_col, "_year", sep="")
set(dat, j = year_col, value = substr(dat[[date_col]], 1, 4) )

答案 1 :(得分:1)

最后两个语句返回错误消息:

dat[,`:=`(paste0(date_col, "_year", sep="") = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
Error: unexpected '=' in "dat[,`:=`(paste0(date_col, "_year", sep="") ="
dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
Error: unexpected '=' in "dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) ="

调用:=()函数的正确语法是:

dat[, `:=`(paste0(date_col, "_year", sep = ""), 
           substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))], 1, 4))][]
dat[, `:=`(noquote(paste0(date_col, "_year", sep = "")), 
           substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))], 1, 4))][]

即,将=替换为,

但是,赋值语法和右侧太复杂了。

order_date列已经属于Date类:

str(dat)
Classes ‘data.table’ and 'data.frame':    5 obs. of  3 variables:
 $ one       : int  1 2 3 4 5
 $ two       : int  1 2 3 4 5
 $ order_date: Date, format: "2015-01-01" "2015-02-01" ...
 - attr(*, ".internal.selfref")=<externalptr>

为了提取年份,可以使用year()函数(来自data.table包或lubridate包中最后加载的任何内容),因此不会转换回字符和需要提取年份字符串:

date_col = "order_date"
dat[, paste0(date_col, "_year") := lapply(.SD, year), .SDcols = date_col][]
   one two order_date order_date_year
1:   1   1 2015-01-01            2015
2:   2   2 2015-02-01            2015
3:   3   3 2015-03-01            2015
4:   4   4 2015-04-01            2015
5:   5   5 2015-05-01            2015

或者,

dat[, paste0(date_col, "_year") := year(get(date_col))][]
dat[, `:=`(paste0(date_col, "_year"), year(get(date_col)))][]

也可以。