Consider a data.table dt:
library(data.table)
dt = setDT(structure(list(grp = c("a", "a", "b", "b", "b", "c", "c"),
yr = c(2000, 2012, 2004, 2008, 2014, 2008, 2016),
sal = c(20000, 240000, 30000,100000,120000, 15000, 60000)),
.Names = c("grp", "yr", "sal"),
row.names = c(NA,-7L), class = c("data.table", "data.frame")))
I have a dummy function tag, which returns a character value based on some conditions on sal and yr.
tag = function(x){if(x$yr<2010 & x$sal<25000) {return(list(comment="okay"))}
else if(x$yr<2010 & x$sal>=25000) {return(list(comment="cool"))}
else if(x$yr>=2010 & x$sal<100000){return(list(comment="okay"))}
else if(x$yr>=2010 & x$sal>=100000){return(list(comment="cool"))} }
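Every value the function returns is wrapped in a list() call, so that the returned value can be assigned to a new column mycomment in the table dt. However, the following two calls behave differently.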
dt[,mycomment:=tag(.SD),by=1:nrow(dt)]
#mycomment appears as a character vector
dt[,`:=`(mycomment=tag(.SD)),by=1:nrow(dt)]
#mycomment appears as a list
Why does the := operator behave differently in these two cases?
Answer 0 (score: 3)
The function call for j in x[i, j, ...] when making an assignment to x is
`:=`(col1_name = col1, col2_name = col2)
# or
c("col1_name", "col2_name") := list(col1, col2)
The second way exists for user convenience (so you don't have to mess with backticks around :=). A further convenience is offered when there is a single column:
`:=`(col1_name = col1)
# or
col1_name := list(col1)
# or
col1_name := col1
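Here, the final option saves you from having to wrap in list(...). The same convenience feature shows up when by= is present. In both cases, the expectation is that j evaluates to a list of columns, which is why a bare vector is also treated as a length-one list of columns. This is the source of the difference you see: in col1_name := list(col1), the list on the right-hand side is read as a list of columns, so col1 itself becomes the column, whereas in `:=`(col1_name = list(col1)) each argument is already one column, so the list is stored as a list column. If you want to avoid reckoning with this inconsistency, you could always write list(...) or always use the `:=`(...) form in j.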
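The three spellings above can be compared side by side. This is only a minimal sketch on a made-up table (toy and the column names v1, v2, v3 are illustrative and do not come from the question); it shows that the infix form unpacks a list into columns, while the functional form keeps a list argument as a list column.

```r
library(data.table)

# Hypothetical toy table, not part of the original question
toy <- data.table(id = 1:3)

# Infix form, bare vector: treated as a single column
toy[, v1 := c("x", "y", "z")]

# Infix form, list(...): each list element is one column,
# so the character vector inside becomes the column
toy[, v2 := list(c("x", "y", "z"))]

# Functional form: each argument is already one column,
# so a list() passed here is stored as a list column
toy[, `:=`(v3 = list(c("x", "y", "z")))]

# Inspect the resulting column classes
print(sapply(toy, class))
```

Checking the classes this way is a quick test of which form was actually applied when a column unexpectedly turns out to be a list.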
In your example, this might mean changing your function to return a single column instead of wrapping in list(...). For some other ideas and references to the vignettes included with the package, maybe see "Adding list columns to data tables in R returns inconsistent output - feature or bug?"
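One way the suggestion might look in practice is sketched below. The helper tag_chr and the table dt2 are hypothetical names introduced for illustration; tag_chr follows the same rule as tag but returns a bare character value, so both forms of := should produce an ordinary character column.

```r
library(data.table)

# Hypothetical three-row subset of the question's data
dt2 <- data.table(yr = c(2000, 2012, 2004),
                  sal = c(20000, 240000, 30000))

# Hypothetical variant of tag() that returns a bare character value
# instead of wrapping it in list(...)
tag_chr <- function(x) {
  if (x$yr < 2010) {
    if (x$sal < 25000) "okay" else "cool"
  } else {
    if (x$sal < 100000) "okay" else "cool"
  }
}

# Infix form, evaluated row by row as in the question
dt2[, c1 := tag_chr(.SD), by = 1:nrow(dt2)]

# Functional form, same row-by-row grouping
dt2[, `:=`(c2 = tag_chr(.SD)), by = 1:nrow(dt2)]

# Both columns are plain character vectors
print(sapply(dt2, class))
```

Because the function no longer returns a list, there is nothing for either form to unpack or to keep as a list column, so the two calls agree.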
Alternately, you could apply the tag rule more efficiently with something like a "non-equi join":
mDT = data.table(
yr_up = c(2010, 2010, Inf, Inf),
sal_up = c(25000, Inf, 100000, Inf),
value = c("okay", "cool", "okay", "cool")
)
dt[, cmt := mDT[.SD, on=.(yr_up > yr, sal_up > sal), mult="first"]$value]
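As a cross-check on the lookup table, the same rule can also be written as a single vectorised comparison. This is not the join shown above but a different technique (base R ifelse), sketched on a fresh copy of the question's data; dt3 and cmt_check are hypothetical names.

```r
library(data.table)

# Fresh copy of the question's yr and sal columns (dt3 is a hypothetical name)
dt3 <- data.table(
  yr  = c(2000, 2012, 2004, 2008, 2014, 2008, 2016),
  sal = c(20000, 240000, 30000, 100000, 120000, 15000, 60000)
)

# "okay" when the salary is under the threshold for its period, else "cool";
# the thresholds are the ones used in the question's tag() function
dt3[, cmt_check := ifelse(
  (yr < 2010 & sal < 25000) | (yr >= 2010 & sal < 100000),
  "okay", "cool"
)]

print(dt3)
```

Comparing a column like this against the join result is one way to confirm that the bounds in the lookup table encode the intended rule.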