Question

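I am learning about data.table properties from a blog post. I am trying to understand the part under "Summary table (short and narrow)", which starts by coercing a data.frame (mtcars) to a data.table: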

> data <- as.data.table(mtcars)

> data <- data[,.(gear,cyl)]
> head(data)
    gear cyl
 1:    4   6
 2:    4   6
 3:    4   4
 4:    3   6
 5:    3   8
 6:    3   6

So far, so good.

Now I tried this: data[, gearsL := list(list(unique(gear))), by=cyl]

> head(data)
   gear cyl gearsL
1:    4   6  4,3,5
2:    4   6  4,3,5
3:    4   4  4,3,5
4:    3   6  4,3,5
5:    3   8    3,5
6:    3   6  4,3,5

I can understand unique(gear), but I cannot understand what list(list(unique(gear))) is doing.

Answer 1

A data.table - like any data.frame - is a list of pointers to column vectors.

When creating new columns, we write j of DT[i,j,by] so that it evaluates to a list of columns:
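A minimal sketch of that first point, using a small made-up table (the names DT0, id and x are only for illustration): a data.table is a list with one element per column.

```r
library(data.table)

# Hypothetical two-column table, used only for illustration
DT0 <- data.table(id = 1:3, x = c("a", "b", "c"))

is.list(DT0)   # a data.table is a list ...
length(DT0)    # ... with one element per column
DT0[["x"]]     # each element is an ordinary column vector
```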

DT[, (newcol_names) := list(newcol_A, newcol_B)]
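As a concrete, runnable version of the pattern above (the table DT1, the column names A and B, and their values are assumptions chosen for illustration): the right-hand side is a list with one element per new column.

```r
library(data.table)

# Hypothetical table and column names, used only for illustration
DT1 <- data.table(id = 1:3)
newcol_names <- c("A", "B")

# The list on the right supplies one element per new column
DT1[, (newcol_names) := list(id * 2L, letters[1:3])]

names(DT1)
```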

That is what the outermost list() does in the OP's example, for a single list column.

data[,gearsL := list(list(unique(gear))), by=cyl]

For clarity, this can and should be written with the alias .():

data[, gearsL := .(list(unique(gear))), by=cyl]
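To check what this produces, here is a short sketch that rebuilds the OP's table from mtcars and inspects the new column. The outer .() names the single new column; the inner list() wraps each group's vector of unique gears so that it is stored as one cell and recycled over the rows of that group.

```r
library(data.table)

data <- as.data.table(mtcars)[, .(gear, cyl)]
data[, gearsL := .(list(unique(gear))), by = cyl]

class(data$gearsL)            # the new column is a list column
data[cyl == 8, gearsL[[1]]]   # unique gears among the 8-cylinder cars
```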

That is all you need to know, but I have elaborated a bit below.

Details. When creating a new column, we can usually skip list() / .():

DT = data.table(id=1:3)
DT[, E := c(4,5,6)]
DT[, R := 3]
# this works as if we had typed
# R := c(3,3,3)

Note that E enumerates each value, while R recycles a single value over all rows. Next example:

DT[, Elist := list(hist(rpois(1,1)), hist(rpois(2,2)), hist(rpois(3,3)))]

As we did for E, we are enumerating the values of Elist here. This still uses the shortcut; list() is here only because the column is itself a list, as confirmed by

sapply(DT, class)
#        id         E         R     Elist 
# "integer" "numeric" "numeric"    "list"

The convenient shortcut of skipping list() / .() fails in one special case: when we are creating a list column that recycles its value:

DT[, Rlist := list(c("a","b"))]
# based on the pattern for column R, this should work as if we typed 
# Rlist := list(c("a","b"), c("a","b"), c("a","b"))

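It does not work because the parser sees this as Rlist := .( c("a", "b") ) and thinks we have simply neglected to create a full enumeration for each row, as Elist does. To get the desired result, skip the shortcut and wrap the vector in list() / .():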

DT[, Rlist := .(list(c("a","b")))]

#    id E R       Elist Rlist
# 1:  1 4 3 <histogram>   a,b
# 2:  2 5 3 <histogram>   a,b
# 3:  3 6 3 <histogram>   a,b

This is the case in the OP's example, where the outer list() / .() is necessary.
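A small self-contained check of the wrapped form (DT2 is a fresh, hypothetical table so the sketch does not depend on the histogram column above): every row ends up holding the whole vector c("a","b").

```r
library(data.table)

# Fresh hypothetical table, used only for illustration
DT2 <- data.table(id = 1:3)

# Outer .() = list of new columns; inner list() = one cell holding the vector
DT2[, Rlist := .(list(c("a", "b")))]

class(DT2$Rlist)            # list column
sapply(DT2$Rlist, length)   # each row's cell holds both values
```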
