Question

This thread discussed doing this for a data frame. I want to do something slightly more complicated than that:

library(data.table)

dt <- data.table(A = c(rep("a", 3), rep("b", 4), rep("c", 5)), B = rnorm(12, 5, 2))
dt2 <- dt[order(dt$A, dt$B)] # Sort by A, then by B
# 4th value of B per group; always shows the factor from A
do.call(rbind, by(
  dt2, dt2$A,
  function(x) data.table(A = x[, A][1], B = x[, B][4])
))
# This is in reply to Vlo's comment below. If I do this, both columns of the row come back as NA
do.call(rbind, by(dt2, dt2$A, function(x) x[4]))
# Take the max value of B for each factor level of A
do.call(rbind, by(dt2, dt2$A, function(x) tail(x, 1)))

What is a more efficient way to do this using native data.table functions?

Answer 1

In data.table, you can refer to columns as variables within the scope of dt, so you don't need $. That is,

dt2 = dt[order(A, B)] # no need for dt$

is sufficient. If you want the 4th element of B for each group in A:

dt2[, list(B=B[4L]), by=A]
#    A        B
# 1: a       NA
# 2: b 6.579446
# 3: c 6.378689
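
As a hedged sketch of the same idea (the fixed seed and the names `nth_B` and `nth_row` are assumptions added for illustration, not part of the original answer): indexing the column with `B[4L]` and indexing the per-group subset with `.SD[4L]` both keep the group label and give NA for a group that has fewer than four rows.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the run is reproducible
dt <- data.table(A = c(rep("a", 3), rep("b", 4), rep("c", 5)),
                 B = rnorm(12, 5, 2))
dt2 <- dt[order(A, B)]

# 4th value of B within each group; group "a" has only 3 rows, so it gets NA
nth_B <- dt2[, list(B = B[4L]), by = A]

# Whole 4th row of each group; .SD holds the non-grouping columns of the group
nth_row <- dt2[, .SD[4L], by = A]

print(nth_B)
print(nth_row)
```

With only one non-grouping column the two results coincide; `.SD[4L]` becomes the more convenient form when every remaining column of the fourth row is wanted.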

See @Vlo's answer for the second question.

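From the way you are using data.table, it seems you have not gone through any of the vignettes or talks. You could check out the Introduction and the FAQ vignettes, or the tutorials from the homepage; in particular, Matt's @user2014 tutorial, among others.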

Answer 2

The first statement doesn't make sense to me; here is the second:

# Take the max value of B according to each factor A
dt2[, list(B=max(B)), by=A]
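
Since `dt2` is already sorted by B within each group of A, the group maximum is also the last row of each group. A minimal sketch under that assumption (the seed and the names `by_max`, `by_last`, and `by_tail` are illustrative, not from the original answer), using `.N`, the number of rows in the current group:

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the run is reproducible
dt <- data.table(A = c(rep("a", 3), rep("b", 4), rep("c", 5)),
                 B = rnorm(12, 5, 2))
dt2 <- dt[order(A, B)]

by_max  <- dt2[, list(B = max(B)), by = A]  # maximum of B per group
by_last <- dt2[, list(B = B[.N]), by = A]   # last value of B per group
by_tail <- dt2[, tail(.SD, 1L), by = A]     # last row per group, all columns

print(by_max)
```

`max(B)` does not depend on the sort order, while the `.N` and `tail` forms do, so the latter two only match the maximum on sorted data.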

More efficient way to get every nth element in a data.table
