Question

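I recently decided to switch from data.frame to data.table, but I can't seem to manipulate a data.table in the same way.

How can I rewrite the following ddply code with data.table so that it produces the same output?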

# load the packages and generate the data.table
R> library(data.table)
R> library(plyr)
R> set.seed(42)
R> dt <- data.table(sample = rep(c('a','b','c'), times = 1, each = 5),
      seq = sample(c('dd','ee','ff'), 15, replace = TRUE),
      num = sample(1:100, 15),
      letters = sample(letters, 15))

R> dt
     sample seq num letters
 1:      a  ff  95       t
 2:      a  ff  97       u
 3:      a  dd  12       j
 4:      a  ff  47       p
 5:      a  ee  54       a
 6:      b  ee  86       r
 7:      b  ff  14       v
 8:      b  dd  92       d
 9:      b  ee  88       q
10:      b  ff   8       k
11:      c  ee  99       g
12:      c  ff  35       w
13:      c  ff  80       z
14:      c  dd  39       m
15:      c  ee  72       f

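The ddply code I would use returns one row for each sample and seq subgroup, where num holds sum(num) for the subgroup and letters holds the letter from the row with the largest num in that subgroup:
Example: for the sample == 'a' and seq == 'ff' subgroup, the letter is u, because its row has num == 97, which is higher than 95 and 47.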

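R> df_new <- ddply(dt, .(sample, seq), function(df){
  # row indices ordered by num, largest first
  order_d <- order(df$num, decreasing = TRUE)
  # keep the row with the largest num in this subgroup
  df_new <- df[order_d[1],]
  # replace its num with the subgroup total
  df_new$num <- sum(df$num)
  return(df_new)
})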

R> df_new
   sample seq num letters
1      a  dd  12       j
2      a  ee  54       a
3      a  ff 239       u
4      b  dd  92       d
5      b  ee 174       q
6      b  ff  22       v
7      c  dd  39       m
8      c  ee 171       g
9      c  ff 115       z

How can I do this with data.table?
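For reference, one way this grouping could be expressed in data.table is sketched below. It is only a sketch under a few assumptions: the table is rebuilt from the values printed above rather than from set.seed(42), because sample() can return different draws on different R versions, and the result name dt_new is hypothetical. The idea is to group by sample and seq, sum num, and pick the letter at the position of the largest num with which.max().

```r
library(data.table)

# Rebuild the question's table from its printed values, so the result does not
# depend on the random number generator of a particular R version.
dt <- data.table(
  sample  = rep(c("a", "b", "c"), each = 5),
  seq     = c("ff", "ff", "dd", "ff", "ee",
              "ee", "ff", "dd", "ee", "ff",
              "ee", "ff", "ff", "dd", "ee"),
  num     = c(95L, 97L, 12L, 47L, 54L,
              86L, 14L, 92L, 88L, 8L,
              99L, 35L, 80L, 39L, 72L),
  letters = c("t", "u", "j", "p", "a",
              "r", "v", "d", "q", "k",
              "g", "w", "z", "m", "f")
)

# For each (sample, seq) group: sum num, and keep the letter on the row where
# num is largest. which.max() returns the first maximum if there are ties.
# keyby sorts the result by the grouping columns, like the ddply output.
dt_new <- dt[, .(num = sum(num), letters = letters[which.max(num)]),
             keyby = .(sample, seq)]

print(dt_new)
```

With the values printed in the question, this should give the same nine rows as df_new, for example num = 239 and letters = u for sample == 'a' and seq == 'ff'. Inside the brackets, letters refers to the column rather than to the built-in letters vector, because data.table looks up column names first.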

Manipulating grouped data.tables within a larger data.table

0 answers: