Question

I have this:

require(data.table)
items = list(c(1,1,3), c(2,2,4), c(3,4,5,6))
multiplier = c(10,20,30)
dt = data.table(items, multiplier)
#     items multiplier
#1:   1,1,3         10
#2:   2,2,4         20
#3: 3,4,5,6         30

I want this:

table(unlist(rep(items, multiplier)))
# 1  2  3  4  5  6 
#20 40 40 50 30 30

This performs poorly when the items vectors are large. Is it possible to do this without using rep?
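To make the cost concrete, here is a small base-R check on the question's sample data. It only illustrates how many elements `rep()` materializes before `table()` counts them, compared with the size of the input; it is not a solution in itself.

```r
items <- list(c(1, 1, 3), c(2, 2, 4), c(3, 4, 5, 6))
multiplier <- c(10, 20, 30)

# rep() copies each whole vector `multiplier` times before table() sees it
expanded <- unlist(rep(items, multiplier))
length(expanded)                   # 210

# The expanded length is sum(length of each vector * its multiplier)
sum(lengths(items) * multiplier)   # 3*10 + 3*20 + 4*30 = 210

# The input itself holds only 10 elements
sum(lengths(items))                # 10
```

The expanded vector grows with the multipliers, while the data that actually needs counting does not, which is why avoiding `rep()` over `items` matters once the vectors or multipliers get large.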

Answer 1

If you don't mind getting a data.table instead of a table object, you can do this:

library(tidyr)
library(data.table)

unnest(dt, items)[, .(sum(multiplier)), items]
#   items V1
#1:     1 20
#2:     3 40
#3:     2 40
#4:     4 50
#5:     5 30
#6:     6 30

Of course, you can then reshape the result into whatever format you need, for example with dcast.data.table.

Note: for the tiny sample data, the original table/rep approach is faster on my machine, but perhaps this approach scales better (?).
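If you do want a table object at the end, one option (not from the answer above) is to pass the aggregated data.table to base R's `xtabs()`. The sketch below assumes only data.table is available, so it builds the long layout with `unlist()` and `rep(..., lengths(...))` as a stand-in for tidyr's `unnest()`; the names `long`, `res` and `tab` are illustrative.

```r
library(data.table)

items <- list(c(1, 1, 3), c(2, 2, 4), c(3, 4, 5, 6))
multiplier <- c(10, 20, 30)

# Stand-in for unnest(): one row per element, with each row's
# multiplier repeated once per element of its vector
long <- data.table(items = unlist(items),
                   multiplier = rep(multiplier, lengths(items)))
res <- long[, .(V1 = sum(multiplier)), by = items]

# xtabs() turns the aggregated data.table back into a table object,
# with levels sorted like the output of table()
tab <- xtabs(V1 ~ items, data = res)
print(tab)
```

Because `xtabs()` converts `items` to a factor, the numeric items come back in sorted order, matching the layout of `table(unlist(rep(items, multiplier)))`.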

Answer 2

Using tidyr and dplyr:

library(dplyr)
library(tidyr)

dt %>% 
  unnest(items) %>% 
  group_by(items) %>% 
  summarise(sum = sum(multiplier)) %>% 
  arrange(items)

You get:

Source: local data table [6 x 2]

  items sum
1     1  20
2     2  40
3     3  40
4     4  50
5     5  30
6     6  30
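The pipeline above is a weighted count: `unnest()` makes one row per element, and `group_by()` + `summarise()` sums the multipliers per item. A base-R sketch of the same two steps, assuming no extra packages, with `aggregate()` swapped in for the dplyr verbs:

```r
items <- list(c(1, 1, 3), c(2, 2, 4), c(3, 4, 5, 6))
multiplier <- c(10, 20, 30)

# Roughly what unnest() produces: one row per element,
# with the row's multiplier repeated for each of its elements
long <- data.frame(items = unlist(items),
                   multiplier = rep(multiplier, lengths(items)))

# Equivalent of group_by(items) %>% summarise(sum = sum(multiplier));
# aggregate() returns the groups sorted, like arrange(items)
res <- aggregate(multiplier ~ items, data = long, FUN = sum)
print(res)
```

Only the multipliers are repeated, once per element, so the long table has `sum(lengths(items))` rows regardless of how large the multipliers are.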

Answer 3

This ends up being a bit roundabout, but it doesn't replicate items, so it may be faster:

> stbls <- lapply(apply(dt, 1, I), function(rw) table(rw[['items']]) * rw[['multiplier']])
> str(stbls)
List of 3
 $ : table [1:2(1d)] 20 10
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr [1:2] "1" "3"
 $ : table [1:2(1d)] 40 20
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr [1:2] "2" "4"
 $ : table [1:4(1d)] 30 30 30 30
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr [1:4] "3" "4" "5" "6"

> xtabs(Freq ~ Var1, data=do.call(rbind, lapply(stbls,as.data.frame))) 
Var1
 1  3  2  4  5  6 
20 40 40 50 30 30

The starting point is:

> apply(dt,1,I)
[[1]]
$items
[1] 1 1 3

$multiplier
[1] 10


[[2]]
$items
[1] 2 2 4

$multiplier
[1] 20


[[3]]
$items
[1] 3 4 5 6

$multiplier
[1] 30
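The same idea (one small table per row, scaled by that row's multiplier, then summed by name) can be sketched in base R without `apply()` on the data.table. This version is an assumption-labeled variant, not the answer's code: it uses `Map()` over the plain `items` list and `multiplier` vector instead of `apply(dt, 1, I)`, and `tapply()` instead of `xtabs()` for the final sum.

```r
items <- list(c(1, 1, 3), c(2, 2, 4), c(3, 4, 5, 6))
multiplier <- c(10, 20, 30)

# One small table per row, multiplied by that row's multiplier
stbls <- Map(function(x, m) table(x) * m, items, multiplier)

# Flatten the per-row counts and their item names side by side
counts <- unlist(lapply(stbls, as.vector))
keys   <- unlist(lapply(stbls, names))

# Sum the scaled counts for each item name
res <- tapply(counts, keys, sum)
print(res)
```

`tapply()` sorts the keys as character strings, which matches numeric order here only because every item is a single digit; with items such as 10 or 12, convert `names(res)` to numeric and reorder if numeric order matters.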

Answer 4

data.tables are columnar data structures (like data.frames) and work best when you store your data in the right format (including the grouping variables).

require(data.table)
dt[, .(items = unlist(items),
       mult  = rep.int(multiplier, vapply(items, length, 0L)))][, sum(mult), by = items]
#    items V1
# 1:     1 20
# 2:     3 40
# 3:     2 40
# 4:     4 50
# 5:     5 30
# 6:     6 30
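To illustrate the "right format" point, here is a sketch that assumes the data were stored long from the start: one row per element, tagged with the id of the row it came from (the grouping variable), plus a small lookup table of multipliers. The `id`, `item`, `long` and `mult` names are hypothetical; the aggregation then becomes an ordinary join followed by a grouped sum.

```r
library(data.table)

# Hypothetical long layout: one row per element, with `id` recording
# which original row (and therefore which multiplier) it belongs to
long <- data.table(id   = rep(1:3, c(3, 3, 4)),
                   item = c(1, 1, 3, 2, 2, 4, 3, 4, 5, 6))
mult <- data.table(id = 1:3, multiplier = c(10, 20, 30))

# Look up each element's multiplier by id, then sum it per item
res <- mult[long, on = "id"][, .(V1 = sum(multiplier)), keyby = item]
print(res)
```

`keyby` sorts the result by `item`, so it lines up with the `table()` output in the question. In the answer's own code, `vapply(items, length, 0L)` can also be written as `lengths(items)` (R >= 3.2.0).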

Apply table to a list of vectors and aggregate
