Question

I have data frames of the following shape:

# firms
firm   year
1      2000
1      2001
1      2002
2      1999
2      2000

# products
product   firm   year   category
100       1      2000   a
101       1      2000   b
102       1      2001   a
103       1      2001   b
104       1      2001   c
105       1      2002   b
106       1      2002   b
107       1      2002   c
108       2      1999   a b
109       2      1999   c

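I want to add a column to firms that holds a vector of integers: the number of products in each of the categories a, b and c for that firm in that year.

The result should be: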

# firms
firm   year   vector
1      2000   1 1 0
1      2001   1 1 1
1      2002   0 2 1
2      1999   1 1 1
2      2000   0 0 0

Note: the actual data contain more than 300 categories.

Data:

firms <- structure(list(firm = c(1L, 1L, 1L, 2L, 2L),
    year = c(2000L, 2001L, 2002L, 1999L, 2000L)),
    .Names = c("firm", "year"), class = "data.frame", row.names = c(NA, -5L))

products <- structure(list(product = 100:109,
    firm = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
    year = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L, 2002L, 2002L, 1999L, 1999L),
    category = c("a", "b", "a", "b", "c", "b", "b", "c", "a b", "c")),
    .Names = c("product", "firm", "year", "category"),
    class = "data.frame", row.names = c(NA, -10L))

Answer 1

We can use dcast after doing a join:

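library(data.table)
# Join products onto firms (keeps every firm-year), then count products per category
res <- dcast(setDT(products)[firms, on = c("firm", "year")], firm + year ~ category, length)
# Drop the last column, which holds the unmatched (NA category) rows from the join
res[, -ncol(res), with = FALSE]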
#   firm year a b c
#1:    1 2000 1 1 0
#2:    1 2001 1 1 1
#3:    1 2002 0 2 1
#4:    2 1999 1 0 1
#5:    2 2000 0 0 0
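
The join keeps every row of firms, so a firm-year without any product (here firm 2 in 2000) ends up with a missing category, which is presumably the extra last column dropped above. A minimal base R sketch of that effect, using merge as a stand-in for the data.table join; the names merged and unmatched are illustrative assumptions, not part of the answer:

```r
firms <- data.frame(firm = c(1L, 1L, 1L, 2L, 2L),
                    year = c(2000L, 2001L, 2002L, 1999L, 2000L))
products <- data.frame(
  product  = 100:109,
  firm     = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
  year     = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L, 2002L, 2002L, 1999L, 1999L),
  category = c("a", "b", "a", "b", "c", "b", "b", "c", "a b", "c"),
  stringsAsFactors = FALSE
)

# Left join: keep every firm-year, even those without products.
merged <- merge(firms, products, by = c("firm", "year"), all.x = TRUE)

# Firm-years with no matching product carry NA in the product columns.
unmatched <- merged[is.na(merged$category), c("firm", "year")]
print(unmatched)
```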

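Update

Based on the update in the OP's post, where a single category value can hold several space-separated categories (such as "a b"), we can first split those values into long format with cSplit: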

library(splitstackshape)
res <- dcast(cSplit(products, "category", " ", "long")[firms, 
     on = c("firm", "year")], firm + year ~category, length)
res[, -ncol(res), with = FALSE]
#   firm year a b c
#1:    1 2000 1 1 0
#2:    1 2001 1 1 1
#3:    1 2002 0 2 1
#4:    2 1999 1 1 1
#5:    2 2000 0 0 0
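
The result above has one column per category, while the question asks for a single vector column. A minimal base R sketch of one way to get that as a list column, without data.table or splitstackshape; the helper names cats, long and lev are illustrative assumptions, and with 300+ categories the per-row lookup may be slower than the dcast approach:

```r
firms <- data.frame(firm = c(1L, 1L, 1L, 2L, 2L),
                    year = c(2000L, 2001L, 2002L, 1999L, 2000L))
products <- data.frame(
  product  = 100:109,
  firm     = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
  year     = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L, 2002L, 2002L, 1999L, 1999L),
  category = c("a", "b", "a", "b", "c", "b", "b", "c", "a b", "c"),
  stringsAsFactors = FALSE
)

# Split multi-valued categories ("a b") into one row per category.
cats <- strsplit(products$category, " ", fixed = TRUE)
long <- data.frame(
  firm     = rep(products$firm, lengths(cats)),
  year     = rep(products$year, lengths(cats)),
  category = unlist(cats),
  stringsAsFactors = FALSE
)

# Fix the category levels up front so every firm-year gets the same layout.
lev <- sort(unique(long$category))

# For each row of firms, count the matching products per category.
firms$vector <- lapply(seq_len(nrow(firms)), function(i) {
  hit <- long$firm == firms$firm[i] & long$year == firms$year[i]
  as.integer(table(factor(long$category[hit], levels = lev)))
})

print(firms)
```

Each element of firms$vector is an integer vector ordered like lev, so firms$vector[[3]] holds the counts for firm 1 in 2002.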
