I have data frames of the following shape:
# firms
firm year
1 2000
1 2001
1 2002
2 1999
2 2000
# products
product firm year category
100 1 2000 a
101 1 2000 b
102 1 2001 a
103 1 2001 b
104 1 2001 c
105 1 2002 b
106 1 2002 b
107 1 2002 c
108 2 1999 a b
109 2 1999 c
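I want to add a column to firms that holds a vector of integer counts, one per category (a, b, c), giving how many products the firm has in each category in the given year.
The result should be: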
# firms
firm year vector
1 2000 1 1 0
1 2001 1 1 1
1 2002 0 2 1
2 1999 1 1 1
2 2000 0 0 0
Note: the real data contains more than 300 categories.
Data:
firms <-
structure(list(firm = c(1L, 1L, 1L, 2L, 2L),
year = c(2000L, 2001L, 2002L, 1999L, 2000L)),
.Names = c("firm", "year"),
class = "data.frame", row.names = c(NA, -5L))
products <-
structure(list(product = 100:109, firm = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
year = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L, 2002L, 2002L, 1999L, 1999L),
category = c("a", "b", "a", "b", "c", "b", "b", "c", "a b", "c")),
.Names = c("product", "firm", "year", "category"),
class = "data.frame", row.names = c(NA, -10L))
Answer 0 (score: 1)
We can do a dcast after a join:
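library(data.table)
# Join products onto firms so that every firm-year is kept, then count products per category.
res <- dcast(setDT(products)[firms, on = c("firm", "year")], firm + year ~ category, length)
# Drop the last column, which holds the NA category of firm-years without products (here firm 2 in 2000).
res[, -ncol(res), with = FALSE]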
# firm year a b c
#1: 1 2000 1 1 0
#2: 1 2001 1 1 1
#3: 1 2002 0 2 1
#4: 2 1999 1 0 1
#5: 2 2000 0 0 0
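Based on the update in the OP's post, where a category value can hold several space-separated categories (e.g. "a b"), split those values into long format with cSplit before the join: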
library(splitstackshape)
res <- dcast(cSplit(products, "category", " ", "long")[firms,
on = c("firm", "year")], firm + year ~category, length)
res[, -ncol(res), with = FALSE]
# firm year a b c
#1: 1 2000 1 1 0
#2: 1 2001 1 1 1
#3: 1 2002 0 2 1
#4: 2 1999 1 1 1
#5: 2 2000 0 0 0
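The result above has one column per category, whereas the question asked for a single column holding the count vector. As a minimal sketch in base R (not part of the original answer), the category columns could be packed into a list column. The names wide, cat_cols, counts and firms_out are made up for illustration, and the values are copied by hand from the output above so that the snippet runs on its own.

```r
# Wide counts copied from the output above. In practice this would be
# something like as.data.frame(res[, -ncol(res), with = FALSE]).
wide <- data.frame(
  firm = c(1L, 1L, 1L, 2L, 2L),
  year = c(2000L, 2001L, 2002L, 1999L, 2000L),
  a    = c(1L, 1L, 0L, 1L, 0L),
  b    = c(1L, 1L, 2L, 1L, 0L),
  c    = c(0L, 1L, 1L, 1L, 0L)
)

# Every column other than the keys is treated as a category,
# so this also works with 300+ category columns.
cat_cols <- setdiff(names(wide), c("firm", "year"))
counts   <- as.matrix(wide[cat_cols])

# One integer vector per firm-year, stored as a list column.
firms_out <- wide[c("firm", "year")]
firms_out$vector <- lapply(seq_len(nrow(counts)), function(i) unname(counts[i, ]))
```

Each element of firms_out$vector is then a plain integer vector in the order of cat_cols, so firms_out$vector[[3]] holds the counts for firm 1 in 2002. Keeping cat_cols around is what records which position belongs to which category.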