R - Create a new column based on the number of rows meeting conditions in another data frame

Time: 2017-01-13 18:35:42

Tags: r

I have data frames of the following shape:

# firms
firm   year
1      2000
1      2001
1      2002
2      1999
2      2000

# products
product   firm   year   category
100       1      2000   a
101       1      2000   b
102       1      2001   a
103       1      2001   b
104       1      2001   c
105       1      2002   b
106       1      2002   b
107       1      2002   c
108       2      1999   a b
109       2      1999   c

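I want to add a column to firms containing a vector of integer values: the number of products in each of the categories a, b, c for a particular firm in a given year. A product that lists several categories (such as "a b") counts once in each of them.

The result should be: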

# firms
firm   year   vector
1      2000   1 1 0
1      2001   1 1 1
1      2002   0 2 1
2      1999   1 1 1
2      2000   0 0 0

Note: the actual data contains more than 300 categories.

Data:

firms <-
  structure(list(firm = c(1L, 1L, 1L, 2L, 2L),
                 year = c(2000L, 2001L, 2002L, 1999L, 2000L)),
            .Names = c("firm", "year"),
            class = "data.frame", row.names = c(NA, -5L))

products <-
  structure(list(product = 100:109, firm = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
                 year = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L, 2002L, 2002L, 1999L, 1999L),
                 category = c("a", "b", "a", "b", "c", "b", "b", "c", "a b", "c")),
            .Names = c("product", "firm", "year", "category"),
            class = "data.frame", row.names = c(NA, -10L))

1 Answer:

Answer 0 (score: 1)

We can use dcast after performing a join:

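library(data.table)
# Join products onto firms so every firm-year is kept, then count rows per category
res <- dcast(setDT(products)[firms, on = c("firm", "year")], firm + year ~ category, length)
# Drop the last column, which holds the NA category of firm-years without products
res[, -ncol(res), with = FALSE]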
#   firm year a b c
#1:    1 2000 1 1 0
#2:    1 2001 1 1 1
#3:    1 2002 0 2 1
#4:    2 1999 1 0 1
#5:    2 2000 0 0 0

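Update

Following the update to the OP's post, where a product can list several space-separated categories (such as "a b"), split category into long format before the join: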

library(splitstackshape)
# cSplit(..., "long") yields one row per product-category pair before the join
res <- dcast(cSplit(products, "category", " ", "long")[firms,
     on = c("firm", "year")], firm + year ~ category, length)
res[, -ncol(res), with = FALSE]
#   firm year a b c
#1:    1 2000 1 1 0
#2:    1 2001 1 1 1
#3:    1 2002 0 2 1
#4:    2 1999 1 1 1
#5:    2 2000 0 0 0
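
The output above has one column per category, whereas the question asked for a single column holding a vector. The following is a minimal base R sketch of one way to get that shape as a list column. It is not the answerer's method. The helper names (cats, long, lv, key_firms, key_long, counts) are illustrative only, and it assumes every firm-year pair appears at most once in firms.

```r
# Toy data copied from the question
firms <- data.frame(firm = c(1L, 1L, 1L, 2L, 2L),
                    year = c(2000L, 2001L, 2002L, 1999L, 2000L))
products <- data.frame(
  product  = 100:109,
  firm     = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
  year     = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L, 2002L, 2002L, 1999L, 1999L),
  category = c("a", "b", "a", "b", "c", "b", "b", "c", "a b", "c"),
  stringsAsFactors = FALSE
)

# Split space-separated categories: one character vector per product
cats <- strsplit(products$category, " ", fixed = TRUE)
n    <- lengths(cats)

# Long form: one row per (product, category) pair
long <- data.frame(firm     = rep(products$firm, n),
                   year     = rep(products$year, n),
                   category = unlist(cats),
                   stringsAsFactors = FALSE)

# Fix the category order so every vector has the same layout
lv <- sort(unique(long$category))

# Cross-tabulate firm-year keys against categories. Using the rows of
# firms as factor levels keeps firm-years without products (all zeros).
# Assumption: firm-year pairs in firms are unique.
key_firms <- paste(firms$firm, firms$year)
key_long  <- paste(long$firm, long$year)
counts <- table(factor(key_long, levels = key_firms),
                factor(long$category, levels = lv))

# Store each row of the count table as a named integer vector
firms$vector <- lapply(seq_len(nrow(firms)), function(i) {
  setNames(as.integer(counts[i, ]), lv)
})
```

Each element of firms$vector is then an integer vector with one entry per category, in the order given by lv. With 300+ categories, it may be more practical to keep counts as a matrix alongside firms rather than a list column.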