Question

I have a data frame that lists MS Office programs and the dates on which they crashed on various computers around our home:

Services-Impacted  Date
MS Word            2013-03-01
MS Excel           2013-03-03
MS Powerpoint      2014-01-01
Excel,ppt,word     2014-05-04
MS Word            2015-03-01
MS Excel           2015-03-03
MS Powerpoint      2015-01-01

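I want to search the data frame row by row: if "MS Word" or "word" is found, assign 1 to a new column named MS Word, and 0 if it is not found. The same applies to MS Excel and MS Powerpoint. So the final result I am hoping for looks like this: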

Services-Impacted  Date        MS Word MS Excel MS Powerpoint
MS Word            2013-03-01  1       0        0
MS Excel           2013-03-03  0       1        0
MS Powerpoint      2014-01-01  0       0        1
Excel,ppt,word     2014-05-04  1       1        1
MS Word            2015-03-01  1       0        0
MS Excel           2015-03-03  0       1        0
MS Powerpoint      2015-01-01  0       0        1

I have looked at a number of different approaches:

"MS Word" %in% Office$`Services-Impacted`[1]
# [1] TRUE

# count the number of rows
n <- nrow(Office)
n

# loop over the rows
for (i in 1:n)
    {
      # test whether row i is "MS Word"
      "MS Word" %in% Office$`Services-Impacted`[i]

    }

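The first line works fine, but I cannot figure out how to iterate over the whole data frame, because looping with [i] does not return a list of TRUE or FALSE values. I also cannot work out how to do a wildcard search, so I would have to hard-code every search.

I also looked at options such as grep and filter, but these only filter the table rather than giving me a mechanism to populate the product columns with 1 or 0.

Thanks in advance for your replies,
Jonathan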

Answer 1

We can use mtabulate from qdapTools after splitting the 'Services-Impacted' column:

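library(qdapTools)
# split each entry on commas and tabulate the pieces, one column per distinct value
d1 <- mtabulate(strsplit(as.character(df1[, 'Services-Impacted']), ','))
# find the columns that belong to each product
i1 <- grep("(?i)(e)xcel", names(d1))
i2 <- grep("Power|ppt$", names(d1))
i3 <- grep("(?i)word", names(d1))
# sum the matching columns per row, then turn any non-zero count into 1
cbind(df1, +(data.frame(MSWord = rowSums(d1[i3]), MSExcel = rowSums(d1[i1]),
                MSPowerpoint = rowSums(d1[i2])) != 0))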
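
If you would rather not depend on qdapTools, the same idea can be sketched in base R with grepl, which is vectorised over the whole column and so needs no row loop. This is only a sketch: the data frame is rebuilt here from the sample in the question, and the `patterns` vector (its name and the regular expressions in it) is my own assumption based on the spellings that appear in that sample, so adjust it to your real data.

```r
# Rebuild the sample data from the question (df1 is the name used above).
df1 <- data.frame(
  `Services-Impacted` = c("MS Word", "MS Excel", "MS Powerpoint",
                          "Excel,ppt,word", "MS Word", "MS Excel",
                          "MS Powerpoint"),
  Date = as.Date(c("2013-03-01", "2013-03-03", "2014-01-01", "2014-05-04",
                   "2015-03-01", "2015-03-03", "2015-01-01")),
  check.names = FALSE,       # keep the hyphen in the column name
  stringsAsFactors = FALSE
)

# Hypothetical lookup: new column name -> pattern to search for.
patterns <- c("MS Word"       = "word",
              "MS Excel"      = "excel",
              "MS Powerpoint" = "powerpoint|ppt")

# grepl() returns TRUE/FALSE for every row at once;
# as.integer() converts that to 1/0.
for (product in names(patterns)) {
  df1[[product]] <- as.integer(
    grepl(patterns[[product]], df1[["Services-Impacted"]], ignore.case = TRUE)
  )
}

df1
```

Because ignore.case = TRUE is set, "MS Word" and "word" both match the same pattern, which covers the wildcard search asked about in the question without hard-coding each spelling.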
