Question

My data is as follows:

>df2
   id     calmonth       product
1 101       01           apple
2 102       01           apple&nokia&htc
3 103       01           htc
4 104       01           apple&htc
5 104       02           nokia

para=c('apple','htc','nokia')

I want to count the IDs whose product contains apple&htc, apple&nokia, and so on. My function is as follows:

xandy=function(a,b){
        ddply(df2,.(calmonth),summarise,
                              csum=length(grep(paste0('apple','.*','htc'),product)),
                              coproduct=paste0('apple','&','htc')
             )
                   }

This function gives me exactly the result I want:

> xandy(para[1],para[3])
  calmonth csum   coproduct
1       01    2   apple&htc
2       02    0   apple&htc

But I need not only apple&htc but also apple&nokia and so on, so I replaced the hard-coded 'apple' and 'htc' with parameters. The new function looks like this:

xandy=function(a,b){
        ddply(df2,.(calmonth),summarise,
                              csum=length(grep(paste0(a,'.*',b),product)),
                              coproduct=paste0(a,'&',b)
             )
                   }

See the difference? I changed 'apple' and 'htc' to a and b (the parameters). But the result is not at all what I wanted.

> xandy(para[1],para[3])

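Error in eval(expr, envir, enclos) : argument is missing, with no default
In addition: Warning message:
In grep(paste0(a, ".*", b), product) :
  argument 'pattern' has length > 1 and only the first element will be used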

Answer 1

A direct solution to your problem might be:

# sum() counts the TRUE values; length() would return the number of rows in each group
ddply(df2, .(calmonth), summarise, 
               apple = as.numeric(sum(product == "apple")),
               apple.nokia.htc = as.numeric(sum(product == "apple&nokia&htc")),
               htc = as.numeric(sum(product == "htc")),
               apple.htc = as.numeric(sum(product == "apple&htc"))
)
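
If exact-match counts per month are all that is needed, a cross-tabulation gives them for every product string at once, without naming each one. This is a minimal base R sketch rather than the answerer's method; the reconstruction of df2 assumes calmonth and product are character columns, which the printed data does not confirm.

```r
# Rebuild the example data from the question (column types are an assumption).
df2 <- data.frame(
  id = c(101, 102, 103, 104, 104),
  calmonth = c("01", "01", "01", "01", "02"),
  product = c("apple", "apple&nokia&htc", "htc", "apple&htc", "nokia"),
  stringsAsFactors = FALSE
)

# Rows are months, columns are the distinct product strings.
exact_counts <- table(df2$calmonth, df2$product)
print(exact_counts)
```

Note that this counts exact strings only: a row with "apple&htc" is not counted under "apple", so it does not answer the "contains both products" part of the question.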

Answer 2

With help from MengChen and others, I arrived at a straightforward answer.

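xandy=function(a,b){
  # build the patterns outside ddply so they are ordinary strings
  myStr_match=paste0(a,'.*',b)    # a followed by b
  myStr_match1=paste0(b,'.*',a)   # b followed by a
  ajoinb_match=paste0(a,'&',b)    # label for the result
  # pass the strings to the per-group function as extra arguments
  ddply(df2,.(calmonth),function(data,myStr,myStr1,ajoinb){
    summarise(data,
              csum=max(length(grep(myStr,product)),length(grep(myStr1,product))),
              coproduct=ajoinb)
  },myStr=myStr_match,myStr1=myStr_match1,ajoinb=ajoinb_match)
}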

Maybe this is not the best answer, but it does work.
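
For comparison, here is a sketch of the same count in base R, where a and b are used inside an ordinary function body, so no arguments need to be threaded through ddply. It swaps plyr for tapply() and is not the code from this answer. The helper name count_pair and the rebuilt df2 (with character columns) are assumptions made for illustration. Testing each product separately with grepl() makes the check independent of the order in which the products appear, which is what the two regular expressions above handle.

```r
# Rebuild the example data from the question (column types are an assumption).
df2 <- data.frame(
  id = c(101, 102, 103, 104, 104),
  calmonth = c("01", "01", "01", "01", "02"),
  product = c("apple", "apple&nokia&htc", "htc", "apple&htc", "nokia"),
  stringsAsFactors = FALSE
)
para <- c("apple", "htc", "nokia")

# count_pair is a hypothetical helper name, not part of plyr.
count_pair <- function(a, b, data = df2) {
  # TRUE where the product string contains both a and b, in any order
  has_both <- grepl(a, data$product, fixed = TRUE) &
    grepl(b, data$product, fixed = TRUE)
  # Number of matching rows per month
  counts <- tapply(has_both, data$calmonth, sum)
  data.frame(
    calmonth = names(counts),
    csum = as.vector(counts),
    coproduct = paste0(a, "&", b),
    stringsAsFactors = FALSE
  )
}

# Every unordered pair of products from para, stacked into one data frame
pairs <- combn(para, 2, simplify = FALSE)
all_pairs <- do.call(rbind, lapply(pairs, function(p) count_pair(p[1], p[2])))
print(all_pairs)
```

Because fixed = TRUE does plain substring matching, a product name that is contained in another name (for example "apple" in "pineapple") would also match; splitting product on "&" first would be stricter.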

R function containing plyr - ddply(): arguments inside ddply() are not passed correctly
