How to check whether space-separated strings are present in a dataframe column in R

Time: 2017-04-14 03:45:02

Tags: r

I have the following data frame in R:
Client_ID   IT    FMCG   Consumer   Oil_Gas    Finance
  ABC       0     2345      0       4768.90      0
  CFG       234    0        0       2366.54      0
  DEF       1234   0        345     523          2344

Now I want to list the sectors each client holds (those with non-zero values). I can do that with the following R code:
 df$portfolio_holdings <- simplify2array(apply(df[2:6],1,function(x) paste(names(df[2:6])[x!=0],collapse=" ")))

This gives me the following output:

Client_ID   IT    FMCG   Consumer   Oil_Gas    Finance   portfolio_holdings
  ABC       0     2345      0       4768.90      0        FMCG Oil_Gas
  CFG       234    0        0       2366.54      0        IT Oil_Gas
  DEF       1234   0        345     523          2344     IT Consumer Oil_Gas Finance

I have another data frame with the following columns:

  Sectors     Scrip      Target_Price    Call
   FMCG        WER          345           Buy
   IT          CFHG         134           Sell
   Oil_Gas     ERTY         567           Buy
   Consumer    QWER         543           Buy
   Finance     QASD         334           Buy 

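Now I want to recommend to each client three more randomly chosen sectors that are not already held in their portfolio (fewer when fewer than three remain, as for DEF below). The final desired data frame would be: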

Client_ID   IT   FMCG     Consumer    Oil_Gas    Finance  portfolio    Recommendation
 ABC         0   2345       0         4768.90     0       FMCG Oil_Gas 1:IT|CFHG|134|Sell||2:Consumer|QWER|543|Buy||3:Finance|QASD|334|Buy
 CFG       234    0         0         2366.54     0       IT Oil_Gas    1:FMCG|WER|345|Buy||2:Consumer|QWER|543|Buy||3:Finance|QASD|334|Buy
 DEF       1234   0         345       523         2344    IT Consumer Oil_Gas Finance   1:FMCG|WER|345|Buy

How can I achieve this in R?

Sample data frames:

client_id <- c('ABC','DEF','ERT')
IT <- c(0,234,1234)
FMCG <- c(2345,0,0)
Consumer <- c(0,0,345)
Oil_Gas <- c(4768,2366,523)
Finance <- c(0,0,2345)


Sectors <- c('FMCG','IT','Oil_Gas','Consumer','Finance')
Scrip <- c('ABC','DFG','ERT','QWE','VGB')
Target <- c(345,134,567,543,334)
call <- c('Buy','Sell','Buy','Buy','Buy')

recom <- data.frame(Sectors,Scrip,Target,call)

df <- data.frame(client_id,IT,FMCG,Consumer,Oil_Gas,Finance)

1 Answer:

Answer 0: (score: 2)

I haven't tested this code because I don't have the data frames here, but you should get the idea.

Assume the second data frame is named df2:

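  recomend <- function(df1, df2) {
    df1$Recommendation <- NA_character_
    for (i in seq_len(nrow(df1))) {
      # sectors the client already holds
      held <- unlist(strsplit(as.character(df1$portfolio_holdings[i]), " "))
      # rows of df2 whose sector is not held yet
      recm <- which(!df2$Sectors %in% held)
      # pick at most 3 of them at random; taking min() avoids NA picks
      # when fewer than 3 sectors are left
      recm <- recm[sample.int(length(recm), min(3, length(recm)))]
      nval <- character(0)
      for (j in seq_along(recm)) {
        # as.character() keeps factor labels instead of integer codes
        fields <- vapply(df2[recm[j], ], as.character, character(1))
        # prefix each entry with its position, as in the desired output
        nval <- c(nval, paste0(j, ":", paste(fields, collapse = "|")))
      }
      df1$Recommendation[i] <- paste(nval, collapse = "||")
    }
    return(df1)
  }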
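
For reference, here is a self-contained sketch of the same idea run against the sample data from the question, written without the explicit loops. The helper name `recommend_sectors`, its `n` argument, and the `set.seed()` call are my own additions for illustration, not something from the question. I am also assuming the text columns are plain character vectors (hence `stringsAsFactors = FALSE`).

```r
# Sample data from the question; stringsAsFactors = FALSE keeps text as character
df <- data.frame(
  client_id = c("ABC", "DEF", "ERT"),
  IT        = c(0, 234, 1234),
  FMCG      = c(2345, 0, 0),
  Consumer  = c(0, 0, 345),
  Oil_Gas   = c(4768, 2366, 523),
  Finance   = c(0, 0, 2345),
  stringsAsFactors = FALSE
)
recom <- data.frame(
  Sectors = c("FMCG", "IT", "Oil_Gas", "Consumer", "Finance"),
  Scrip   = c("ABC", "DFG", "ERT", "QWE", "VGB"),
  Target  = c(345, 134, 567, 543, 334),
  call    = c("Buy", "Sell", "Buy", "Buy", "Buy"),
  stringsAsFactors = FALSE
)

sector_cols <- c("IT", "FMCG", "Consumer", "Oil_Gas", "Finance")

# Space-separated names of the sectors with a non-zero holding
df$portfolio_holdings <- apply(df[sector_cols], 1, function(x) {
  paste(sector_cols[x != 0], collapse = " ")
})

# Hypothetical helper: format up to `n` random unheld sectors for one client
recommend_sectors <- function(holdings, recom, n = 3) {
  # split the space-separated holdings back into individual sector names
  held <- strsplit(holdings, " ", fixed = TRUE)[[1]]
  # rows of `recom` whose sector the client does not hold yet
  candidates <- which(!recom$Sectors %in% held)
  if (length(candidates) == 0) return(NA_character_)
  # sample row positions so a single candidate is handled correctly
  picked <- candidates[sample.int(length(candidates), min(n, length(candidates)))]
  # join the columns of each picked row with "|"
  rows <- do.call(paste, c(recom[picked, , drop = FALSE], sep = "|"))
  # number the entries and join them with "||"
  paste(paste0(seq_along(rows), ":", rows), collapse = "||")
}

set.seed(42)  # only to make the random picks repeatable
df$Recommendation <- vapply(df$portfolio_holdings, recommend_sectors,
                            character(1), recom = recom, USE.NAMES = FALSE)
print(df[c("client_id", "portfolio_holdings", "Recommendation")])
```

The membership check itself is just `strsplit()` followed by `%in%`. Sampling positions with `sample.int()` rather than calling `sample()` on the row numbers sidesteps the case where a single remaining row number would be treated as `1:x`.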