Highlighting words in an R list based on a term-document matrix

Time: 2017-12-04 12:53:27

Tags: r dplyr subset lookup

Below is a data frame of campaign data:

Subject                      Response Rate(%)   Campaign Type   Channel
Buy Stunning Phone A         81.00              A               e-mail
Special Emi OFFER            81.00              B               e-mail
Buy Stunning Phone at EMI    73.00              C               SMS
The game changer is here.    85.00              A               SMS
Buy Stunnig Phone A          80.00              A               SMS
Special Emi OFFER            88.00              B               e-mail
Buy Stunning Phone at EMI    48.00              C               e-mail
The game changer is here.    48.00              A               e-mail
Buy Stunning Phone           89.00              A               e-mail
Special Emi OFFER            89.00              B               SMS
Buy Stunning Phone at EMI    69.00              C               SMS

I have created a term-document matrix, summarized as word frequencies below:

Word         Frequency
big          10
upgrade      10
worth        10
latest       9
much         9
phone        8
exciting     8
back         7
colours      7
case         6
stylish      6
clear        6
experience   5
time         5

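I have subset the data by channel type with dplyr, in decreasing order of response rate. For each subject, I want to highlight or list the words from the term-document matrix that appear in it. If a word is present in the subject, it should be listed in a separate column next to the subject. I have not been able to find a way to do this.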
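
The subsetting code itself is not shown in the question. A minimal sketch of what such a step might look like, assuming simplified column names (`ResponseRate`, `Channel`) and three rows taken from the table above; the names `campaigns` and `email_sorted` are illustrative only:

```r
library(dplyr)

# Hypothetical sample with simplified column names (not the asker's actual code)
campaigns <- data.frame(
  Subject      = c("Buy Stunning Phone A", "Special Emi OFFER", "The game changer is here."),
  ResponseRate = c(81, 88, 85),
  Channel      = c("e-mail", "e-mail", "SMS"),
  stringsAsFactors = FALSE
)

# Keep one channel, then sort by response rate, highest first
email_sorted <- campaigns %>%
  filter(Channel == "e-mail") %>%
  arrange(desc(ResponseRate))

email_sorted
```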

1 Answer:

Answer 0: (score: 1)

Do you mean something like this?

library(dplyr)

df <- read.table(header = TRUE, sep = "," ,text = "Subject,Response Rate(%),Campaign Type,Channel
Buy Stunning Phone A,81.00,A,e-mail
Special Emi OFFER,81.00,B,e-mail
Buy Stunning Phone at EMI,73.00,C,SMS
The game changer is here.,85.00,A,SMS
Buy Stunnig Phone A,80.00,A,SMS
Special Emi OFFER,88.00,B,e-mail
Buy Stunning Phone at EMI,48.00,C,e-mail
The game changer is here.,48.00,A,e-mail
Buy Stunning Phone,89.00,A,e-mail
Special Emi OFFER,89.00,B,SMS
Buy Stunning Phone at EMI,69.00,C,SMS")


df2 <- read.table(header = TRUE, sep = "," ,text = "Word,Frequency
big,10
upgrade,10
worth,10
latest,9
much,9
phone,8
exciting,8
back,7
colours,7
case,6
stylish,6
clear,6
experience,5
time,5")

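# For each word, find its case-insensitive position in every subject.
# regexpr() returns -1 where there is no match, so m has one row per
# subject and one column per word, with the words as column names.
words <- trimws(as.character(df2$Word))
m <- sapply(words, regexpr, text = as.character(df$Subject), ignore.case = TRUE)

# Collect the matched words for each subject into a comma-separated string
df$keyWord <- sapply(seq_len(nrow(m)), function(idx) {
  hit <- m[idx, ] > 0
  paste0(names(hit)[hit], collapse = ",")
})
df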
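
Note that `regexpr` matches substrings, so a word such as "case" would also match inside a longer word. If whole-word matches are wanted, one possible variant is sketched below in base R. It assumes the words contain only letters (no regex metacharacters); `match_words` is a hypothetical helper name, and the inputs are a few subjects and words taken from the question:

```r
# Illustrative inputs: a few subjects and words taken from the question
subjects <- c("Buy Stunning Phone A", "Special Emi OFFER", "The game changer is here.")
words    <- c("big", "phone", "time")

# Hypothetical helper: return the words that occur as whole words in one subject
match_words <- function(subject, words) {
  hit <- vapply(
    words,
    # \\b marks a word boundary; perl = TRUE selects PCRE syntax
    function(w) grepl(paste0("\\b", w, "\\b"), subject, ignore.case = TRUE, perl = TRUE),
    logical(1)
  )
  paste(words[hit], collapse = ",")
}

# One comma-separated string of matched words per subject
keyWord <- vapply(subjects, match_words, character(1), words = words, USE.NAMES = FALSE)
keyWord
```

Of these three subjects only the first contains one of the listed words ("phone"); the other two yield empty strings.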