How to get the words that are repeated in a column, along with their counts

Time: 2019-09-11 17:04:58

Tags: r

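Hi, I am working on NPS and CSAT. I have a column full of customer comments, and I am trying to find out what their problems are and the root causes of those problems.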

comments <- c("My order took too much time to deliver.",
              "Logistics is the main problem.",
              "Late time delivery.",
              "Why do you need additional time to deliver my product.",
              "You need to streamline your process towards quick delivery")

The output should be:

Column B (word) | Column C (count)
time            | 3
delivery        | 3
problem         | 1

Is there a code snippet that can do this? Any suggestions?

1 answer:

Answer 0 (score: 0)

You can use the tidytext package, in particular the unnest_tokens function:

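library(tidytext)
library(dplyr)

comments <- c("My order took too much time to deliver.",
              "Logistics is the main problem.",
              "Late time delivery.",
              "Why do you need additional time to deliver my product.",
              "You need to streamline your process towards quick delivery")

# One-column data frame with one comment per row
comment.df <- tibble(comment = comments)

# Split each comment into words, one word per row in a column named "word";
# by default unnest_tokens lowercases the text and drops punctuation
tidy_comments <- unnest_tokens(comment.df, word, comment)

# Count each word and sort from most to least frequent
tidy_comments <- tidy_comments %>% count(word, sort = TRUE)
tidy_comments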
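Note that the counts above will also include very common words such as "to", "my" and "the". With tidytext, these are usually filtered out with an anti_join against the package's stop_words data set. If you would rather not add package dependencies, here is a rough base R sketch of the same idea. The stop-word vector my_stop_words is a hypothetical, hand-picked list for these five comments only, and the punctuation handling only approximates what unnest_tokens does. It also does not merge "deliver" and "delivery"; they are counted as separate words.

```r
# Base R sketch: count how often each word appears across all comments.
comments <- c("My order took too much time to deliver.",
              "Logistics is the main problem.",
              "Late time delivery.",
              "Why do you need additional time to deliver my product.",
              "You need to streamline your process towards quick delivery")

# Lowercase, then replace anything that is not a letter, digit,
# apostrophe or space with a space (approximates unnest_tokens' cleanup)
clean <- gsub("[^a-z0-9' ]", " ", tolower(comments))

# Split each comment on runs of whitespace and flatten into one vector
words <- unlist(strsplit(clean, "\\s+"))
words <- words[words != ""]

# Hypothetical, hand-picked stop-word list; extend or replace it for real data
my_stop_words <- c("my", "too", "much", "to", "is", "the", "why", "do",
                   "you", "your", "need", "towards")
words <- words[!words %in% my_stop_words]

# Count occurrences and sort from most to least frequent
counts <- sort(table(words), decreasing = TRUE)

# Two-column data frame in the "word | count" layout asked for above
word_counts <- data.frame(word = names(counts),
                          count = as.integer(counts),
                          stringsAsFactors = FALSE)
print(word_counts)
```

For these comments "time" comes out on top with a count of 3, while "deliver" and "delivery" each appear twice. Grouping related forms like these would need an extra step (for example, a manual mapping of variants to one keyword), which this sketch does not attempt.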