Words that always appear together in R

Time: 2019-04-16 15:48:57

Tags: r text-mining word-frequency

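I am using R, and my dataset has a text column. Is there any way to find out which words always appear together, for example the most frequent sequences of two words, three words, and so on?

For example: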

Happy birthday to you 
Happy weekend 
Have a nice day
Be close 
Be smart 
Happy birthday 
It was a nice day
Happy birthday mama

So the result should look like this:

Happy birthday - freq 3
Nice day - freq 2

1 Answer:

Answer 0: (score: 3)

It looks like you need to create bigrams and count the features. Here is one way to do it with quanteda.

    library(quanteda)
    library(magrittr)  # provides the %>% pipe used below

    text <- c("Happy birthday to you ", "Happy weekend ", "Have a nice day", "Be close ",
              "Be smart ", "Happy birthday ", "It was a nice day", "Happy birthday mama")

    text %>%
      tokens() %>%                                   # split each string into words
      tokens_ngrams(n = 2, concatenator = " ") %>%   # build bigrams
      dfm() %>%                                      # document-feature matrix
      topfeatures()                                  # most frequent features
    ##  happy birthday          a nice        nice day     birthday to          to you        be smart
    ##               3               2               2               1               1               1
    ##   happy weekend          it was           was a          have a
    ##               1               1               1               1

What it does:

  1. Tokenizes the text
  2. Creates bigrams (joined by a single space)
  3. Creates a document-feature matrix (as required by topfeatures)
  4. Computes the most frequent features
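
The four steps above can also be sketched in base R, which may help show what happens at each stage and how the same idea extends to three-word sequences. This is only an illustrative sketch, not the quanteda implementation: count_ngrams is a hypothetical helper name, and it assumes that lowercasing and splitting on whitespace is an acceptable tokenizer for the data.

```r
# Hypothetical helper (not part of quanteda): count n-grams in a character vector.
# Assumes simple whitespace tokenization; punctuation is not handled.
count_ngrams <- function(texts, n = 2) {
  # Step 1: tokenize (lowercase, trim, split on runs of whitespace)
  tokens <- strsplit(tolower(trimws(texts)), "\\s+")

  # Step 2: build n-grams within each text, joined by a single space
  ngrams <- lapply(tokens, function(tok) {
    if (length(tok) < n) return(character(0))
    starts <- seq_len(length(tok) - n + 1)
    vapply(starts,
           function(i) paste(tok[i:(i + n - 1)], collapse = " "),
           character(1))
  })

  # Steps 3 and 4: tabulate all n-grams and sort by descending frequency
  sort(table(as.character(unlist(ngrams))), decreasing = TRUE)
}

text <- c("Happy birthday to you ", "Happy weekend ", "Have a nice day", "Be close ",
          "Be smart ", "Happy birthday ", "It was a nice day", "Happy birthday mama")

bigrams  <- count_ngrams(text, n = 2)
trigrams <- count_ngrams(text, n = 3)

# Keep only the sequences that occur at least twice
print(bigrams[bigrams >= 2])    # "happy birthday" 3, "a nice" 2, "nice day" 2
print(trigrams[trigrams >= 2])  # "a nice day" 2
```

Note that "a nice" is counted alongside "nice day"; if pairs containing words such as "a" are unwanted, those words would have to be filtered out before the n-grams are built.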