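I'm using R and my dataset has a text column. Is there any way to find out which words tend to appear together, for example the most frequent two-word or three-word sequences?
For example: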
Happy birthday to you
Happy weekend
Have a nice day
Be close
Be smart
Happy birthday
It was a nice day
Happy birthday mama
So the result should look like this:
Happy birthday - freq 3
Nice day - freq 2
Answer 0 (score: 3)
It seems you need to create bigrams (two-word n-grams) and count how often each one occurs. Here is one way to do it with the quanteda package:
```r
library(quanteda)

text <- c("Happy birthday to you ", "Happy weekend ", "Have a nice day",
          "Be close ", "Be smart ", "Happy birthday ", "It was a nice day",
          "Happy birthday mama")

text %>%
  tokens() %>%                                  # split each string into word tokens
  tokens_ngrams(n = 2, concatenator = " ") %>%  # join adjacent tokens into bigrams
  dfm() %>%                                     # document-feature matrix (lowercased by default)
  topfeatures()                                 # the 10 most frequent features
##  happy birthday         a nice       nice day    birthday to         to you       be smart
##               3              2              2              1              1              1
##   happy weekend         it was          was a         have a
##               1              1              1              1
```
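The question also asks about three-word sequences. A minimal sketch, assuming the same `text` vector and the same quanteda functions as above: passing `n = 2:3` to `tokens_ngrams()` builds bigrams and trigrams in one pass, and keeping only phrases seen at least twice (the threshold of 2 is an arbitrary choice here) gives a list close to the "freq" output the question describes. The variable names `phrases`, `freq` and `repeated` are just illustrative.

```r
library(quanteda)

text <- c("Happy birthday to you ", "Happy weekend ", "Have a nice day",
          "Be close ", "Be smart ", "Happy birthday ", "It was a nice day",
          "Happy birthday mama")

# Build both bigrams (n = 2) and trigrams (n = 3), joined with a space
phrases <- text %>%
  tokens() %>%
  tokens_ngrams(n = 2:3, concatenator = " ") %>%
  dfm()

# Count every phrase across all documents, sorted from most to least frequent
freq <- topfeatures(phrases, n = nfeat(phrases))

# Keep only phrases that occur at least twice (threshold chosen for illustration)
repeated <- freq[freq >= 2]
repeated
```

On this sample, `repeated` should contain four phrases: "happy birthday" (3) and "a nice", "nice day" and "a nice day" (2 each). Note that overlapping n-grams are all counted, so a repeated trigram such as "a nice day" also shows up through its two bigrams.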