How do I get unigrams and trigrams?

Time: 2016-05-24 06:08:08

Tags: r data-analysis

I need to get unigrams and trigrams, without the bigrams. My current tokenizer returns everything from 1-grams through 3-grams:

trigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 1, max = 3))

How can I edit this code to get only the unigrams and trigrams?

1 answer:

Answer 0: (score: 0)

One way is to use the dfm function from the quanteda package, passing the n-gram sizes you want as a vector, as shown below:

library(quanteda)
dfm('I only want uni and trigrams', ngrams = c(1,3), verbose = FALSE)

#Document-feature matrix of: 1 document, 10 features.
#1 x 10 sparse Matrix of class "dfmSparse"
#       features
#docs    i only want uni and trigrams i_only_want only_want_uni want_uni_and uni_and_trigrams
#  text1 1    1    1   1   1        1           1             1            1                1
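If you would rather not depend on quanteda, the same idea (ask for sizes 1 and 3 and skip 2) can be sketched in base R. This is only a minimal sketch: `ngrams_of_sizes` is a hypothetical helper name, not a function from any package, and it assumes plain whitespace tokenization, lowercasing, and a space as the joiner (the quanteda output above joins with underscores instead).

```r
# Hypothetical helper (not from any package): build n-grams of the
# requested sizes only, e.g. c(1, 3) for unigrams and trigrams.
# Assumes whitespace tokenization and lowercases the input.
ngrams_of_sizes <- function(text, sizes = c(1, 3), sep = " ") {
  # Split on runs of whitespace and drop any empty tokens
  words <- strsplit(tolower(trimws(text)), "\\s+")[[1]]
  words <- words[nzchar(words)]

  out <- lapply(sizes, function(n) {
    # Too few words for this size: contribute nothing
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    # Paste each window of n consecutive words into one n-gram
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = sep),
           character(1))
  })

  unlist(out)
}

ngrams_of_sizes("I only want uni and trigrams")
```

For the example sentence this gives the 6 single words followed by the 4 three-word sequences, with no two-word sequences in between. Passing `sep = "_"` would mimic the underscore-joined feature names shown above.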