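I wrote a function that cleans text and then computes unigrams, bigrams, and trigrams, but I get this error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 30, 19, 10
The function works for some columns but fails with this error for others, and I don't know why.
Here is my code: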
library(tm)        # removePunctuation()
library(quanteda)  # tokens(), tokens_ngrams(), dfm(), topfeatures()
library(magrittr)  # %>%

why <- function(L) {
  # Clean the text: drop punctuation and leading whitespace
  L <- removePunctuation(L)
  L <- gsub("^[[:space:]]*", "", L)

  # Top 30 unigrams
  unigram <- L %>%
    tokens() %>%
    tokens_ngrams(n = 1, concatenator = " ") %>%
    dfm() %>%
    topfeatures(30)
  df1 <- data.frame(word_unigram = names(unigram), count_unigram = unigram)
  rownames(df1) <- NULL

  # Top 30 bigrams
  bigram <- L %>%
    tokens() %>%
    tokens_ngrams(n = 2, concatenator = " ") %>%
    dfm() %>%
    topfeatures(30)
  df2 <- data.frame(word_bigram = names(bigram), count_bigram = bigram)
  rownames(df2) <- NULL

  # Top 30 trigrams
  trigram <- L %>%
    tokens() %>%
    tokens_ngrams(n = 3, concatenator = " ") %>%
    dfm() %>%
    topfeatures(30)
  df3 <- data.frame(word_trigram = names(trigram), count_trigram = trigram)
  rownames(df3) <- NULL

  return(list(df1, df2, df3))
}
datafinal <- data.frame(lapply(data[16:21], function (L) why(L)))
It fails even if I pass just a single column from data[16:21]. Any help?
Here is some sample data:
N O P Q R S
yes no no no happy birthday
I am happy hello friends I am student yes
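
For reference, here is a minimal sketch that appears to reproduce the same kind of error in base R, without quanteda. It assumes the cause is that the three data frames returned by why() have different numbers of rows (30, 19, and 10 in the message above), which data.frame() cannot bind side by side. The contents of df1, df2, and df3 below are made-up stand-ins, not my real data.

```r
# Hypothetical frequency tables with different row counts,
# standing in for the df1, df2, df3 that why() returns.
df1 <- data.frame(word_unigram = letters[1:5], count_unigram = 5:1)
df2 <- data.frame(word_bigram = letters[1:3], count_bigram = 3:1)
df3 <- data.frame(word_trigram = letters[1:2], count_trigram = 2:1)

# data.frame() on a list tries to bind the elements as columns,
# which fails when their row counts differ.
msg <- tryCatch(
  {
    data.frame(list(df1, df2, df3))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Keeping the result as a plain named list skips the column-binding step.
res <- list(unigram = df1, bigram = df2, trigram = df3)
print(sapply(res, nrow))
```

If that assumption holds, dropping the outer data.frame() call and keeping the lapply() result as a list might be enough, though I have not confirmed this on the full data.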