Question

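I am working with a dataset of item reviews. The code runs fine for most reviews, which typically contain about 20-30 words, but as soon as a review consists of a single word, it throws an error.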

library(NLP)
library(openNLP)
library(stringr)

x <- NLP::as.String("pathetic")
wordAnnotation <- NLP::annotate(x, list(Maxent_Sent_Token_Annotator(), 
  Maxent_Word_Token_Annotator()))
POSAnnotation <- NLP::annotate(x, Maxent_POS_Tag_Annotator(), 
  wordAnnotation)
POSwords <- subset(POSAnnotation, type == "word")
tags <- sapply(POSwords$features, '[[', "POS")
tokenizedAndTagged <- data.frame(Tokens = x[POSwords], Tags = tags, 
  stringsAsFactors = FALSE)

Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = 
stringsAsFactors) : cannot coerce class ""String"" to a data.frame

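I have looked at other similar questions and tried the suggested fixes, such as calling NLP::annotate explicitly to avoid function masking and restarting the R session, but none of them worked. Please point out how to fix this. Thanks in advance.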

Answer 1

You need to wrap the Tokens value in as.character:

tokenizedAndTagged <- data.frame(Tokens = as.character(x[POSwords]), 
                                 Tags = tags, 
                                 stringsAsFactors = FALSE)
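
To see why the wrapper helps, here is a minimal base-R sketch that does not load NLP or openNLP. It assumes that, for a single-word input, x[POSwords] comes back still carrying the "String" class, which is what the error message suggests. The object s is a hypothetical stand-in for that value, and the tag "JJ" is made up for illustration.

```r
# Hypothetical stand-in for the single-token result: a character scalar
# that still carries a "String" class attribute (assumption, see above).
s <- structure("pathetic", class = "String")

# data.frame() dispatches as.data.frame() on the class attribute. With no
# method for "String", it falls through to as.data.frame.default(), which
# raises an error. tryCatch() captures the message instead of stopping.
res <- tryCatch(
  data.frame(Tokens = s, stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
print(res)

# as.character() drops the class attribute, leaving a plain character
# vector that data.frame() accepts.
fixed <- data.frame(Tokens = as.character(s),
                    Tags = "JJ",  # illustrative tag, not real tagger output
                    stringsAsFactors = FALSE)
str(fixed)
```

Wrapping in as.character is harmless for longer reviews as well, since a plain character vector passes through unchanged.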
