Question

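I was given some R code that extracts the sentiment of a list of texts and saves it in a data frame; it is for a sentiment analysis project. I am new to R and coreNLP, so I have been working through memory and similar problems, but I am still not sure how to fix all of them. The TripAdvisor data frame in the code contains reviews from the TripAdvisor website, from which I want to extract sentiment. TripAdvisor$titleopinion is the column that holds this text.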

The error I get is: Error in rJava::.jcall(volatiles$cNLP, "Ledu/stanford/nlp/pipeline/Annotation;", : method process with signature (I)Ledu/stanford/nlp/pipeline/Annotation; not found

The commands I run in every R session are:

Sys.setenv(JAVA_HOME = 'C:\\Program Files\\Java\\jre1.8.0_121')
options(java.parameters = "-Xmx8g")

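My computer has 8 GB of RAM, and I sometimes run into memory problems. The dali1.csv file I am loading contains about 450 text instances from which I want to extract sentiment.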

The code is as follows:

library(data.table)
library(devtools)
devtools::install_github("statsmaths/coreNLP")
#coreNLP::downloadCoreNLP()
library(coreNLP)

initCoreNLP("C:/TFG/stanford-corenlp-full-2016-10-31")

# Read the data
TripAdvisor <- read.csv("C:/TFG/Data/dali/dali1ENG.csv")

# Creating sentiment label
TripAdvisor$SentimentValue <- NA
TripAdvisor$SentimentValue <- ifelse(TripAdvisor$rating <= 2, "negative", 
                                     ifelse(TripAdvisor$rating == 3, "neutral",
                                            ifelse(TripAdvisor$rating >= 4, "positive", TripAdvisor$SentimentValue)))

# Predict sentiment with coreNLP
TripAdvisor$SentimentCoreNLP <- NA
for(i in 1:nrow(TripAdvisor)){
  print(i)
  pos <- 0
  neg <- 0

  opinion <- TripAdvisor$titleopinion[i]
  opinion.df <- getSentiment(annotateString(opinion))

  for(j in 1:nrow(opinion.df)){
    if(opinion.df$sentiment[j]=="Verypositive"){
      pos = pos + 2
    } else if(opinion.df$sentiment[j]=="Positive"){
      pos = pos + 1
    } else if(opinion.df$sentiment[j]=="Negative"){
      neg = neg + 1
    } else if(opinion.df$sentiment[j]=="Verynegative"){
      neg = neg + 2
    }
  }

  TripAdvisor$pos[i] <- pos
  TripAdvisor$neg[i] <- neg

}

TripAdvisor$SentimentCoreNLP <- ifelse(TripAdvisor$pos > TripAdvisor$neg, "positive", 
                                       ifelse(TripAdvisor$pos < TripAdvisor$neg, "negative", "neutral"))

write.csv(TripAdvisor, file="C:/TFG/Data/dali/daliXENG.csv")

# Analysing SentimentValue vs. SentimentCoreNLP

# Table
table(TripAdvisor$SentimentCoreNLP, TripAdvisor$SentimentValue)
#100*(table(TripAdvisor$SentimentCoreNLP, TripAdvisor$SentimentValue)/(nrow(TripAdvisor)))

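This code is supposed to work: the person who gave it to me ran it without problems on a machine with an i3 and 8 GB of RAM. Any insight into the memory problems and the missing annotator is welcome and appreciated. Sorry for my bad English, I am still learning :).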

If I have left out any necessary information, please tell me so that I can provide it.

Answer 1

OK, I think I found the answer: the problem was probably that the titleopinion column was of type factor rather than character. I fixed it with TripAdvisor$titleopinion <- as.character(TripAdvisor$titleopinion).
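
Here is a minimal sketch of why a factor column could produce this error. It uses a toy data frame instead of the real reviews: the `reviews` object and its two rows are made up for illustration. R stores a factor as integer codes, which is consistent with the `(I)` (int) in the signature from the error message. My guess is that rJava then looks for a `process` method that takes an int rather than a string, but I have not checked this in the coreNLP source.

```r
# Toy stand-in for the review data (hypothetical rows); the column name
# mirrors the one in the question.
reviews <- data.frame(
  titleopinion = c("Great museum", "Too crowded"),
  stringsAsFactors = TRUE  # what read.csv() did by default before R 4.0.0
)

# One element of a factor column is still a factor, stored as an integer code.
opinion <- reviews$titleopinion[1]
is.factor(opinion)  # TRUE
typeof(opinion)     # "integer"

# Convert the whole column once, before the loop that calls annotateString().
reviews$titleopinion <- as.character(reviews$titleopinion)

opinion <- reviews$titleopinion[1]
typeof(opinion)     # "character"
```

Reading the file with read.csv(..., stringsAsFactors = FALSE) should avoid the conversion step altogether, since the text column then arrives as character.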

If anyone reads this: I am new here and do not know what I am supposed to do. Should I delete this question?
