Question

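I am doing text mining for sentiment analysis, but I ran into a problem when applying it to English articles. Is there a function in an English text-mining package (for example tidytext) that works like worker(type = "tag") from the jiebaR package?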

Below is part of my code, which I use for Chinese text mining. I would like to do English text mining in a similar way. What function can I use in place of worker(type = "tag")?

library(jiebaRD)
library(jiebaR)
library(dplyr)
jieba <- worker(type="tag",user="C:/Users/User/Desktop/dict/bbb.txt",symbol = TRUE)

ecal <- function(str) {
  result <- jieba <= str    # tagged tokens: values are words, names are tags
  winfront <- 1L            # first index of the current look-back window
  count <- 1                # index of the next sentiment word
  winvalue <- c()           # score of each window
  posvalue <- c()
  negvalue <- c()
  pvalue <- 0L
  nvalue <- 0L
  ppcount <- 1
  nncount <- 1
  rheflag <- FALSE          # set when a word tagged "rhe" is found in a window
  for (i in seq_along(result)) {
    if (names(result[i]) == "positive") {
      # cat("find positive word:", result[i], "\n")
      if (i == 1)
        winvalue[count] <- 1
      else {
        winvalue[count] <- 1
        # walk back through the window, applying numeric weights and negations
        for (j in (i - 1):winfront) {
          if (!is.na(suppressWarnings(as.numeric(names(result[j])))))
            winvalue[count] <- winvalue[count] * as.numeric(names(result[j]))
          else if (names(result[j]) == "deny")
            winvalue[count] <- winvalue[count] * (-1)
          else if (names(result[j]) == "rhe")
            rheflag <- TRUE
        }
      }
      # cat("the value of window is:", winvalue[count], "\n")
      count <- count + 1
      winfront <- i + 1
    }

Answer 1

You can do the following:

library(udpipe)
x <- udpipe("我拜訪了我在香港的朋友", "chinese")
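
For the English text the question asks about, the same call should work with an English model. This is a minimal sketch rather than a tested recipe: it assumes the "english-ewt" model can be downloaded (the first call needs network access), and the example sentence is invented.

```r
library(udpipe)

# Assumption: the "english-ewt" model is downloadable from this machine.
x <- udpipe("I visited my friend in Hong Kong", "english-ewt")

# One row per token; upos holds the universal part-of-speech tag,
# which plays the role of the tag names returned by worker(type = "tag")
x[, c("token", "upos", "xpos")]
```

The result is a plain data frame, so it can be filtered or joined with dplyr in the same way as tidytext output.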

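Note that if you already have a Chinese tokenizer (for example jieba), you can also use udpipe to enrich the tokenised data with POS tags. See the section "My text data is already tokenised" at https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-annotation.html#annotate_your_text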
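
The pre-tokenised route might look roughly like the sketch below. It assumes the "english-ewt" model is downloadable; the token list is invented, and in practice it would come from your own tokenizer (jiebaR for Chinese, for instance).

```r
library(udpipe)

# Assumption: "english-ewt" can be downloaded; other models are used the same way.
m <- udpipe_download_model(language = "english-ewt")
ud_model <- udpipe_load_model(file = m$file_model)

# Tokens produced elsewhere (invented example data)
tokens <- list(doc1 = c("I", "visited", "my", "friend", "in", "Hong", "Kong"))

# Put one token per line, then ask udpipe to skip its own tokenizer
txt <- sapply(tokens, paste, collapse = "\n")
x <- udpipe_annotate(ud_model, x = txt, tokenizer = "vertical", parser = "none")
x <- as.data.frame(x)
x[, c("doc_id", "token", "upos")]
```

Setting parser = "none" skips dependency parsing, which is not needed when only the POS tags are of interest.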

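Accuracy statistics for the model, which was built on the Universal Dependencies Chinese-GSD treebank, are reported at https://github.com/jwijffels/udpipe.models.ud.2.3/blob/master/inst/udpipe-ud-2.3-181115/README. Tokenization is not optimal, but POS tagging given gold tokenization is reasonably accurate.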
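
One caveat about the code in the question: it relies on custom tags from a user dictionary (positive, deny, rhe, numeric weights) rather than grammatical POS tags, so for English a plain dictionary lookup may be closer to what worker(type = "tag") was doing there. The base R sketch below is only an illustration: the lexicon entries, the "x" placeholder tag, and the function name tag_english are all hypothetical.

```r
# Hypothetical lexicon: names are words, values are the custom tags
lexicon <- c(good = "positive", great = "positive", not = "deny",
             very = "2", slightly = "0.5")

tag_english <- function(text, lexicon) {
  # Lowercase, then split on anything that is not a letter or apostrophe
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]
  tags <- unname(lexicon[tokens])  # NA for words missing from the lexicon
  tags[is.na(tags)] <- "x"         # placeholder tag for untagged words
  names(tokens) <- tags            # values are words, names are tags
  tokens
}

result <- tag_english("This is not very good", lexicon)
names(result)
```

Because the returned vector carries the tags in its names, a loop that inspects names(result[i]), as the function in the question does, could consume it without changes.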
