Wrong results from a POS-tagging function used to return the verbs of a text

Time: 2018-12-04 12:26:20

Tags: r function nlp text-mining opennlp

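I have the following functions, which are meant to return the verbs of a text separated by "|". Does anyone know why the second function returns the wrong result?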

library(openNLP)
library(NLP)

# Tag each word token of x with its part of speech.
tagPOS <- function(x, ...) {
    s <- as.String(x)
    if (s == "") return(list())
    word_token_annotator <- Maxent_Word_Token_Annotator()
    # Treat the whole input as a single sentence, then tokenize and tag it.
    a2 <- Annotation(1L, "sentence", 1L, nchar(s))
    a2 <- annotate(s, word_token_annotator, a2)
    a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
    a3w <- a3[a3$type == "word"]
    POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
    POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
    list(POStagged = POStagged, POStags = POStags)
}

# Intended to return the words tagged as verbs (tags containing "VB"), joined by "|".
verbs <- function(x) {
    tagPOSx <- tagPOS(x)
    scanx <- scan(text = as.character(x), what = "character")
    n <- length(scanx)
    paste(scanx[(1:n)[grepl("VB", tagPOSx$POStags)]], collapse = "|")
}

And this is how it currently behaves:

 x="hello Sir ,This is applicable to your system only  correct ? ,because I 
 can see under affected products other things as well"
 > verbs(x)
 [1] "applicable|under" 
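
One possible reading of this output, offered as an assumption rather than a confirmed diagnosis: `scan()` splits only on whitespace, while the Maxent word tokenizer may also split punctuation off words, so the positions in `tagPOSx$POStags` need not line up with the positions in `scanx`. The base-R sketch below inspects only the whitespace side; `ws_tokens` is a name introduced here for illustration.

```r
# Same text as in the question, written on one line.
x <- "hello Sir ,This is applicable to your system only  correct ? ,because I can see under affected products other things as well"

# Whitespace-only split, as verbs() does internally.
ws_tokens <- scan(text = x, what = "character", quiet = TRUE)

ws_tokens[3]              # ",This": the comma stays attached to the word
which(ws_tokens == "is")  # 4: position of "is" among the whitespace tokens
ws_tokens[5]              # "applicable": the token one position later
```

If the tokenizer emits "," and "This" as separate tokens, "is" becomes the fifth tagged token, and indexing the whitespace tokens at position 5 yields "applicable", which matches the first word of the observed output.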

It should return:

 is|correct|can|see
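
A minimal sketch of one way to keep words and tags aligned, assuming a mismatch between the `scan()` split and the tokenizer's split is the cause: take the words from the same annotation that carries the tags (`s[a3w]`, which `tagPOS` already uses to build `POStagged`) instead of splitting the text a second time. `verbs_aligned` is a hypothetical name, and the result still depends on the tagger's model.

```r
library(NLP)
library(openNLP)

# Hypothetical variant of verbs(): words and tags come from one tokenization.
verbs_aligned <- function(x) {
    s <- as.String(x)
    if (s == "") return("")
    # Treat the whole input as a single sentence, then tokenize and tag it.
    a <- Annotation(1L, "sentence", 1L, nchar(s))
    a <- annotate(s, Maxent_Word_Token_Annotator(), a)
    a <- annotate(s, Maxent_POS_Tag_Annotator(), a)
    w <- a[a$type == "word"]
    tags <- unlist(lapply(w$features, `[[`, "POS"))
    tokens <- s[w]  # one token per tag, in the same order
    # Keep tokens whose tag starts with "VB" (VB, VBD, VBG, VBN, VBP, VBZ).
    paste(tokens[grepl("^VB", tags)], collapse = "|")
}
```

Even with aligned indices, the expected result above may not be reproduced exactly: the Penn Treebank tag set labels modal verbs such as "can" as `MD` rather than `VB*`, and "correct" in this sentence may well be tagged as an adjective. Matching on `"^(VB|MD)"` would be one way to include modals, if that is what is wanted.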

0 answers:

No answers