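I am doing POS tagging on some text in R. For each row of the text column, I want a corresponding pos_tag column that contains only the nouns, verbs and adjectives, as shown below: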
  text                                        pos_tag
1 Hi, I am looking for xyz in your website    website, looking
2 when will my product delivered              product, delivered
3 Is the stock available for abc              stock, available, abc
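To make the target concrete, here is a minimal sketch of the filtering step on its own. It assumes the tagger emits Penn Treebank tags, where nouns start with NN, verbs with VB and adjectives with JJ. The tags below are written by hand for illustration (they are not real tagger output), and the variable names are only placeholders.

```r
# Hand-written word/tag pairs for illustration only (not real tagger output).
words <- c("when", "will", "my",   "product", "delivered")
tags  <- c("WRB",  "MD",   "PRP$", "NN",      "VBN")

# Assumed Penn Treebank prefixes: NN* = nouns, VB* = verbs, JJ* = adjectives.
keep <- grepl("^(NN|VB|JJ)", tags)

# Collapse the kept words into one comma-separated string per row.
pos_tag <- paste(words[keep], collapse = ", ")
print(pos_tag)
# [1] "product, delivered"
```

Matching on the tag prefix rather than on exact tags also picks up variants such as NNS, NNP, VBG or JJR.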
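I found some leads here: Extracting the POS tags in R using, but that approach seems to work on a scalar rather than a vector, so I used a for loop to apply it to each row of the text. Another lead is here: POS tagging for each record in R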
library(openNLP)
library(NLP)
# stringsAsFactors = FALSE keeps the text as character, so no conversion is needed
data <- data.frame(text=c("Hi,i am looking for xyz in your website",
                          "when will my product delivered",
                          "is the stock available for abc"),
                   stringsAsFactors = FALSE)
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
# Create the POS annotator once, not on every iteration
pos_tag_annotator <- Maxent_POS_Tag_Annotator()
# Pre-allocate the result column
data$pos_tag <- NA_character_
for(i in seq_len(nrow(data))){
  s <- as.String(data$text[i])
  a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))
  a3 <- annotate(s, pos_tag_annotator, a2)
  a3w <- subset(a3, type == "word")
  tags <- sapply(a3w$features, '[[', "POS")
  b <- sprintf("%s/%s", s[a3w], tags)
  # Penn Treebank tags: NN* = nouns, VB* = verbs, JJ* = adjectives
  b <- grep("/(NN|VB|JJ)[A-Z]*$", b, value = TRUE)
  # Strip the trailing "/TAG" so only the word remains
  b <- sub("/[A-Z]+$", "", b)
  data$pos_tag[i] <- paste(b, collapse = ", ")
}
But I am not sure whether I am doing this correctly, or whether there is a better way to do it in R. Please advise.
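For comparison, one possible tidier variant is sketched below: the per-row work is wrapped in a function and applied with vapply instead of an explicit loop. This is only a sketch under assumptions: extract_content_words is a hypothetical helper name, the tags are assumed to be Penn Treebank tags (NN*, VB*, JJ*), and the default English models that ship with openNLP are assumed to be installed.

```r
library(NLP)
library(openNLP)

# Build the annotators once and reuse them for every row.
sent_ann <- Maxent_Sent_Token_Annotator()
word_ann <- Maxent_Word_Token_Annotator()
pos_ann  <- Maxent_POS_Tag_Annotator()

# Hypothetical helper: returns the nouns, verbs and adjectives of one string,
# joined by ", ". The default pattern assumes Penn Treebank tag prefixes.
extract_content_words <- function(txt, pattern = "^(NN|VB|JJ)") {
  s <- as.String(txt)
  a <- annotate(s, list(sent_ann, word_ann, pos_ann))
  w <- subset(a, type == "word")
  tags <- vapply(w$features, function(f) f$POS, character(1))
  paste(s[w][grepl(pattern, tags)], collapse = ", ")
}

data <- data.frame(text = c("Hi,i am looking for xyz in your website",
                            "when will my product delivered",
                            "is the stock available for abc"),
                   stringsAsFactors = FALSE)

# One result string per row of the text column.
data$pos_tag <- vapply(data$text, extract_content_words, character(1),
                       USE.NAMES = FALSE)
```

Note that a verb filter will also keep auxiliaries such as "am" if the tagger labels them VBP, so the pattern may need narrowing to match the desired output exactly.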