Question

Hi, I have a list of words that were stemmed using the "tm" package in R. Is there a way to get back the root words after this step? Thanks in advance.

Ex: activiti -> activity

Answer 1

You can use the stemCompletion() function for this, but you may need to trim the stems first. Consider the following:

library(tm)
library(qdap) # provides the stemmer() function

active.text <- "there are plenty of funny activities"
active.corp <- Corpus(VectorSource(active.text))

# this is what the terms in your term-document matrix are going to look like
(st.text <- tolower(stemmer(active.text, warn = FALSE)))
# [1] "there"  "are"    "plenti" "of"     "funni"  "activ"

# stemCompletion() matches stems as prefixes of dictionary words, so
# "plenti" would not match "plenty"; strip trailing vowels from each stem first
st.text <- gsub("[aeyuio]+$", "", st.text)
stemCompletion(st.text, active.corp, "prevalent") # now it works
#         ther           ar        plent           of         funn        activ
#      "there"        "are"     "plenty"         "of"      "funny" "activities"

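Note, though, that stemming conflates some words. For example, "university" and "universal" both become "univers" after stemming, and there is nothing you can do to recover the right one.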
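To see that collision for yourself, here is a minimal sketch. It assumes the SnowballC package is installed (its wordStem() is a Porter-style stemmer; as far as I know it is what tm's stemDocument() calls internally). The word list is made up for illustration.

```r
library(SnowballC)  # assumption: installed; provides wordStem()

# Hypothetical word list, chosen only to show the collision
words <- c("university", "universal", "activities", "activity")
stems <- wordStem(words, language = "porter")

# Group the original words by the stem they collapse to
collisions <- split(words, stems)

# Keep only the stems that more than one distinct word maps to
ambiguous <- collisions[lengths(collisions) > 1]
print(ambiguous)
```

Any stem that shows up in `ambiguous` cannot be mapped back to a single word without extra context, no matter which completion method you pick.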

Hope this helps.

Answer 2

Check out stemCompletion from the tm package:

library(tm)
v <- "There are plenty of activities."
stemCompletion("activiti", scan_tokenizer(tolower(v)))
#     activiti 
# "activities"
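Since stemCompletion() matches stems as prefixes of the dictionary words, a stem like "plenti" will not find "plenty". If you still have the unstemmed tokens around, a different approach (not stemCompletion, just a plain lookup table) is to record which original word each stem came from before you throw the originals away. A rough sketch, assuming SnowballC is installed; `original`, `lookup` and `unstem` are names I made up for this example:

```r
library(SnowballC)  # assumption: installed; provides wordStem()

# Hypothetical tokens, kept from before stemming
original <- c("there", "are", "plenty", "of", "funny",
              "activities", "activities", "activity")
stems <- wordStem(original, language = "porter")

# For each stem, remember the most frequent original word that produced it
lookup <- vapply(split(original, stems),
                 function(w) names(which.max(table(w))),
                 character(1))

# Map stems back to words; stems never seen before come back as NA
unstem <- function(s) unname(lookup[s])

unstem(wordStem(c("activities", "plenty"), language = "porter"))
```

This only works for stems that were seen when the table was built, and ambiguous stems still resolve to just one of their candidates (here, the most frequent).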

Convert stemmed words to root words in R
