Convert stemmed words back to their root words in R

Time: 2014-08-06 12:32:29

Tags: r text text-mining tm stemming

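Hi, I have a list of words that were stemmed using the "tm" package in R. Is there a way to get the root words back after this step? Thanks in advance.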

Ex: activiti -> activity

2 answers:

Answer 0: (score: 1)

You can use the stemCompletion() function for this, but you may need to trim the stems first. Consider the following:

library(tm)

library(qdap) # provides the stemmer() function

active.text = "there are plenty of funny activities"

active.corp = Corpus(VectorSource(active.text))

(st.text = tolower(stemmer(active.text, warn = FALSE)))
# this is what the columns of your Term Document Matrix are going to look like
[1] "there"  "are"    "plenti" "of"     "funni"  "activ" 

st.text = gsub("[aeyuio]+$","",st.text) # removing vowels on the end of each word
stemCompletion(st.text,active.corp,"prevalent") # now it works
        ther           ar        plent           of         funn        activ 
     "there"        "are"     "plenty"         "of"      "funny" "activities" 

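To see why the trimming step matters: as far as I understand it, stemCompletion() looks for dictionary words that begin with the stem, so a stem like "plenti" never matches "plenty". The sketch below only imitates that prefix check with base R's startsWith(); it is an illustration under that assumption, not tm's actual code, and the `words`, `stems` and `trimmed` vectors are made up for the example.

```r
# Original words and the stems shown in the output above
words <- c("plenty", "funny", "activities")
stems <- c("plenti", "funni", "activ")

# Does each word begin with its stem? Only "activ" is a true prefix.
before <- startsWith(words, stems)

# Strip trailing vowels, as the gsub() call in the answer does
trimmed <- gsub("[aeyuio]+$", "", stems)

# After trimming, every stem is a prefix of its original word
after <- startsWith(words, trimmed)

print(before)
print(after)
```

Trimming trailing vowels is a heuristic, so it can also make stems shorter than necessary and match more dictionary words than intended.
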
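Note, though, that stemming conflates some words. For example, "university" and "universal" both become "univers" after stemming, and nothing you do afterwards can recover the right word reliably.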
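
A rough base R illustration of that ambiguity, assuming completion works by prefix matching against a dictionary. `candidates_for` is a hypothetical helper written for this example, not a function from tm:

```r
# Hypothetical helper: return every dictionary word that starts with the stem.
# It only mimics the prefix-matching idea; it is not tm's implementation.
candidates_for <- function(stem, dictionary) {
  dictionary[startsWith(dictionary, stem)]
}

dictionary <- c("university", "universal", "activities")

# One candidate: the stem can be completed unambiguously
activ_matches <- candidates_for("activ", dictionary)

# Two candidates: the stem alone cannot tell you which word it came from
univers_matches <- candidates_for("univers", dictionary)

print(activ_matches)
print(univers_matches)
```

When a stem has several candidates, any completion rule (most frequent, first, longest, and so on) is just picking one of them, so some words will come back wrong.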

Hope this helps.

Answer 1: (score: 0)

Take a look at stemCompletion from the tm package:

library(tm)
v <- "There are plenty of activities."
stemCompletion("activiti", scan_tokenizer(tolower(v)))
#     activiti 
# "activities"