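I am trying to write a function that returns a mapping from stems to words when a text is run through Porter stemming. When I try to run an example, the code never stops running, so there is no output. There is no error, but when I force it to stop, it issues these warnings: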
1: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
2: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
3: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
4: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
5: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
6: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
7: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
8: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
9: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
My code is as follows:
stemMAP<-function(text){
  flatText<-unlist(strsplit(text," "))
  textLength<-length(flatText)
  stemList<-list(NULL)
  for(i in 1:textLength){
    wordStem<-SnowballStemmer(flatText[i])
    flagStem=0
    flagWord=0
    for(j in 1:length(stemList)){
      if(regexpr(wordStem,stemList[j][1])==TRUE){
        for(k in 1:length(stemList[j])){
          if(regexpr(flatText[i],stemList[j][k])==TRUE){
            flagWord=1
            #break;
          }
        }
        if(flagWord==0){
          stemList[j][length(stemList[j])+1]<-flatText[i]
          #break;
        }
        flagStem=1
      }
      if(flagStem==0){
        stemList[length(stemList)+1][1]<-wordStem
        stemList[length(stemList)+1][2]<-flatText[i]
      }
    }
  }
  return(stemList)
}
How can I identify the error? My test statement is:
stem<-stemMAP("I like being active and playing because when you play it activates your body and this activation leads to a good health")
Answer 0: (score: 5)
Here I rewrite your code using the vectorized version of SnowballStemmer. No explicit loops are needed.
library(plyr)
## SnowballStemmer() is assumed to be available already,
## e.g. from the Snowball package (not loaded in the original snippet)
stemMAP <- function(text){
  flatText <- unlist(strsplit(text, " "))
  ## use the vectorized version of the stemmer
  wordStem <- as.character(SnowballStemmer(flatText))
  hh <- data.frame(ff = flatText, sn = wordStem)
  ## use plyr to turn the result into a list
  ## dlply: data.frame to list apply
  ## group hh by the column sn and apply
  ## as.character(x$ff) to each group (x is the subset data.frame)
  stemList <- dlply(hh, .(sn), function(x) as.character(x$ff))
  stemList
}
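The grouping step itself does not depend on plyr. As a rough sketch, base R's split() groups one vector by another in the same way. To keep the example runnable without a stemmer installed, the stems below are written out by hand (copied from the stems shown in this answer's output); in stemMAP they would come from SnowballStemmer.

```r
# Words from the test sentence and their stems.
# The stems are hard-coded here for illustration only; in stemMAP they
# would be produced by the stemmer.
words <- c("active", "and", "activates", "and", "activation")
stems <- c("activ",  "and", "activ",     "and", "activ")

# split() groups the first vector by the values of the second and
# returns a named list with one element per distinct stem
stem_map <- split(words, stems)

stem_map$activ  # "active" "activates" "activation"
stem_map$and    # "and" "and"
```

If each word should appear only once per stem, lapply(stem_map, unique) removes the repeats.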
Applied to your test sentence, stemMAP returns a list keyed by stem (first few entries shown):
$I
[1] "I"
$a
[1] "a"
$activ
[1] "active" "activates" "activation"
$and
[1] "and" "and"
$be
[1] "being"
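As for the warning in your original code: it most likely comes from indexing the list with single brackets. stemList[j] is a one-element list, not the character vector stored in it, so length(stemList[j]) is always 1, and stemList[length(stemList)+1][2] <- ... ends up pushing a two-element list into a single slot. A minimal sketch of the difference, using a hand-made list rather than your data:

```r
# One list element holding a stem followed by the words seen for it
stemList <- list(c("activ", "active"))

class(stemList[1])     # "list": single brackets return a sub-list
class(stemList[[1]])   # "character": double brackets return the element itself
length(stemList[1])    # 1, however many words the element holds
length(stemList[[1]])  # 2

# Append a word to an existing element
stemList[[1]] <- c(stemList[[1]], "activates")

# Add a new element for a new stem
stemList[[length(stemList) + 1]] <- c("and", "and")
```

Note also that regexpr() returns a match position (or -1), so comparing its result with TRUE only holds when the match starts at position 1.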