Question

I want to replace every word in my corpus that contains 'kind' with 'Kindertoekomst'. For a plain character vector I can normally do it like this:

Woorden<-c("kinderen", "kleinkind")
Woorden[grepl("kind", Woorden)]<-"Kindertoekomst"

But I want to do this inside my corpus.
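The vector idiom above works because each element is a single word. A tm corpus stores one whole text per document, so the same idiom applied to document texts would overwrite an entire document as soon as it contains "kind". A minimal base-R sketch of the difference, assuming the documents are plain character strings (the example sentences are made up):

```r
# Two made-up "documents", each a full sentence rather than one word
docs_text <- c("de kinderen spelen buiten", "oma en haar kleinkind")

# Vector idiom: grepl() tests the whole string, so whole documents are overwritten
whole <- docs_text
whole[grepl("kind", whole)] <- "Kindertoekomst"
print(whole)  # "Kindertoekomst" "Kindertoekomst"

# Word-level replacement: only the words containing "kind" are replaced
# (\\b marks a word boundary, \\w* matches the rest of the word)
words <- gsub("\\b\\w*kind\\w*\\b", "Kindertoekomst", docs_text, perl = TRUE)
print(words)  # "de Kindertoekomst spelen buiten" "oma en haar Kindertoekomst"
```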

I managed to do it with

Kind<-grepl("kind", Woorden)
docs <- tm_map(docs, function(x) stri_replace_all_fixed(x, Woorden[as.logical(Kind)], "kindertoekomst", vectorize_all = FALSE))
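For reference, stri_replace_all_fixed() with vectorize_all = FALSE applies each pattern in turn to every string. The base-R approximation below uses a hypothetical helper, replace_each_fixed(), that is not part of stringi. It also shows a pitfall of fixed, sequential matching: patterns match substrings, so a listed word can be rewritten inside a longer word, and a later pattern can then match the already-replaced text.

```r
# Hypothetical helper: a base-R approximation of
# stri_replace_all_fixed(x, patterns, replacement, vectorize_all = FALSE),
# replacing each fixed pattern in turn across all strings
replace_each_fixed <- function(x, patterns, replacement) {
  Reduce(function(txt, p) gsub(p, replacement, txt, fixed = TRUE),
         patterns, x)
}

Woorden <- c("kinderen", "kleinkind")
Kind <- grepl("kind", Woorden)

docs_text <- c("de kinderen spelen", "een kleinkind", "veel kleinkinderen")
out <- replace_each_fixed(docs_text, Woorden[Kind], "kindertoekomst")
print(out)
# The third text becomes "veel kindertoekomstertoekomst": "kinderen" is replaced
# inside "kleinkinderen", and the result still contains "kleinkind", which is
# then replaced as well
```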

But after that, other functions no longer work:

dtm <- DocumentTermMatrix(docs)

Error: inherits(doc, "TextDocument") is not TRUE

and

corpus_clean <- tm_map(docs, content_transformer(tolower))

Error in UseMethod("content", x) : no applicable method for 'content' applied to an object of class "character"
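Both errors appear to have the same cause. Roughly speaking (tm 0.6 and later), when tm_map() is given a plain function, each document is replaced by whatever the function returns, here a bare character string, so the corpus no longer holds TextDocument objects. DocumentTermMatrix() then fails its inherits(doc, "TextDocument") check, and content() has no method for class "character". The toy model below uses made-up names (make_doc, toy_content_transformer) and is not tm's implementation; it only sketches why wrapping the function so that it edits the text inside each document keeps the class intact.

```r
# Toy stand-in for a tm document: a list whose class includes "TextDocument"
# (made-up structure, not tm's real PlainTextDocument)
make_doc <- function(text) {
  structure(list(content = text), class = c("ToyDocument", "TextDocument"))
}

# Toy version of the idea behind content_transformer(): run FUN on the text
# and store the result back inside the document, so the class survives
toy_content_transformer <- function(FUN) {
  function(doc, ...) {
    doc$content <- FUN(doc$content, ...)
    doc
  }
}

docs <- list(make_doc("De Kinderen"), make_doc("Een Kleinkind"))

# Plain function: each document is replaced by a bare character string
raw <- lapply(docs, function(d) tolower(d$content))
print(sapply(raw, inherits, what = "TextDocument"))      # FALSE FALSE

# Wrapped function: the documents keep their class, only the text changes
wrapped <- lapply(docs, toy_content_transformer(tolower))
print(sapply(wrapped, inherits, what = "TextDocument"))  # TRUE TRUE
print(wrapped[[1]]$content)                              # "de kinderen"
```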

Please help me :)

Answer 1

This should work:

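# Apply the replacement; this leaves plain character vectors in the corpus
docs <- tm_map(docs, function(x) stri_replace_all_fixed(x, Woorden[as.logical(Kind)], "kindertoekomst", vectorize_all = FALSE))
# Turn the character vectors back into PlainTextDocument objects
docs <- tm_map(docs, PlainTextDocument)
dtm <- DocumentTermMatrix(docs)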

Answer 2

An alternative that uses the content_transformer() function wrapper from the tm package:

library(tm)

Woorden<-c("kinderen", "kleinkind")

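# Replace every word that contains "kind" with "Kindertoekomst"
# (\\b = word boundary, \\w* = the rest of the word; a single "\b" in an
# R string is a backspace character, not a regex word boundary)
rep_kind <- function(x){
  gsub("\\b\\w*kind\\w*\\b", "Kindertoekomst", x, perl = TRUE)
}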

docs <- Corpus(VectorSource(as.list(Woorden)))
docs <- tm_map(docs, content_transformer(rep_kind))
dtm <- DocumentTermMatrix(docs)
inspect(dtm)
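Because real documents contain many words, the exact pattern matters: a greedy .* also matches spaces and can swallow the whole text, while \\w* stays inside a single word. Adding ignore.case = TRUE additionally catches capitalised forms such as "Kinderen" at the start of a sentence. A short base-R check on a made-up sentence:

```r
txt <- "Kinderen en een kleinkind spelen"

# Greedy .* also matches spaces, so the whole sentence collapses
# into a single replacement
greedy <- gsub(".*kind.*", "Kindertoekomst", txt)

# \\w* only matches word characters, so each matching word is replaced on its own;
# ignore.case = TRUE also matches the capitalised "Kinderen"
per_word <- gsub("\\b\\w*kind\\w*\\b", "Kindertoekomst", txt,
                 ignore.case = TRUE, perl = TRUE)

print(greedy)    # "Kindertoekomst"
print(per_word)  # "Kindertoekomst en een Kindertoekomst spelen"
```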

R: Data mining. Replacing words that contain a substring
