Question

I have a set of words that I would like to exclude from my analysis. For example:

trash <- c("de", "do", "das", ...., "da") # this set can have n elements

I also have a data.frame named matc with two variables, V1 and V2, in which I would like to replace each word in trash with nothing.

When I tried to do this using the following code:

for (k in 1:length(trash))
{
  matc$V1 <- gsub(trash[k], "", matc$V1)
  matc$V2 <- gsub(trash[k], "", matc$V2)
}

the replacement isn't exact. For example, if matc$V1 is "Maria da Graça Madalena", the result is "Maria Graça Malena", whereas I would like "Maria Graça Madalena". I then tried something like this:

for (k in 1:length(trash))
{
  matc$V1 <- gsub(paste0(trash[k], "\bb"), "", matc$V1)
  matc$V2 <- gsub(paste0(trash[k], "\bb"), "", matc$V2)
}

But this does not work either.

Is there a solution that avoids the loop, perhaps one using the apply functions?

Answer 1

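Since you are matching whole words, it makes more sense to include the whitespace before and after the trash word. For the specific example given by the OP, it could be: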

gsub("\\s+da\\s+", " ", "Maria da Graça Madalena")
[1] "Maria Graça Madalena"
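
The pattern above handles a single word that sits between two spaces. To cover the whole trash vector and both columns without an explicit loop, one possible approach (a different technique from the regex above, not something the original answer proposed) is to split each string on whitespace and drop the tokens that appear in trash. The sketch below assumes the columns are character vectors without NA values; the sample rows and the helper name remove_words are made up for illustration.

```r
# Illustrative data: the question elides the full word list and the real rows.
trash <- c("de", "do", "das", "da")
matc <- data.frame(
  V1 = c("Maria da Graça Madalena", "João de Deus"),
  V2 = c("Ana das Dores", "Pedro do Carmo"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop every whitespace-delimited token found in `words`.
remove_words <- function(x, words) {
  tokens <- strsplit(trimws(x), "\\s+")
  vapply(
    tokens,
    function(tok) paste(tok[!tok %in% words], collapse = " "),
    character(1)
  )
}

# Apply the helper to every column; matc[] keeps the data.frame structure.
matc[] <- lapply(matc, remove_words, words = trash)

matc$V1[1]
# [1] "Maria Graça Madalena"
```

Because whole tokens are compared with %in%, "Madalena" is left alone even though it contains "da". The comparison is case-sensitive, so "Da" or "DE" would be kept unless they are added to trash.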

How do I replace exact set of words?
