How do I replace an exact set of words?

Time: 2016-06-10 16:11:34

Tags: r

I have a set of words that I would like to exclude from my analysis. For example,

trash <- c("de", "do", "das", ...., "da") # this set can have n elements

I also have a data.frame named matc with two variables, V1 and V2, in which I would like to replace each word in trash with nothing.

When I tried to do this using the following code:

for(k in 1:length(pr_us))
 {
   matc$V1<- gsub(pr_us[k],  "" , matc$V1 )
   matc$V2<- gsub(pr_us[k],  "" , matc$V2 )
 }

the replacement isn't exact. In other words, if matc$V1 is "Maria da Graça Madalena", the result is "Maria Graça Malena", whereas I would like "Maria Graça Madalena". I then tried something like this:

for(k in 1:length(pr_us))
{
  matc$V1<- gsub( paste0(pr_us[k], "\bb") , "" , matc$V1 )
  matc$V2<- gsub( paste0(pr_us[k], "\bb") , "" , matc$V2 )
}

But this does not work either.

Is there a solution that avoids the loop, perhaps one using the apply family of functions?

1 answer:

Answer 0 (score: 1)

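Since you are matching whole words, it makes more sense to include the whitespace before and after each trash word. So, for the specific example given by the OP, it could be: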

gsub("\\s+da\\s+", " ", "Maria da Graça Madalena")
[1] "Maria Graça Madalena"
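The whitespace pattern above needs a space on both sides, so it would miss a trash word at the very start or end of a string. A possible generalization is sketched below. It joins all the words into one alternation wrapped in word boundaries and applies it to both columns with lapply, which avoids the explicit loop. Note that inside an R string the boundary has to be written "\\b": a single "\b" is the backspace character, which is likely why the "\bb" attempt in the question matched nothing. The sample rows, the helper name strip_words, and the assumption that pr_us in the question is the same vector as trash are illustrative only; the elided part of the word list is left out.

```r
# Words to drop. Only the four shown in the question are used here.
trash <- c("de", "do", "das", "da")

# Hypothetical sample data standing in for the asker's matc.
matc <- data.frame(
  V1 = c("Maria da Graça Madalena", "Pedro das Neves"),
  V2 = c("João de Deus", "Casa do Dado"),
  stringsAsFactors = FALSE
)

# One pattern for all words: \b(de|do|das|da)\b
pattern <- paste0("\\b(", paste(trash, collapse = "|"), ")\\b")

# strip_words is a hypothetical helper name, not part of base R.
strip_words <- function(x) {
  x <- gsub(pattern, "", x)   # remove whole words only
  x <- gsub("\\s+", " ", x)   # collapse the gaps left behind
  trimws(x)                   # drop leading/trailing spaces
}

# Apply to both columns without an explicit loop.
cols <- c("V1", "V2")
matc[cols] <- lapply(matc[cols], strip_words)

print(matc)
```

With this sample data, "Madalena" and "Dado" are left intact because the "da" and "do" inside them are not bounded on both sides. How accented letters are classified at a word boundary can depend on the regex engine and locale, so names containing such letters next to a trash word are worth checking on real data.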