I have a set of words that I would like to exclude from my analysis. For example,
trash <- c("de", "do", "das", ...., "da")  # the set can have any number of elements
I also have a data.frame named matc with two columns, V1 and V2, in which I would like to replace every word in trash with nothing.
When I tried to do this using the following code:
for (k in seq_along(trash)) {
  matc$V1 <- gsub(trash[k], "", matc$V1)
  matc$V2 <- gsub(trash[k], "", matc$V2)
}
the replacement is not limited to whole words. For example, if matc$V1 is "Maria da Graça Madalena", the result is "Maria Graça Malena", whereas I would like "Maria Graça Madalena". I then tried something like this:
for (k in seq_along(trash)) {
  matc$V1 <- gsub(paste0(trash[k], "\bb"), "", matc$V1)
  matc$V2 <- gsub(paste0(trash[k], "\bb"), "", matc$V2)
}
But this does not work either.
Is there a solution that avoids the loop, perhaps using one of the apply functions?
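Answer 0 (score: 1)
Since you are matching whole words, it makes more sense to include the whitespace before and after the trash word in the pattern. For the specific example given by the OP, that could be: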
gsub("\\s+da\\s+", " ", "Maria da Graça Madalena")
[1] "Maria Graça Madalena"
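The same idea can probably be extended to the whole trash vector and to both columns without an explicit loop. The following is only a sketch under some assumptions: the sample rows in matc are made up for illustration, the trash words are taken to be plain words with no regex metacharacters, and remove_trash is a hypothetical helper name. It builds one alternation pattern and applies it to each column with lapply:

```r
# Hypothetical example data; `trash` and `matc` mirror the names in the question.
trash <- c("de", "do", "das", "da")
matc <- data.frame(
  V1 = c("Maria da Graça Madalena", "João de Deus"),
  V2 = c("Ana das Dores", "da Silva"),
  stringsAsFactors = FALSE
)

# One pattern for all trash words: each word must start at the beginning of
# the string or after whitespace, and be followed by whitespace or the end.
# The lookahead (?=...) needs perl = TRUE.
pattern <- paste0("(^|\\s+)(", paste(trash, collapse = "|"), ")(?=\\s|$)")

# Hypothetical helper: drop the trash words, then trim leftover edge spaces.
remove_trash <- function(x) {
  trimws(gsub(pattern, "", x, perl = TRUE))
}

# Apply the helper to both columns at once instead of looping over words.
matc[c("V1", "V2")] <- lapply(matc[c("V1", "V2")], remove_trash)
```

Because the match consumes the whitespace before the word but only looks ahead at the whitespace after it, consecutive trash words are each removed and a single separating space is left behind. A word-boundary anchor, written "\\b" with a doubled backslash in an R string, is another way to express this, though it leaves the surrounding spaces in place.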