Replacing words with stringi

Time: 2016-01-29 19:49:25

Tags: r nlp stringi

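I am trying to use stringi to replace certain words with stri_replace, but I run into a problem when the words share a common part. In the example below I am fixing misspellings of "triangle", but it seems to get confused because "tri" is part of "trian", which is in turn part of "triangle", so I end up with things like "trianglegle". I am not very familiar with stri_replace; is there an argument I don't know about? Thanks for your help.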

stri_replace_all_regex("The quick brown tri jumped over the lazy trian.",
                       c("tri", "trian", "fox"),
                       c("triangle", "triangle", "bear"),
                       vectorize_all = FALSE)

## [1] "The quick brown trianglegle jumped over the lazy triangleglean."
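
The mangled result is consistent with vectorize_all=FALSE applying the patterns one after another, each to the output of the previous replacement. A minimal sketch that reproduces this by hand; the loop and the variable names (x, patterns, replacements) are illustrative and not part of the original post:

```r
library(stringi)

x <- "The quick brown tri jumped over the lazy trian."
patterns     <- c("tri", "trian", "fox")
replacements <- c("triangle", "triangle", "bear")

# Apply each pattern in turn to the result of the previous step and
# print the intermediate string, to show where the extra "gle" appears.
for (i in seq_along(patterns)) {
  x <- stri_replace_all_regex(x, patterns[i], replacements[i])
  cat(sprintf("after '%s': %s\n", patterns[i], x))
}
```

The first step turns "trian" into "trianglean" because "tri" matches inside it, and the second step then finds "trian" inside every "triangle" the first step produced.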

2 answers:

Answer 0 (score: 3)

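You probably want to isolate the words so that they are distinct. \\W matches a non-word character; capturing it as (\\W) and putting it back with $1 keeps the space or punctuation that followed the word. You could try something like this: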

stri_replace_all_regex("The quick brown tri jumped over the lazy trian.",
                   paste0(c("trian", "tri",  "fox"), "(\\W)"), 
                   paste0(c("triangle","triangle", "bear"),"$1"),
                   vectorize_all = FALSE)
[1] "The quick brown triangle jumped over the lazy triangle."
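
A variation on the same idea, not taken from the answer above, is to wrap each word in the \\b word-boundary anchor. A minimal sketch, assuming whole-word replacement is what is wanted; the variable names (words, fixes, fixed, at_end) are illustrative:

```r
library(stringi)

words <- c("tri", "trian", "fox")
fixes <- c("triangle", "triangle", "bear")

# \\b matches at a word boundary without consuming a character,
# so no capture group or $1 back-reference is needed.
fixed <- stri_replace_all_regex(
  "The quick brown tri jumped over the lazy trian.",
  paste0("\\b", words, "\\b"),
  fixes,
  vectorize_all = FALSE
)
print(fixed)

# A word at the very end of the string, with nothing after it.
at_end <- stri_replace_all_regex("a lazy tri", "\\btri\\b", "triangle")
print(at_end)
```

Because \\b is zero-width, the pattern order no longer matters here, and a misspelled word at the very end of the string is still matched, which a trailing (\\W) would miss.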

Answer 1 (score: 0)

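If you don't want partial matches, terminate some (or even all) of your pattern arguments with a space, and put the space in the replacement as well: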

stri_replace_all_regex("The quick brown tri jumped over the lazy trian.",
                       pattern = "tri ", replacement = "triangle ",
                       vectorize_all = FALSE)

stri_replace_all_regex("The quick brown tri jumped over the lazy trian.",
       c("tri ", "trian", "fox "), c("triangle ",  "triangle", "bear "), 
          vectorize_all=TRUE)
[1] "The quick brown triangle jumped over the lazy trian."
[2] "The quick brown tri jumped over the lazy triangle."  
[3] "The quick brown tri jumped over the lazy trian."
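
With vectorize_all=TRUE each pattern is applied to its own copy of the input, which is why three separate strings come back. To get a single corrected string with the space-terminated patterns, one option is to switch back to vectorize_all=FALSE and list the longer pattern first. A minimal sketch under that assumption; the reordering and the variable name result are illustrative and not part of the answer above:

```r
library(stringi)

# "trian" is listed before "tri " so that it is replaced first;
# the later "tri " pattern cannot match inside "triangle".
result <- stri_replace_all_regex(
  "The quick brown tri jumped over the lazy trian.",
  c("trian", "tri ", "fox "),
  c("triangle", "triangle ", "bear "),
  vectorize_all = FALSE
)
print(result)
```

The order matters: if "tri " were replaced first, the following "trian" pattern would match inside the newly inserted "triangle" and reintroduce the "trianglegle" problem.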