Replace/remove words in a string that exist in a separate list

Time: 2019-05-17 16:49:00

Tags: scala replace

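I am looking for a better, cleaner way to scrub bad words from a long string.

I have a text file containing hundreds of bad words. I iterate over every word, use it to build a regex pattern, and replace each match with asterisks.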

import scala.io.Source

def removeBadWords(comment: String): String = {
  // Load the bad-word list, one word per line, then release the file handle.
  val bufferedBadWords = Source.fromFile("/Users/me/Desktop/badwords.txt")
  val badWords = bufferedBadWords.getLines().toList
  bufferedBadWords.close()

  // Replace each bad word in turn with asterisks of the same length.
  var newComment = comment
  for (badWord <- badWords) {
    newComment = badWord.r.replaceAllIn(newComment, "*" * badWord.length)
  }

  newComment
}

val sentence = "These are just a couple of [bad word] sentences. I want to [bad word] replace certain words with [bad word] asterisks - if [bad word] possible."
println(removeBadWords(sentence))

// Result: These are just a couple of **** sentences. I want to ******* replace certain words with ******* asterisks - if ******* possible.

Is there a more efficient, more idiomatic way to achieve this?

1 answer:

Answer 0: (score: 3)

You can do it all in one pass, but with a fixed replacement string the asterisks will not match the length of each bad word.

def removeBadWords(comment :String) :String =
  io.Source
    .fromFile("badwords.txt")       //open file
    .getLines()                     //without newline chars
    .mkString("\\b(", "|", ")\\b")  //regex with word boundaries
    .r                              //compile
    .replaceAllIn(comment, "****")  //return cleaned comment
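
If the asterisks should match the length of each bad word, one option is the overload of replaceAllIn that takes a function from the match to its replacement. The following is a minimal sketch under a few assumptions, not a definitive implementation: the names BadWordFilter, loadBadWords, badWordRegex and maskBadWords are hypothetical, the word list holds one plain word per line, and matching stays case-sensitive as in the code above. Each word is passed through Pattern.quote so that regex metacharacters in the list are treated literally, and the file is closed after reading.

```scala
import java.util.regex.Pattern
import scala.io.Source
import scala.util.matching.Regex

// Hypothetical helper object; all names here are illustrative.
object BadWordFilter {

  // Read the word list once (one word per line) and always close the file.
  def loadBadWords(path: String): List[String] = {
    val source = Source.fromFile(path)
    try source.getLines().map(_.trim).filter(_.nonEmpty).toList
    finally source.close()
  }

  // Build a single alternation regex with word boundaries.
  // Pattern.quote makes every word match literally.
  def badWordRegex(badWords: Seq[String]): Regex =
    badWords
      .filter(_.nonEmpty)
      .map(w => Pattern.quote(w))
      .mkString("\\b(?:", "|", ")\\b")
      .r

  // Replace each match with as many asterisks as the matched text is long.
  def maskBadWords(comment: String, regex: Regex): String =
    regex.replaceAllIn(comment, m => "*" * m.matched.length)
}
```

Building the regex once and reusing it avoids rereading the file and recompiling the pattern for every comment. For example, with the word list Seq("darn", "heck"), masking "Well darn, what the heck." yields "Well ****, what the ****.".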