我正在寻找一种更好,更清洁的方法来清除长字符串中的不良单词。
我有一个文本文件,其中包含数百个不好的单词,我正在遍历每个单词-使用它来创建正则表达式模式并用星号替换匹配项。
import scala.io.Source
def removeBadWords(comment: String): String = {
val bufferedBadWords = Source.fromFile("/Users/me/Desktop/badwords.txt")
val badWords = bufferedBadWords.getLines.toList
bufferedBadWords.close
var newComment = comment
for(badWord <- badWords) {
newComment = badWord.r.replaceAllIn(newComment, "*" * badWord.length)
}
newComment
}
val sentence = "These are just a couple of [bad word] sentences. I want to [bad word] replace certain words with [bad word] asterisks - if [bad word] possible."
println(removeBadWords(sentence))
// Result: These are just a couple of **** sentences. I want to ******* replace certain words with ******* asterisks - if ******* possible.
是否有更高效,更惯用的方式来实现这一目标?
答案 0 :(得分:3)
您可以一次完成所有操作,但是您可能无法使替换字符串与错误字符串的长度匹配。
def removeBadWords(comment :String) :String =
io.Source
.fromFile("badwords.txt") //open file
.getLines //without newline chars
.mkString("\\b(", "|", ")\\b") //regex with word boundaries
.r //compile
.replaceAllIn(comment, "****") //return cleaned comment