I want to apply a list of regular expressions to a string. My current approach is not very functional.
My current code:
val stopWords = List[String](
"the",
"restaurant",
"bar",
"[^a-zA-Z -]"
)
def CanonicalName(name: String): String = {
var nameM = name
for (reg <- stopWords) {
nameM = nameM.replaceAll(reg, "")
}
nameM = nameM.replaceAll(" +", " ").trim
return nameM
}
Answer 0 (score: 2)
I think this does what you need.
def CanonicalName(name: String): String = {
val stopWords = List("the", "restaurant", "bar", "[^a-zA-Z -]")
stopWords.foldLeft(name)(_.replaceAll(_, "")).replaceAll(" +"," ").trim
}
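To see what the fold is doing, here is a minimal sketch that uses scanLeft to keep every intermediate string. The object name FoldDemo, the helper steps, and the sample input are illustrative assumptions, not part of the question.

```scala
object FoldDemo {
  // Stop-word patterns taken from the question
  val stopWords = List("the", "restaurant", "bar", "[^a-zA-Z -]")

  // Same accumulation as the foldLeft above, but scanLeft returns
  // the starting string plus the result after each pattern is applied
  def steps(name: String): List[String] =
    stopWords.scanLeft(name)((acc, regex) => acc.replaceAll(regex, ""))

  // The last intermediate result is what foldLeft would return
  def canonicalName(name: String): String =
    steps(name).last.replaceAll(" +", " ").trim

  def main(args: Array[String]): Unit = {
    // Brackets make leading and trailing spaces visible
    steps("the blue bar 42").foreach(s => println(s"[$s]"))
    println(canonicalName("the blue bar 42")) // blue
  }
}
```

foldLeft threads the string through each pattern in list order, so it behaves like the original for loop without the mutable variable.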
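Answer 1 (score: 0)
'replaceAll' can replace part of a word. For example, "thermal & barbecue restaurant" becomes "rmal becue". If what you want is "thermal barbecue", you can split the name first and then apply your stop-word rules word by word: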
def isStopWord(word: String): Boolean = stopWords.exists(word.matches)
def CanonicalName(name: String): String =
name.replaceAll(" +", " ").trim.split(" ").filterNot(isStopWord).mkString(" ")
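The difference between the two approaches can be checked with a small self-contained sketch. It assumes the stop-word list from the question; the object name WordwiseDemo and the method names bySubstring and byWord are made up for illustration.

```scala
object WordwiseDemo {
  // Stop-word patterns taken from the question
  val stopWords = List("the", "restaurant", "bar", "[^a-zA-Z -]")

  // Substring approach: each pattern is removed wherever it occurs,
  // including inside longer words
  def bySubstring(name: String): String =
    stopWords.foldLeft(name)(_.replaceAll(_, "")).replaceAll(" +", " ").trim

  // Word-wise approach: a token is dropped only when some pattern
  // matches the entire token (String.matches is a full match)
  def byWord(name: String): String =
    name.replaceAll(" +", " ").trim.split(" ")
      .filterNot(word => stopWords.exists(re => word.matches(re)))
      .mkString(" ")

  def main(args: Array[String]): Unit = {
    val sample = "thermal & barbecue restaurant"
    println(bySubstring(sample)) // rmal becue
    println(byWord(sample))      // thermal barbecue
  }
}
```

One trade-off to be aware of: because the word-wise version needs a pattern to match the whole token, a stray character inside a word is no longer stripped. With these patterns, "Joe's" stays "Joe's" under byWord but becomes "Joes" under bySubstring.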