Question

Given a list of word pairs

val terms = ("word1a", "word1b") :: ("word2a", "word2b") :: ... :: Nil

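What is the most elegant way in Scala to test whether at least one of the pairs occurs in a text? The test should terminate as quickly as possible, as soon as it hits the first match. How would you solve this?

EDIT: To be more precise, I want to know whether both words of a pair appear somewhere in the text (not necessarily in order). If that is the case for one of the pairs in the list, the method should return true. It is not necessary to return the matched pair, and it does not matter if more than one pair matches.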

Answer 1

scala> val text = Set("blah1", "word2b", "blah2", "word2a")
text: scala.collection.immutable.Set[java.lang.String] = Set(blah1, word2b, blah2, word2a)

scala> terms.exists{case (a,b) => text(a) && text(b)}
res12: Boolean = true

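EDIT: Note that using a Set to represent the tokens in the text makes the contains lookup much more efficient. You would not want to use something sequential like a List for that.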

EDIT 2: Updated for the clarification in the requirements!

EDIT 3: Changed contains to apply, per the suggestion in the comments.
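
The answer starts from a text that is already a Set of tokens. A minimal sketch of how that set might be built from a raw string, assuming that "word" simply means a whitespace-separated token. The names TokenSetSearch, tokenize and containsPair are made up for this illustration and are not part of the answer:

```scala
object TokenSetSearch {
  // Hypothetical helper: split on runs of whitespace and drop empty tokens.
  def tokenize(text: String): Set[String] =
    text.split("\\s+").filter(_.nonEmpty).toSet

  // True as soon as one pair has both of its words in the token set.
  def containsPair(terms: List[(String, String)], text: String): Boolean = {
    val tokens = tokenize(text)
    terms.exists { case (a, b) => tokens(a) && tokens(b) }
  }
}
```

Building the set reads the whole text once; only the scan over terms stops early. That trade-off is probably fine unless the text is very long.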

Answer 2

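EDIT - it seems that the ambiguous wording of your question means I answered a different question:

Because you are essentially asking for either word of any pair, you might as well flatten all of them into one big set.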

val words = (Set.empty[String] /: terms) { case (s, (w1, w2)) => s + w1 + w2 }

Then you are just asking whether any of these words exists in the text:

text.split("\\s") exists words

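This is fast because the Set gives a quick lookup of whether a word from the text is one of the terms, and it terminates early because of the "exists":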

scala> val text = "blah1  blah2 word2b"
text: java.lang.String = blah1  blah2 word2b

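If your text is very long, you may wish to Stream it, so that the next word to test is computed lazily, rather than splitting the String into substrings up front: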

scala> val Word = """(?s)\s*(\S.*)""".r
Word: scala.util.matching.Regex = (?s)\s*(\S.*)

scala> def strmWds(text : String) : Stream[String] = text match {
     | case Word(nxt) => val (word, rest) = nxt span (!_.isWhitespace); word #:: strmWds(rest)
     | case _         => Stream.empty
     | }
strmWds: (text: String)Stream[String]

Now you can:

scala> strmWds(text) exists words
res4: Boolean = true

scala> text.split("\\s") exists words
res3: Boolean = true
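
As of Scala 2.13 both `/:` and `Stream` are deprecated, so on a newer compiler the same idea might look like the sketch below. It uses `foldLeft` for the set and relies on `Regex.findAllIn` returning an iterator, so tokens are produced on demand. The names AnyWordSearch, wordSet and containsAnyWord are hypothetical, and like the rest of this answer the check is for any single word, not for both words of a pair:

```scala
object AnyWordSearch {
  // Flatten the pairs into one set of words.
  def wordSet(terms: List[(String, String)]): Set[String] =
    terms.foldLeft(Set.empty[String]) { case (s, (w1, w2)) => s + w1 + w2 }

  // findAllIn yields matches lazily, and exists stops at the first hit,
  // so the rest of the text is never tokenized.
  def containsAnyWord(terms: List[(String, String)], text: String): Boolean = {
    val words = wordSet(terms)
    """\S+""".r.findAllIn(text).exists(words)
  }
}
```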

Answer 3

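I'm assuming that both elements of a pair have to appear in the text, but that it doesn't matter where, and it doesn't matter which pair appears.

I'm not sure this is the most elegant, but it's not bad, and it's fairly fast if you expect that the text probably contains the words (so you don't need to read all of it) and if you can generate an iterator that gives you the words one at a time: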

case class WordPair(one: String, two: String) {
  private[this] var found_one, found_two = false
  def check(s: String): Boolean = {
    if (s==one) found_one = true
    if (s==two) found_two = true
    found_one && found_two
  }
  def reset(): Unit = {
    found_one = false
    found_two = false
  }
}

val wordpairlist = terms.map { case (w1,w2) => WordPair(w1,w2) }

// May need to wordpairlist.foreach(_.reset()) first, if you do this on multiple texts
// text is assumed to be a collection of words here, not a raw String
text.iterator.exists(w => wordpairlist.exists(_.check(w)))

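You could improve things further by putting all the terms in a set, and not even bothering to check the wordpairlist unless the word from the text is in that set.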
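
A minimal sketch of that pre-filter, under one deliberate change: instead of the per-pair flags in WordPair, it keeps a single set of the interesting words seen so far. The names PrefilteredPairSearch, interesting and seen are made up for this illustration:

```scala
import scala.collection.mutable

object PrefilteredPairSearch {
  def containsPair(terms: List[(String, String)], words: Iterator[String]): Boolean = {
    // Every word that occurs in some pair; anything else is skipped cheaply.
    val interesting: Set[String] = terms.flatMap { case (a, b) => List(a, b) }.toSet
    // Interesting words seen so far (local mutable state).
    val seen = mutable.Set.empty[String]
    words.exists { w =>
      // add returns true only for a word not seen before, so the pairs
      // are re-checked only when a new interesting word turns up.
      interesting(w) && seen.add(w) &&
        terms.exists { case (a, b) => seen(a) && seen(b) }
    }
  }
}
```

Because the state is local to the call, nothing has to be reset between texts.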

If you mean that the words have to occur next to each other, in order, you would change check to

def check(s: String) = {
  if (found_one && s==two) found_two = true
  else if (s==one) { found_one = true; found_two = false }
  else { found_one = false; found_two = false }  // any other word breaks adjacency
  found_one && found_two
}
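
For the adjacent case, a stateless alternative may be easier to reason about. The sketch below swaps the flag-based check for `sliding(2)` over the word iterator, which pairs each word with its successor. AdjacentPairSearch and containsAdjacentPair are hypothetical names, not part of the answer:

```scala
object AdjacentPairSearch {
  def containsAdjacentPair(terms: List[(String, String)], words: Iterator[String]): Boolean = {
    val pairs: Set[(String, String)] = terms.toSet
    // sliding(2) yields each word together with the word that follows it.
    words.sliding(2).exists {
      case Seq(a, b) => pairs((a, b))
      case _         => false // fewer than two words in total
    }
  }
}
```

The windows are produced lazily, so the scan still stops at the first adjacent match.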

What is the most elegant way to find word pairs in a text with Scala?
