Scala: regular expressions for lists/sequences

Time: 2017-01-13 06:34:06

Tags: scala

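Is there a systematic way to search for a subsequence pattern in an arbitrary sequence? In a sense it is like a regular expression, but over a sequence of elements instead of characters.

More specifically, we would like to implement this function: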

def findPattern(seq: Seq[String], pattern: Seq[String]): Seq[Int] = {
  // find the indices of seq which match the input pattern.
  // if the pattern is not found, return Seq.empty.
  ???
}

For example, given the following input and target pattern:

val seq: Seq[String] = Seq("NNS", "VBG", "JJ", "NNS", "IN", "NNP", "NNP")
val pattern: Seq[String] = Seq("VBG", "JJ")

the desired output should be:

Seq(1, 2)
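When the pattern contains no wildcards, as in this first example, the standard library already does the search: `indexOfSlice` returns the start index of the first occurrence of a sub-sequence, or -1 if there is none. A minimal sketch, assuming the pattern consists of literal tokens only; the names `ExactSlice`, `findExact` and `findAllExact` are hypothetical:

```scala
object ExactSlice {
  // First occurrence only: indices of the matched slice, or empty if absent.
  def findExact(seq: Seq[String], pattern: Seq[String]): Seq[Int] = {
    val start = seq.indexOfSlice(pattern)
    if (pattern.isEmpty || start < 0) Seq.empty
    else start until (start + pattern.size)
  }

  // All occurrences: restart the search one position after each hit,
  // so overlapping occurrences are reported as well.
  def findAllExact(seq: Seq[String], pattern: Seq[String]): Seq[Seq[Int]] =
    if (pattern.isEmpty) Seq.empty
    else
      Iterator
        .iterate(seq.indexOfSlice(pattern))(i => seq.indexOfSlice(pattern, i + 1))
        .takeWhile(_ >= 0)
        .map(i => i until (i + pattern.size))
        .toList
}
```

For the sequence above, `ExactSlice.findExact(seq, Seq("VBG", "JJ"))` gives `Seq(1, 2)`.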

Another example with the same seq, where "?" stands for any single element:

val pattern: Seq[String] = Seq("VBG", "?", "NNS")

the desired output should be:

Seq(1, 2, 3)
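A pattern made only of literals and "?" still has a fixed length, so one simple approach is to slide a window of that length over the sequence and compare it token by token. A sketch, assuming "?" means exactly one arbitrary element and every other token must match literally; `SingleWildcard` and `findFixed` are hypothetical names:

```scala
object SingleWildcard {
  // "?" matches any element; any other token must be equal to the element.
  private def tokenMatches(p: String, s: String): Boolean = p == "?" || p == s

  // Indices of every window of seq that matches the fixed-length pattern.
  def findFixed(seq: Seq[String], pattern: Seq[String]): Seq[Seq[Int]] =
    // guard: sliding would yield one short window if pattern is longer than seq
    if (pattern.isEmpty || pattern.size > seq.size) Seq.empty
    else
      seq.sliding(pattern.size).zipWithIndex.collect {
        case (window, start)
            if window.zip(pattern).forall { case (s, p) => tokenMatches(p, s) } =>
          start until (start + pattern.size)
      }.toList
}
```

This returns every match, so `SingleWildcard.findFixed(seq, Seq("VBG", "?", "NNS"))` gives `Seq(Seq(1, 2, 3))`. It cannot express "*", because "*" makes the match length variable.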

Yet another example, where "*" stands for any number of elements:

val pattern: Seq[String] = Seq("VBG", "*", "IN")

should result in:

Seq(1, 2, 3, 4)

Side note: the output could be generalized to Seq[Seq[Int]] to account for multiple matches of the pattern.
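Supporting "*" breaks the fixed-length assumption, so a small backtracking matcher is a natural fit. A sketch, assuming "?" consumes exactly one element and "*" consumes zero or more elements lazily (shorter spans are tried first, like `.*?` in a regex), and returning `Seq[Seq[Int]]` with one match per start index as the side note suggests. `WildcardMatcher` is a hypothetical name; this is an illustration, not a reference implementation:

```scala
object WildcardMatcher {
  // Tries to match `pattern` against `seq` starting exactly at index `pos`.
  // Returns the indices consumed by the match, or None if it fails.
  private def matchAt(seq: IndexedSeq[String], pattern: List[String], pos: Int): Option[List[Int]] =
    pattern match {
      // the whole pattern has been consumed: success
      case Nil => Some(Nil)
      // "*" may cover seq(pos until end) for any end >= pos; try the shortest span first
      case "*" :: rest =>
        (pos to seq.size).iterator
          .map(end => matchAt(seq, rest, end).map(tail => (pos until end).toList ++ tail))
          .collectFirst { case Some(indices) => indices }
      // "?" matches any single element; any other token must be equal to it
      case token :: rest if pos < seq.size && (token == "?" || token == seq(pos)) =>
        matchAt(seq, rest, pos + 1).map(pos :: _)
      // out of elements, or the current element does not match the token
      case _ => None
    }

  // First match found at each start index (shorter "*" spans tried first);
  // empty matches, e.g. from a pattern of only "*", are dropped.
  def findPattern(seq: Seq[String], pattern: Seq[String]): Seq[Seq[Int]] = {
    val indexed = seq.toIndexedSeq
    val tokens = pattern.toList
    if (tokens.isEmpty) Seq.empty
    else indexed.indices.flatMap(start => matchAt(indexed, tokens, start)).filter(_.nonEmpty).toList
  }
}
```

With the sequence above, `WildcardMatcher.findPattern(seq, Seq("VBG", "*", "IN"))` gives `Seq(Seq(1, 2, 3, 4))`. Because "*" is lazy here, a trailing "*" consumes nothing: `Seq("VBG", "*")` yields `Seq(Seq(1))`.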

1 Answer:

Answer (score: 1)

I think a parser is a better fit for finding the matching pattern. Here is an implementation; I hope it helps:

  def findPattern(list: List[String], pattern: List[String]): List[List[Int]] = {
    // lt: the remaining (element, index) pairs; ps: the remaining pattern.
    // Returns the matched (element, index) pairs, or None if ps does not match.
    def nextPattern(lt: Option[List[(String, Int)]], ps: List[String]): Option[List[(String, Int)]] = {
      ps match {
        // a lone trailing "*" matches everything that is left
        case List("*") => lt
        // a lone trailing "?" matches any single element
        case List("?") =>
          lt.filter(_.nonEmpty).map(r => List(r.head))
        // a lone trailing literal must equal the first element, otherwise None
        case List(head) =>
          lt.filter(_.nonEmpty).filter(_.head._1 == head).map(r => List(r.head))
        // "*" then a final literal: minimal match up to the first occurrence of that literal
        case "*" :: List(last) =>
          lt.filter(_.nonEmpty).flatMap { t =>
            t.find(_._1 == last).map(i => t.takeWhile(_._1 != last) :+ i)
          }
        // "*" then a literal and more pattern: match "* last" minimally, then continue
        case "*" :: last :: l =>
          nextPattern(lt, List("*", last)).flatMap { j =>
            nextPattern(lt.map(_.drop(j.size)), l).map(i => j ++ i)
          }
        // "?" consumes exactly one element, whatever it is
        case "?" :: l =>
          lt.filter(_.nonEmpty).flatMap { r =>
            nextPattern(Some(r.tail), l).map(j => r.head :: j)
          }
        // a literal must equal the first element, then the rest must match
        case head :: l =>
          lt.filter(_.nonEmpty).filter(_.head._1 == head).flatMap { r =>
            nextPattern(Some(r.tail), l).map(j => r.head :: j)
          }
        // unreachable for a non-empty pattern; keeps the match exhaustive
        case Nil => Some(Nil)
      }
    }
    // if either input is empty, there is nothing to match
    if (list.isEmpty || pattern.isEmpty) List.empty
    else {
      // candidate starts: elements equal to the first pattern token, so a pattern
      // starting with "?" or "*" only matches where that token literally occurs
      val relevantIndices = list.zipWithIndex.filter(_._1 == pattern.head).map(_._2)
      val relevantSublists = relevantIndices.map(list.zipWithIndex.drop)
      relevantSublists.flatMap { sublist =>
        nextPattern(Some(sublist), pattern).map(_.map(_._2))
      }
    }
  }

Test:

    val list = List("NNS", "VBG", "JJ", "NNS", "IN", "NNP", "NNP")

    println(findPattern(list, List("NNS", "VBG")))
    println(findPattern(list, List("NNS", "*", "VBG")))
    println(findPattern(list, List("NNS", "?", "VBG")))
    println(findPattern(list, List("NNS", "?", "JJ")))
    println(findPattern(list, List("VBG", "?", "NNS")))
    println(findPattern(list, List("JJ")))
    println(findPattern(list, List("VBG", "*", "IN")))
    println(findPattern(list, List("VBG", "*")))
    println(findPattern(list, List("Foo")))
    println(findPattern(list, List("VBG", "*", "Bar")))
    println(findPattern(list, List("NNS")))

Output:

[info] List(List(0, 1))
[info] List(List(0, 1))
[info] List()
[info] List(List(0, 1, 2))
[info] List(List(1, 2, 3))
[info] List(List(2))
[info] List(List(1, 2, 3, 4))
[info] List(List(1, 2, 3, 4, 5, 6))
[info] List()
[info] List()
[info] List(List(0), List(3))