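Is there a systematic way to search for a subsequence pattern in an arbitrary given sequence? In a sense it is like a regular expression, but over a sequence of elements rather than a string of characters.
More specifically, we would like to complete this function: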
def findPattern(seq: Seq[String], pattern: Seq[String]): Seq[Int] = {
  // find the indices of seq that match the input pattern.
  // if the pattern is not found, return Seq.empty.
  ???
}
For example, given the following input and target pattern:
seq: Seq[String] = Seq("NNS", "VBG", "JJ", "NNS", "IN", "NNP", "NNP")
pattern: Seq[String] = Seq("VBG", "JJ")
the desired output should be:
Seq(1, 2)
Another example with the same seq:
pattern: Seq[String] = Seq("VBG", "?", "NNS")
The desired output should be:
Seq(1, 2, 3)
Yet another example:
pattern: Seq[String] = Seq("VBG", "*", "IN")
should result in:
Seq(1, 2, 3, 4)
Side note: the output could be made a Seq[Seq[Int]] to accommodate multiple occurrences of the pattern.
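For the wildcard-free case, such as the first example, a minimal sketch can lean on the standard library's `indexOfSlice`. The helper name `findLiteral` is hypothetical, and it deliberately does not interpret `?` or `*`; it only illustrates the expected output shape.

```scala
// Hypothetical helper: handles literal patterns only (no "?" or "*").
def findLiteral(seq: Seq[String], pattern: Seq[String]): Seq[Int] = {
  // indexOfSlice returns the start of the first occurrence, or -1 if absent.
  // An empty pattern is treated as "not found" here (an assumption).
  val start = if (pattern.isEmpty) -1 else seq.indexOfSlice(pattern)
  if (start < 0) Seq.empty else start until (start + pattern.length)
}

val tags = Seq("NNS", "VBG", "JJ", "NNS", "IN", "NNP", "NNP")
val hit = findLiteral(tags, Seq("VBG", "JJ"))  // indices 1 and 2
val miss = findLiteral(tags, Seq("Foo"))       // empty
```

Only the first occurrence is reported; collecting every occurrence would need a loop over successive start positions.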
Answer 0: (score: 1)
I think a parser-style approach makes more sense for finding matching patterns. Here is an implementation; hopefully it helps:
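def findPattern(list: List[String], pattern: List[String]): List[List[Int]] = {
  // Matches the pattern `ps` against the start of the remaining (tag, index) pairs `lt`.
  // Returns the matched pairs, or None if the pattern does not match at this position.
  def nextPattern(lt: Option[List[(String, Int)]], ps: List[String]): Option[List[(String, Int)]] = {
    ps match {
      // a trailing "*" matches everything that is left
      case List("*") => lt
      // a trailing "?" matches exactly one remaining element
      case List("?") =>
        lt.filter(_.nonEmpty).map(r => List(r.head))
      // last literal: it must equal the head of the remaining input, otherwise None
      case List(head) =>
        lt.filter(_.nonEmpty).filter(_.head._1 == head).map(r => List(r.head))
      // "*" followed by the last literal: shortest match up to its first occurrence
      case "*" :: List(last) =>
        lt.filter(_.nonEmpty).flatMap { t =>
          t.find(_._1 == last).map(i => t.takeWhile(_._1 != last) :+ i)
        }
      // "*" followed by a literal and more pattern: match "*" plus the literal, then the rest
      case "*" :: last :: l =>
        nextPattern(lt, List("*", last)).flatMap { j =>
          nextPattern(lt.map(_.drop(j.size)), l).map(i => j ++ i)
        }
      // "?" accepts any single element, then the rest of the pattern must match
      case "?" :: l =>
        lt.filter(_.nonEmpty).flatMap { r =>
          nextPattern(Some(r.tail), l).map(j => r.head :: j)
        }
      // literal: the head of the input must equal it, then the rest of the pattern must match
      case head :: l =>
        lt.filter(_.nonEmpty).filter(_.head._1 == head).flatMap { r =>
          nextPattern(Some(r.tail), l).map(j => r.head :: j)
        }
    }
  }
  // an empty input or an empty pattern yields no matches
  if (list.isEmpty || pattern.isEmpty) List.empty
  else {
    val indexed = list.zipWithIndex
    // candidate start positions: every index whose element equals the first pattern token
    // (so a pattern that begins with "?" or "*" finds no match)
    val relevantIndices = indexed.filter(_._1 == pattern.head).map(_._2)
    val relevantSublists = relevantIndices.map(indexed.drop)
    // keep only the start positions where the whole pattern matched, as index lists
    relevantSublists.flatMap { sublist =>
      nextPattern(Some(sublist), pattern).map(_.map(_._2))
    }
  }
}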
Test:
val list = List("NNS", "VBG", "JJ", "NNS", "IN", "NNP", "NNP")
println(findPattern(list, List("NNS", "VBG")))
println(findPattern(list, List("NNS", "*", "VBG")))
println(findPattern(list, List("NNS", "?", "VBG")))
println(findPattern(list, List("NNS", "?", "JJ")))
println(findPattern(list, List("VBG", "?", "NNS")))
println(findPattern(list, List("JJ")))
println(findPattern(list, List("VBG", "*", "IN")))
println(findPattern(list, List("VBG", "*")))
println(findPattern(list, List("Foo")))
println(findPattern(list, List("VBG", "*", "Bar")))
println(findPattern(list, List("NNS")))
Results in:
[info] List(List(0, 1))
[info] List(List(0, 1))
[info] List()
[info] List(List(0, 1, 2))
[info] List(List(1, 2, 3))
[info] List(List(2))
[info] List(List(1, 2, 3, 4))
[info] List(List(1, 2, 3, 4, 5, 6))
[info] List()
[info] List()
[info] List(List(0), List(3))
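The question asks for a flat Seq[Int] that is empty when nothing matches, while the results above contain one index list per match. A minimal sketch of adapting one shape to the other, assuming the first match is the one wanted; `firstMatch` is a hypothetical helper name, not part of the answer's code.

```scala
// Hypothetical adapter: keep only the first match, or Seq.empty if there is none.
def firstMatch(matches: Seq[Seq[Int]]): Seq[Int] =
  matches.headOption.getOrElse(Seq.empty)

// Inputs copied from the results listed above.
val single = firstMatch(List(List(1, 2, 3, 4)))   // the only match
val several = firstMatch(List(List(0), List(3)))  // first of two matches
val nothing = firstMatch(List.empty)              // no match
```

With the implementation above this would be called as firstMatch(findPattern(list, pattern)).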