我试图创建一个SQL词法分析器(好吧,一个完整的解析器,但你必须从某个地方开始),我不知道如何继续。我想写这样的东西:
def nextToken(input: List[Char]) = input match {
case 'S'::'E'::'L'::'E'::'C'::'T'::tail => (SELECT, tail)
case _ => ??? // etc.
}
但SQL不区分大小写。我可以在输入中大写所有字符,但这也会大写字符串。我真正需要的是一种进行不区分大小写的比较的方法,然后在匹配令牌后保留正确的tail
(余数List[Char]
)。有没有办法在Scala 2.10.x中轻松完成这项工作?
答案 0 :(得分:0)
如果将输入保留为String,似乎正则表达式可以很好地解决这个问题。
val SelectRe = """(?i)SELECT(.*)""".r // the (?i) makes it case-insensitive
def nextToken(input: String) = input match {
case SelectRe(tail) => (SELECT, tail)
case _ => ??? // etc.
}
答案 1 :(得分:0)
我需要这个是一个有效的解析器,所以@ dhg的答案对我不起作用。这是我解决的答案。它使用不区分大小写Trie的变体,但其中
这是特里:
case class CaseInsensitiveTrie[T](children: List[Node[T]] = Nil) {
def add(key: List[Char], value: T): CaseInsensitiveTrie[T] = {
def loop(key: List[Char], nodes: List[Node[T]]): List[Node[T]] = key match {
case c::Nil =>
nodes.find(_.key == c) match {
case Some(node) => nodes // No updating values
case None => Node(c, Some(value), Nil) :: nodes
}
case c::tail =>
nodes.find(_.key == c) match {
case Some(node) =>
val children = loop(tail, node.children)
val newNode = Node(c, None, children)
newNode :: nodes.filterNot(_.key == c)
case None =>
val children = loop(tail, Nil)
val newNode = Node(c, None, children)
newNode :: nodes
}
case Nil => ??? // Programmer error
}
CaseInsensitiveTrie(loop(key, children))
}
def find(input: List[Char]): Option[(T, List[Char])] = {
def loop(input: List[Char], nodes: List[Node[T]]): Option[(T, List[Char])] = input match {
case c::Nil =>
nodes.find(_.key.toUpper == c.toUpper).flatMap(_.value.map(_ -> Nil))
case c::tail =>
nodes.find(_.key.toUpper == c.toUpper).flatMap {
case node if node.value.isDefined =>
node.value.map(_ -> tail)
case node =>
loop(tail, node.children)
}
}
loop(input, children)
}
}
object CaseInsensitiveTrie {
case class Node[T](key: Char, value: Option[T], children: List[CaseInsensitiveTrie.Node[T]])
}
只有2个功能 - find
和add
。仅使用不区分大小写的关键字填充trie,然后我们创建此unapply函数:
val trie = CaseInsensitiveTrie[Token]()
.add("SELECT".toList, SELECT)
.add("FROM".toList, FROM)
.add("WHERE".toList, WHERE)
.add("AS".toList, AS)
object Keyword {
def unapply(input: List[Char]): Option[(Token, List[Char])] = {
trie.find(input)
}
}
因此,在匹配中我们可以测试匹配并同时拉出匹配和输入的其余部分:
input match {
case '*'::tail => (STAR, Scan(tail, location + 1).trim)
case '('::tail => (LPAREN, Scan(tail, location + 1).trim)
case ')'::tail => (RPAREN, Scan(tail, location + 1).trim)
case '.'::tail => (DOT, Scan(tail, location + 1).trim)
case ','::tail => (COMMA, Scan(tail, location + 1).trim)
case '='::tail => (EQ, Scan(tail, location + 1).trim)
case '<'::'>'::tail => (NE, Scan(tail, location + 2).trim)
case '>'::tail => (GT, Scan(tail, location + 1).trim)
case '>'::'='::tail => (GE, Scan(tail, location + 2).trim)
case '<'::tail => (LT, Scan(tail, location + 1).trim)
case '>'::'='::tail => (LE, Scan(tail, location + 2).trim)
case Keyword(kw, tail) if tail.headOption.fold(true)(_.isWhitespace) =>
(kw, Scan(tail, location + kwLength(kw)).trim)
}
这不是最快的Trie实现,但它可以满足我的需求。如果性能成为一个问题,我可以把它变成一个压缩的trie