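I'm using scala.xml.pull to parse large XML files. This works well for event processing, but what I'd like is for my parser to cough up a small document for specific nodes, and I can't see an easy way to do that, or at least not a "Scala" way.
I was thinking of building a seek function like this, which uses the iterator to find the EvElemStart event that matches my tag: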
def seek(tag: String) = {
  while (it.hasNext) {
    it.next match {
      case EvElemStart(_, `tag`, _, _) => // found the tag; not sure what to do from here
      case _ =>
    }
  }
}
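After that I'm not so sure. Is there a simple way to grab all the children of this tag into a document without having to walk through every event the XMLEventReader pops out?
What I'm ultimately looking for is a process that scans the file and emits an XML element (an Elem?) for each instance of a specific tag or set of tags, which I can then handle with normal Scala XML processing.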
Answer 0 (score: 2)
This is what I ended up doing. slurp(tag) seeks the next instance of the tag and returns the complete node tree for that tag.
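import scala.xml.{Elem, MetaData, Node, Text, TopScope}
import scala.xml.pull.{EvElemEnd, EvElemStart, EvText}

// `it` is the XMLEventReader from the question.

// Skip events until the next <tag> start, then build that element's full subtree.
def slurp(tag: String): Option[Node] = {
  while (it.hasNext) {
    it.next match {
      case EvElemStart(_, `tag`, attrs, _) => return Some(subTree(tag, attrs))
      case _ =>
    }
  }
  None
}

// Consume events up to the matching end tag, recursing into child elements.
def subTree(tag: String, attrs: MetaData): Node = {
  var children = List[Node]()
  while (it.hasNext) {
    it.next match {
      case EvElemStart(_, t, a, _) =>
        children = children :+ subTree(t, a)
      case EvText(t) =>
        children = children :+ Text(t)
      case EvElemEnd(_, _) =>
        return new Elem(null, tag, attrs, TopScope, children: _*)
      case _ => // other events (comments, processing instructions) are ignored
    }
  }
  null // this shouldn't happen with good XML
}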
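Since the question asks for one element per instance of a tag, the slurp/subTree pair above can be packaged as an Iterator[Node]. This is only a minimal sketch, not part of the original answer: the names TagIterator, TagIteratorDemo, sample and titles are made up for illustration, the event iterator is passed in as a parameter instead of using a shared `it`, and it assumes a scala-xml version that still includes the scala.xml.pull package.

```scala
import scala.io.Source
import scala.xml.{Elem, MetaData, Node, Text, TopScope}
import scala.xml.pull.{EvElemEnd, EvElemStart, EvText, XMLEvent, XMLEventReader}

// Hypothetical wrapper: yields one Node per <tag> element found in the event stream.
class TagIterator(events: Iterator[XMLEvent], tag: String) extends Iterator[Node] {
  // Element found by hasNext but not yet handed out by next().
  private var pending: Option[Node] = None

  // Same idea as slurp: skip ahead to the next matching start tag.
  private def slurp(): Option[Node] = {
    while (events.hasNext) {
      events.next() match {
        case EvElemStart(_, `tag`, attrs, _) => return Some(subTree(tag, attrs))
        case _ =>
      }
    }
    None
  }

  // Same idea as subTree: collect children until the matching end tag.
  private def subTree(label: String, attrs: MetaData): Node = {
    var children = Vector.empty[Node]
    while (events.hasNext) {
      events.next() match {
        case EvElemStart(_, l, a, _) => children :+= subTree(l, a)
        case EvText(t)               => children :+= Text(t)
        case EvElemEnd(_, _) =>
          return new Elem(null, label, attrs, TopScope, true, children: _*)
        case _ => // comments, processing instructions etc. are ignored
      }
    }
    sys.error("Unexpected end of input inside <" + label + ">")
  }

  def hasNext: Boolean = {
    if (pending.isEmpty) pending = slurp()
    pending.isDefined
  }

  def next(): Node = {
    if (!hasNext) throw new NoSuchElementException("no more <" + tag + "> elements")
    val node = pending.get
    pending = None
    node
  }
}

object TagIteratorDemo {
  // Made-up sample input, modelled on a small list of books.
  val sample: String =
    "<books><book><title>A book title</title></book>" +
      "<book><title>Another book title</title></book></books>"

  // Each <book> arrives as an ordinary Node, so normal scala.xml queries apply.
  def titles(xml: String): List[String] = {
    val reader = new XMLEventReader(Source.fromString(xml))
    new TagIterator(reader, "book").map(book => (book \ "title").text).toList
  }

  def main(args: Array[String]): Unit = titles(sample).foreach(println)
}
```

Only one matched subtree is held in memory at a time, so the rest of the file is still streamed. Supporting a set of tags would mean replacing the backtick pattern with a guard such as `if tags.contains(l)`, which is left out here.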
Answer 1 (score: 2)
Building on Jim Baldwin's answer, I created an iterator that yields the nodes at a specific level (rather than those with a specific tag):
import scala.io.Source
import scala.xml.parsing.FatalError
import scala.xml.{Elem, MetaData, Node, Text, TopScope}
import scala.xml.pull.{EvElemEnd, EvElemStart, EvText, XMLEventReader}
/**
 * Streaming XML parser which yields Scala XML Nodes.
 *
 * Usage:
 *
 *   val it = new StreamingXMLParser(pathToXML, 1)
 *
 * Will give you all book-nodes of
 *
 *   <?xml version="1.0" encoding="UTF-8"?>
 *   <books>
 *     <book>
 *       <title>A book title</title>
 *     </book>
 *     <book>
 *       <title>Another book title</title>
 *     </book>
 *   </books>
 *
 */
class StreamingXMLParser(filename: String, wantedNodeLevel: Int) extends Iterator[Node] {
  val file = Source.fromFile(filename)
  val it = new XMLEventReader(file)
  var currentLevel = 0
  var nextEvent = it.next // peek into next event

  def getNext() = {
    val currentEvent = nextEvent
    nextEvent = it.next
    currentEvent
  }

  def hasNext = {
    while (it.hasNext && !nextEvent.isInstanceOf[EvElemStart]) {
      getNext() match {
        case EvElemEnd(_, _) => {
          currentLevel -= 1
        }
        case _ => // noop
      }
    }
    it.hasNext
  }

  def next: Node = {
    if (!hasNext) throw new NoSuchElementException
    getNext() match {
      case EvElemStart(pre, tag, attrs, _) => {
        if (currentLevel == wantedNodeLevel) {
          currentLevel += 1
          getElemWithChildren(tag, attrs)
        }
        else {
          currentLevel += 1
          next
        }
      }
      case EvElemEnd(_, _) => {
        currentLevel -= 1
        next
      }
      case _ => next
    }
  }

  def getElemWithChildren(tag: String, attrs: MetaData): Node = {
    var children = List[Node]()
    while (it.hasNext) {
      getNext() match {
        case EvElemStart(_, t, a, _) => {
          currentLevel += 1
          children = children :+ getElemWithChildren(t, a)
        }
        case EvText(t) => {
          children = children :+ Text(t)
        }
        case EvElemEnd(_, _) => {
          currentLevel -= 1
          return new Elem(null, tag, attrs, TopScope, true, children: _*)
        }
        case _ =>
      }
    }
    throw new FatalError("Failed to parse XML.")
  }
}