Question

我需要从一段纯文本以及应插入的每个XML元素的开始和结束偏移创建一个XML文档。以下是我希望通过的一些测试用例：

val text = "The dog chased the cat."
val spans = Seq(
    (0, 23, <xml/>),
    (4, 22, <phrase/>),
    (4, 7, <token/>))
val expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
assert(expected === spansToXML(text, spans))

val text = "aabbccdd"
val spans = Seq(
    (0, 8, <xml x="1"/>),
    (0, 4, <ab y="foo"/>),
    (4, 8, <cd z="42>3"/>))
val expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
assert(expected === spansToXML(text, spans))

val spans = Seq(
    (0, 1, <a/>),
    (0, 0, <b/>),
    (0, 0, <c/>),
    (1, 1, <d/>),
    (1, 1, <e/>))
assert(<a><b/><c/> <d/><e/></a> === spansToXML(" ", spans))

我的部分解决方案（请参阅下面的答案）通过字符串连接和XML.loadString工作。这看起来很糟糕，我也不是100％确定这个解决方案在所有角落情况下都能正常工作......

有更好的解决方案吗？（对于它的价值，我很乐意切换到anti-xml，如果这样可以使这项任务更容易。）

2011年8月10日更新以添加更多测试用例并提供更清晰的规范。

Answer 1

鉴于你提出的赏金，我研究了你的问题已经有一段时间了，并提出了以下解决方案，它在你所有的测试用例上取得了成功。我真的希望接受我的回答 - 请告诉我我的解决方案是否有任何问题。

一些评论：如果你想知道执行过程中发生了什么，我把评论出来的打印声明留在里面。除了你的规范，我确实保留了他们现有的孩子（如果有的话） - 这里有一个评论。

我没有手动构建XML节点，我修改了传入的XML节点。为了避免拆分开始和结束标记，我不得不更改算法，但是按begin排序跨度的想法-end来自您的解决方案。

代码有点高级Scala，特别是当我构建我需要的不同Orderings时。我从我得到的第一个版本中做了一些简化。

我通过使用SortedMap避免创建表示间隔的树，并在提取后过滤间隔。这种选择有点不理想;然而，我听说有更好的数据结构用于表示嵌套间隔，例如间隔树（它们在计算几何中研究），但实现起来相当复杂，我不认为这里需要它。

/**
 * User: pgiarrusso
 * Date: 12/8/2011
 */

import collection.mutable.ArrayBuffer
import collection.SortedMap
import scala.xml._

object SpansToXmlTest {
    def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]) = {
        val intOrdering = implicitly[Ordering[Int]] // Retrieves the standard ordering on Ints.

        // Sort spans decreasingly on begin and increasingly on end and their label - this processes spans outwards.
        // The sorting on labels matches the given examples.
        val spanOrder = Ordering.Tuple3(intOrdering.reverse, intOrdering, Ordering.by((_: Elem).label))

        //Same sorting, excluding labels.
        val intervalOrder = Ordering.Tuple2(intOrdering.reverse, intOrdering)
        //Map intervals of the source string to the sequence of nodes which match them - it is a sequence because
        //multiple spans for the same interval are allowed.
        var intervalMap = SortedMap[(Int, Int), Seq[Node]]()(intervalOrder)

        for ((start, end, elem) <- spans.sorted(spanOrder)) {
            //Only nested intervals. Interval nesting is a partial order, therefore we cannot use the filter function as an ordering for intervalMap, even if it would be nice.
            val nestedIntervalsMap = intervalMap.until((start, end)).filter(_ match {
                case ((intStart, intEnd), _) => start <= intStart && intEnd <= end
            })
            //println("intervalMap: " + intervalMap)
            //println("beforeMap: " + nestedIntervalsMap)

            //We call sorted to use a standard ordering this time.
            val before = nestedIntervalsMap.keys.toSeq.sorted

            // text.slice(start, end) must be split into fragments, some of which are represented by text node, some by
            // already computed xml nodes.
            val intervals = start +: (for {
                (intStart, intEnd) <- before
                boundary <- Seq(intStart, intEnd)
            } yield boundary) :+ end

            var xmlChildren = ArrayBuffer[Node]()
            var useXmlNode = false

            for (interv <- intervals.sliding(2)) {
                val intervStart = interv(0)
                val intervEnd = interv(1)
                xmlChildren.++=(
                    if (useXmlNode)
                        intervalMap((intervStart, intervEnd)) //Precomputed nodes
                    else
                        Seq(Text(text.slice(intervStart, intervEnd))))
                useXmlNode = !useXmlNode //The next interval will be of the opposite kind.
            }
            //Remove intervals that we just processed
            intervalMap = intervalMap -- before

            // By using elem.child, you also preserve existing xml children. "elem.child ++" can be also commented out.
            var tree = elem.copy(child = elem.child ++ xmlChildren)
            intervalMap += (start, end) -> (intervalMap.getOrElse((start, end), Seq.empty) :+ tree)
            //println(tree)
        }
        intervalMap((0, text.length)).head
    }

    def test(text: String, spans: Seq[(Int, Int, Elem)], expected: Node) {
        val res = spansToXML(text, spans)
        print("Text: \"%s\", expected:\n%s\nResult:\n%s\n\n" format (text, expected, res))
        assert(expected == res)
    }
    def test1() =
        test(
            text = "The dog chased the cat.",
            spans = Seq(
                (0, 23, <xml/>),
                (4, 22, <phrase/>),
                (4, 7, <token/>)),
            expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
        )

    def test2() =
        test(
            text = "aabbccdd",
            spans = Seq(
                (0, 8, <xml x="1"/>),
                (0, 4, <ab y="foo"/>),
                (4, 8, <cd z="42>3"/>)),
            expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
        )

    def test3() =
        test(
            text = " ",
            spans = Seq(
                (0, 1, <a/>),
                (0, 0, <b/>),
                (0, 0, <c/>),
                (1, 1, <d/>),
                (1, 1, <e/>)),
            expected = <a><b/><c/> <d/><e/></a>
        )

    def main(args: Array[String]) {
        test1()
        test2()
        test3()
    }
}

Answer 2

这很有趣！

我采取了类似于史蒂夫的方法。通过在“开始标签”和“结束标签”中对元素进行排序，然后计算放置它们的位置。

我无耻地偷走了Blaisorblade的测试版，并添加了一些帮助我开发代码的测试。

于2011-08-14编辑

我不满意如何在test-5中插入空标签。然而，这个放置的地方是test-3如何制定的结果

即使空标记（c，d）位于跨度列表中的生成标记（a）之后，c，d标记的插入点与a的结束标记相同，c，d标记也会进入一个。这使得很难在生成标记之间放置空标记，这可能很有用。

所以，我稍微改变了一些测试，并提供了另一种解决方案。

在替代解决方案中，我以相同的方式启动，但有3个单独的列表，start，empty和closing标签。而不是仅排序我有第三步，空标签被放入标签列表。

第一个解决方案：

import xml.{XML, Elem, Node}
import annotation.tailrec

object SpanToXml {
    def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]): Node = {
        // Create a Seq of elements, sorted by where it should be inserted
        //  differentiate start tags ('s) and empty tags ('e)
        val startElms = spans sorted Ordering[Int].on[(Int, _, _)](_._1) map {
            case e if e._1 != e._2 => (e._1, e._3, 's)
            case e => (e._1, e._3, 'e)
        }
        //Create a Seq of closing tags ('c), sorted by where they should be inserted
        // filter out all empty tags
        val endElms = spans.reverse.sorted(Ordering[Int].on[(_, Int, _)](_._2))
            .filter(e => e._1 != e._2)
            .map(e => (e._2, e._3, 'c))

        //Combine the Seq's and sort by insertion point
        val elms = startElms ++ endElms sorted Ordering[Int].on[(Int, _, _)](_._1)
        //The sorting need to be refined
        // - end tag's need to come before start tag's if the insertion point is thesame
        val sorted = elms.sortWith((a, b) => a._1 == b._1 && a._3 == 'c && b._3 == 's )

        //Adjust the insertion point to what it should be in the final string
        // then insert the tags into the text by folding left
        // - there are different rules depending on start, empty or close
        val txt = adjustInset(sorted).foldLeft(text)((tx, e) => {
            val s = tx.splitAt(e._1)
            e match {
                case (_, elem, 's) => s._1 + "<" + elem.label + elem.attributes + ">" + s._2
                case (_, elem, 'e) => s._1 + "<" + elem.label + elem.attributes + "/>" + s._2
                case (_, elem, 'c) => s._1 + "</" + elem.label + ">" + s._2
            }
        })
        //Sanity check
        //println(txt)

        //Convert to XML
        XML.loadString(txt)
    }

    def adjustInset(elems: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] = {
        @tailrec
        def adjIns(elems: Seq[(Int, Elem, Symbol)], tmp: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] =
            elems match {
                case t :: Nil => tmp :+ t
                case t :: ts => {
                    //calculate offset due to current element
                    val offset = t match {
                        case (_, e, 's) => e.label.size + e.attributes.toString.size + 2
                        case (_, e, 'e) => e.label.size + e.attributes.toString.size + 3
                        case (_, e, 'c) => e.label.size + 3
                    }
                    //add offset to all elm's in tail, and recurse
                    adjIns(ts.map(e => (e._1 + offset, e._2, e._3)), tmp :+ t)
                }
            }

            adjIns(elems, Nil)
    }

    def test(text: String, spans: Seq[(Int, Int, Elem)], expected: Node) {
        val res = spansToXML(text, spans)
        print("Text: \"%s\", expected:\n%s\nResult:\n%s\n\n" format (text, expected, res))
        assert(expected == res)
    }

    def test1() =
        test(
            text = "The dog chased the cat.",
            spans = Seq(
                (0, 23, <xml/>),
                (4, 22, <phrase/>),
                (4, 7, <token/>)),
            expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
        )

    def test2() =
        test(
            text = "aabbccdd",
            spans = Seq(
                (0, 8, <xml x="1"/>),
                (0, 4, <ab y="foo"/>),
                (4, 8, <cd z="42>3"/>)),
            expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
        )

    def test3() =
        test(
            text = " ",
            spans = Seq(
                (0, 1, <a/>),
                (0, 0, <b/>),
                (0, 0, <c/>),
                (1, 1, <d/>),
                (1, 1, <e/>)),
            expected = <a><b/><c/> <d/><e/></a>
        )

    def test4() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aabb</ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test5() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 4, <b/>),
                        (4, 4, <empty/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<b>bb<empty/></b></ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test6() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 4, <b/>),
                        (2, 4, <c/>),
                        (3, 4, <d/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<b><c>b<d>b</d></c></b></ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test7() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab a="a" b="b"/>),
                        (4, 8, <cd c="c" d="d"/>)),
            expected = <xml><ab a="a" b="b">aabb</ab><cd c="c" d="d">ccdd</cd></xml>
        )

    def invalidSpans() = {
        val text = "aabbccdd"
        val spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (4, 6, <err/>),
                        (4, 8, <cd/>))
        try {
            val res = spansToXML(text, spans)
            assert(false)
        } catch {
            case e => {
                println("This generate invalid XML:")
                println("<xml><ab>aabb</ab><err><cd>cc</err>dd</cd></xml>")
                println(e.getMessage)
            }
        }
    }

    def main(args: Array[String]) {
        test1()
        test2()
        test3()
        test4()
        test5()
        test6()
        test7()
        invalidSpans()
    }
}

SpanToXml.main(Array())

替代解决方案：

import xml.{XML, Elem, Node}
import annotation.tailrec

object SpanToXmlAlt {
    def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]): Node = {
        // Create a Seq of start tags, sorted by where it should be inserted
        // filter out all empty tags
        val startElms = spans.sorted(Ordering[Int].on[(Int, _, _)](_._1))
            .filterNot(e => e._1 == e._2)
            .map(e => (e._1, e._3, 's))
        //Create a Seq of closing tags, sorted by where they should be inserted
        // filter out all empty tags
        val endElms = spans.reverse.sorted(Ordering[Int].on[(_, Int, _)](_._2))
            .filterNot(e => e._1 == e._2)
            .map(e => (e._2, e._3, 'c))

        //Create a Seq of empty tags, sorted by where they should be inserted
        val emptyElms = spans.sorted(Ordering[Int].on[(Int, _, _)](_._1))
            .filter(e => e._1 == e._2)
            .map(e => (e._1, e._3, 'e))

        //Combine the Seq's and sort by insertion point
        val elms = startElms ++ endElms sorted Ordering[Int].on[(Int, _, _)](_._1)
        //The sorting need to be refined
        // - end tag's need to come before start tag's if the insertion point is the same
        val sorted = elms.sortWith((a, b) => a._1 == b._1 && a._3 == 'c && b._3 == 's )

        //Insert empty tags
        val allSorted = insertEmpyt(spans, sorted, emptyElms) sorted Ordering[Int].on[(Int, _, _)](_._1)
        //Adjust the insertion point to what it should be in the final string
        // then insert the tags into the text by folding left
        // - there are different rules depending on start, empty or close
        val str = adjustInset(allSorted).foldLeft(text)((tx, e) => {
            val s = tx.splitAt(e._1)
            e match {
                case (_, elem, 's) => s._1 + "<" + elem.label + elem.attributes + ">" + s._2
                case (_, elem, 'e) => s._1 + "<" + elem.label + elem.attributes + "/>" + s._2
                case (_, elem, 'c) => s._1 + "</" + elem.label + ">" + s._2
            }
        })
        //Sanity check
        //println(str)
        //Convert to XML
        XML.loadString(str)
    }

    def insertEmpyt(spans: Seq[(Int, Int, Elem)],
        sorted: Seq[(Int, Elem, Symbol)],
        emptys: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] = {

        //Find all tags that should be before the empty tag
        @tailrec
        def afterSpan(empty: (Int, Elem, Symbol),
            spans: Seq[(Int, Int, Elem)],
            after: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] = {
            var result = after
            spans match {
                case t :: _ if t._1 == empty._1 && t._2 == empty._1 && t._3 == empty._2 => after //break
                case t :: ts if t._1 == t._2 => afterSpan(empty, ts, after :+ (t._1, t._3, 'e))
                case t :: ts => {
                    if (t._1 <= empty._1) result = result :+ (t._1, t._3, 's)
                    if (t._2 <= empty._1) result = result :+ (t._2, t._3, 'c)
                    afterSpan(empty, ts, result)
                }
            }
        }

        //For each empty tag, insert it in the sorted list
        var result = sorted
        emptys.foreach(e => {
            val afterSpans = afterSpan(e, spans, Seq[(Int, Elem, Symbol)]())
            var emptyInserted = false
            result = result.foldLeft(Seq[(Int, Elem, Symbol)]())((res, s) => {
                if (afterSpans.contains(s) || emptyInserted) {
                    res :+ s
                } else {
                    emptyInserted = true
                    res :+ e :+ s
                }
            })
        })
        result
    }

    def adjustInset(elems: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] = {
        @tailrec
        def adjIns(elems: Seq[(Int, Elem, Symbol)], tmp: Seq[(Int, Elem, Symbol)]): Seq[(Int, Elem, Symbol)] =
            elems match {
                case t :: Nil => tmp :+ t
                case t :: ts => {
                    //calculate offset due to current element
                    val offset = t match {
                        case (_, e, 's) => e.label.size + e.attributes.toString.size + 2
                        case (_, e, 'e) => e.label.size + e.attributes.toString.size + 3
                        case (_, e, 'c) => e.label.size + 3
                    }
                    //add offset to all elm's in tail, and recurse
                    adjIns(ts.map(e => (e._1 + offset, e._2, e._3)), tmp :+ t)
                }
            }

            adjIns(elems, Nil)
    }

    def test(text: String, spans: Seq[(Int, Int, Elem)], expected: Node) {
        val res = spansToXML(text, spans)
        print("Text: \"%s\", expected:\n%s\nResult:\n%s\n\n" format (text, expected, res))
        assert(expected == res)
    }

    def test1() =
        test(
            text = "The dog chased the cat.",
            spans = Seq(
                (0, 23, <xml/>),
                (4, 22, <phrase/>),
                (4, 7, <token/>)),
            expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
        )

    def test2() =
        test(
            text = "aabbccdd",
            spans = Seq(
                (0, 8, <xml x="1"/>),
                (0, 4, <ab y="foo"/>),
                (4, 8, <cd z="42>3"/>)),
            expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
        )

    def test3alt() =
        test(
            text = "  ",
            spans = Seq(
                (0, 2, <a/>),
                (0, 0, <b/>),
                (0, 0, <c/>),
                (1, 1, <d/>),
                (1, 1, <e/>)),
            expected = <a><b/><c/> <d/><e/> </a>
        )

    def test4() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aabb</ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test5alt() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 4, <b/>),
                        (4, 4, <empty/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<b>bb</b></ab><empty/><cd><ok>cc</ok>dd</cd></xml>
        )

    def test5b() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 2, <empty1/>),
                        (4, 4, <empty2/>),
                        (2, 4, <b/>),
                        (2, 2, <empty3/>),
                        (4, 4, <empty4/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<empty1/><b><empty3/>bb<empty2/></b></ab><empty4/><cd><ok>cc</ok>dd</cd></xml>
        )

    def test6() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (2, 4, <b/>),
                        (2, 4, <c/>),
                        (3, 4, <d/>),
                        (4, 8, <cd/>),
                        (4, 6, <ok/>)),
            expected = <xml><ab>aa<b><c>b<d>b</d></c></b></ab><cd><ok>cc</ok>dd</cd></xml>
        )

    def test7() =
        test(
            text = "aabbccdd",
            spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab a="a" b="b"/>),
                        (4, 8, <cd c="c" d="d"/>)),
            expected = <xml><ab a="a" b="b">aabb</ab><cd c="c" d="d">ccdd</cd></xml>
        )

    def failedSpans() = {
        val text = "aabbccdd"
        val spans = Seq((0, 8, <xml/>),
                        (0, 4, <ab/>),
                        (4, 6, <err/>),
                        (4, 8, <cd/>))
        try {
            val res = spansToXML(text, spans)
            assert(false)
        } catch {
            case e => {
                println("This generate invalid XML:")
                println("<xml><ab>aabb</ab><err><cd>cc</err>dd</cd></xml>")
                println(e.getMessage)
            }
        }

    }

    def main(args: Array[String]) {
        test1()
        test2()
        test3alt()
        test4()
        test5alt()
        test5b()
        test6()
        test7()
        failedSpans()
    }
}

SpanToXmlAlt.main(Array())

Answer 3

我的解决方案是递归的。我根据需要对输入Seq进行排序，并将其转换为List。之后根据规格需要进行基本模式匹配。我的解决方案的最大缺点是，虽然.toString在测试方法==中生成相同的字符串，但不会产生真实。

import scala.xml.{NodeSeq, Elem, Text}

object SpansToXml {
  type NodeSpan = (Int, Int, Elem)

  def adjustIndices(offset: Int, spans: List[NodeSpan]) = spans.map {
    case (spanStart, spanEnd, spanNode) => (spanStart - offset, spanEnd - offset, spanNode)
  }

  def sortedSpansToXml(text: String, spans: List[NodeSpan]): NodeSeq = {
    spans match {
      // current span starts and ends at index 0, thus no inner text exists
      case (0, 0, node) :: rest => node +: sortedSpansToXml(text, rest)

      // current span starts at index 0 and ends somewhere greater than 0
      case (0, end, node) :: rest =>
        // partition the text and the remaining spans in inner and outer and process both independently
        val (innerSpans, outerSpans) = rest.partition {
          case (spanStart, spanEnd, spanNode) => spanStart <= end && spanEnd <= end
        }
        val (innerText, outerText) = text.splitAt(end)

        // prepend the generated node to the outer xml
        node.copy(child = node.child ++ sortedSpansToXml(innerText, innerSpans)) +: sortedSpansToXml(outerText, adjustIndices(end, outerSpans))

      // current span has starts at an index larger than 0, convert text prefix to text node
      case (start, end, node) :: rest =>
        val (pre, spanned) = text.splitAt(start)
        Text(pre) +: sortedSpansToXml(spanned, adjustIndices(start, spans))

      // all spans consumed: we can just return the text as node
      case Nil =>
        Text(text)
    }
  }

  def spansToXml(xmlText: String, nodeSpans: Seq[NodeSpan]) = {
    val sortedSpans = nodeSpans.toList.sortBy {
      case (start, end, _) => (start, -end)
    }
    sortedSpansToXml(xmlText, sortedSpans)
  }

  // test code stolen from Blaisorblade and david.rosell

  def test(text: String, spans: Seq[(Int, Int, Elem)], expected: NodeSeq) {
    val res = spansToXml(text, spans)
    print("Text: \"%s\", expected:\n%s\nResult:\n%s\n\n" format (text, expected, res))
    // Had to resort on to string here.
    assert(expected.toString == res.toString)
  }

  def test1() =
        test(
            text = "The dog chased the cat.",
            spans = Seq((0, 23, <xml/>),(4, 22, <phrase/>),(4, 7, <token/>)),
            expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
        )

  def test2() =
        test(
            text = "aabbccdd",
            spans = Seq(
                (0, 8, <xml x="1"/>),
                (0, 4, <ab y="foo"/>),
                (4, 8, <cd z="42>3"/>)),
            expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
        )

  def test3() =
        test(
            text = " ",
            spans = Seq(
                (0, 1, <a/>),
                (0, 0, <b/>),
                (0, 0, <c/>),
                (1, 1, <d/>),
                (1, 1, <e/>)),
            expected = <a><b/><c/> <d/><e/></a>
        )

  def test4() =
      test(
          text = "aabbccdd",
          spans = Seq((0, 8, <xml/>),
                      (0, 4, <ab/>),
                      (4, 8, <cd/>),
                      (4, 6, <ok/>)),
          expected = <xml><ab>aabb</ab><cd><ok>cc</ok>dd</cd></xml>
      )

  def test5() =
      test(
          text = "aabbccdd",
          spans = Seq((0, 8, <xml/>),
                      (0, 4, <ab/>),
                      (2, 4, <b/>),
                      (4, 4, <empty/>),
                      (4, 8, <cd/>),
                      (4, 6, <ok/>)),
          expected = <xml><ab>aa<b>bb<empty/></b></ab><cd><ok>cc</ok>dd</cd></xml>
      )

  def test6() =
      test(
          text = "aabbccdd",
          spans = Seq((0, 8, <xml/>),
                      (0, 4, <ab/>),
                      (2, 4, <b/>),
                      (2, 4, <c/>),
                      (3, 4, <d/>),
                      (4, 8, <cd/>),
                      (4, 6, <ok/>)),
          expected = <xml><ab>aa<b><c>b<d>b</d></c></b></ab><cd><ok>cc</ok>dd</cd></xml>
      )

  def test7() =
      test(
          text = "aabbccdd",
          spans = Seq((0, 8, <xml/>),
                      (0, 4, <ab a="a" b="b"/>),
                      (4, 8, <cd c="c" d="d"/>)),
          expected = <xml><ab a="a" b="b">aabb</ab><cd c="c" d="d">ccdd</cd></xml>
      )

}

Answer 4

您可以轻松地动态创建XML节点：

scala> import scala.xml._
import scala.xml._

scala> Elem(null, "AAA",xml.Null,xml.TopScope, Array[Node]():_*)
res2: scala.xml.Elem = <AAA></AAA>

以下是Elem.apply签名def apply (prefix: String, label: String, attributes: MetaData, scope: NamespaceBinding, child: Node*) : Elem

我用这种方法看到的唯一问题是你需要首先构建内部节点。

让事情变得更轻松的事情：

scala> def elem(name:String, children:Node*) = Elem(null, name ,xml.Null,xml.TopScope, children:_*); def elem(name:String):Elem=elem(name, Array[Node]():_*);

scala> elem("A",elem("B"))
res11: scala.xml.Elem = <A><B></B></A>

Answer 5

这是一个使用字符串连接和XML.loadString接近正确的解决方案：

def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]): Node = {
  // arrange items so that at each offset:
  //   closing tags sort before opening tags
  //   with two opening tags, the one with the later closing tag sorts first
  //   with two closing tags, the one with the later opening tag sorts first
  val items = Buffer[(Int, Int, Int, String)]()
  for ((begin, end, elem) <- spans) {
    val elemStr = elem.toString
    val splitIndex = elemStr.indexOf('>') + 1
    val beginTag = elemStr.substring(0, splitIndex)
    val endTag = elemStr.substring(splitIndex)
    items += ((begin, +1, -end, beginTag))
    items += ((end, -1, -begin, endTag))
  }
  // group tags to be inserted by index
  val inserts = Map[Int, Buffer[String]]()
  for ((index, _, _, tag) <- items.sorted) {
    inserts.getOrElseUpdate(index, Buffer[String]()) += tag
  }
  // put tags and characters into a buffer
  val result = Buffer[String]()
  for (i <- 0 until text.size + 1) {
    for (tags <- inserts.get(i); tag <- tags) {
      result += tag
    }
    result += text.slice(i, i + 1)
  }
  // create XML from the string buffer
  XML.loadString(result.mkString)
}

这会传递前两个测试用例，但在第三个测试用例上失败。

从scala中的文本和元素偏移创建XML

5 个答案: