Question

I have a string of nodes from a binary tree, serialized in the following format:

# <- the value of a node
(a b c) <- node b has left child a and right child c.

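If a node has one child, it will always have two children. All nodes are independent, and their values are just the value of node.data, so many nodes can have the same value (but they are still distinct nodes).

For example: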

(((1 6 3) 5 (8 1 2)) 10 (1 1 1))

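means the root of the tree has value 10 and has two children with values 5 and 1. The child with value 5 has two children, 6 and 1. The 6 has children 1 and 3, the 1 has children 8 and 2, and so on.

I am trying to parse this into a tree, but I only know how to do it "inefficiently": trim the leading and trailing parentheses, then scan the string until the number of ( matches the number of ). For example: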

(((1 6 3) 5 (8 1 2)) 10 (1 1 1))

becomes

((1 6 3) 5 (8 1 2)) 10 (1 1 1)

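So I scan, scan, scan, and after I have read ((1 6 3) 5 (8 1 2)) the parenthesis count matches, which means I have the left child, which means the next value is the parent, and everything after that is the right child. Recurse, recurse, and so on. Except that this way I waste a lot of time re-scanning the left child at every step.

Is there a better way?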

Answer 1

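You didn't specify your language of choice, but this looks a lot like LISP, which has good facilities for handling lists like this. Nevertheless, I'll try to give a general answer.

First, these are the steps of the recursive approach, in some Scala-like code: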

def getLocalRoot(record : String) : Node
{
    val (leftChildrenString, rootPosition) = extractLeft(record)
    val rightChildrenString = extractRight(record, rootPosition)
    val localRootString = extractLocalRoot(record)
    val localRoot = new Node(localRootString)
    if(leftChildrenString.contains('(')) //a hack, really
       localRoot.left = getLocalRoot(leftChildrenString) //not a leaf
    else
       localRoot.left = new Node(leftChildrenString)  //it is a leaf

    if(rightChildrenString.contains('('))
       localRoot.right=getLocalRoot(rightChildrenString)
    else
       localRoot.right = new Node(rightChildrenString)
    return localRoot
}

def findTreeRoot(serializedTree : String) : Node
{
    return getLocalRoot(serializedTree)
}

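In ((1 5 6) 2 (4 3 0)), I call the (1 5 6) part the "left child" and the (4 3 0) part on the right the "right child".

Let's explain in words. First, you need to split the string into a left and a right part, hence extractLeft and extractRight. I suggest doing this by parsing the string from left to right and counting parentheses. As soon as the count returns to 1 after a closing parenthesis, the next item is the root of that subtree. Then return the left part. You can also return the position of the subtree root and pass it to the function that returns the right child, to speed things up. The method that returns the right part of the string should simply return everything to the right (minus the closing )).

Then take the current local root, store it, and call the same method on the left and right halves, but only if the half in question is not a leaf. If it is a leaf, you can use it to instantiate a new node and attach it to the parent you just found. I used a hack, simply checking whether the string contains a parenthesis; you can come up with a better solution.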
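The parenthesis-counting split described above could be sketched in Python roughly as follows. The name split_record is hypothetical (it stands in for extractLeft, extractLocalRoot and extractRight combined), and the sketch assumes well-formed input. It strips the outer parentheses first, so the count returns to 0 rather than 1 when the left subtree closes.

```python
def split_record(record):
    """Split '(left root right)' into its three parts by counting parentheses."""
    inner = record.strip()[1:-1].strip()      # drop the outermost '(' and ')'
    if inner.startswith("("):
        depth = 0
        for i, ch in enumerate(inner):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:                # the left subtree closes here
                    break
        left, rest = inner[:i + 1], inner[i + 1:]
    else:
        # The left child is a plain leaf value, so no counting is needed.
        left, rest = inner.split(None, 1)
    root, right = rest.split(None, 1)         # next item is the local root
    return left, root, right.strip()
```

Note that each recursive call still rescans its left subtree, which is the repeated work the question is trying to avoid.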

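===============Alternative approach===========

This needs only a single scan, although I had to pad the parentheses with whitespace to make them easier to parse; the key idea is the same regardless. I basically use a stack. Whenever you reach a closing parenthesis, pop the top 3 items, merge them, and push the result back.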

import scala.collection.mutable

trait Node

case class Leaf(value: String) extends Node

case class ComplexNode(left: Node, value: Leaf, right: Node) extends Node

object Main {

  def main(args: Array[String]) = {
    val stack = new mutable.Stack[Node]
    var input = "(((1 6 3) 5 (8 1 2)) 10 (1 1 1))"
    input = input.replace(")", " ) ").replace("(", " ( ") //just to ease up parsing, it's easier to extract the numbers

    input.split(" ").foreach(word =>
      word match {
        case ")" => {
          stack push collapse(stack)
        }
        case c : String =>  {
          if (c != "(" && c != "") 
             stack.push(Leaf(c))
        }
      }
    )
    println(stack.pop) //you have your structure on the top of the stack
  }

  def collapse(stack: mutable.Stack[Node]): Node = {
    val right = stack.pop
    val parent = stack.pop.asInstanceOf[Leaf]
    val left = stack.pop
    return new ComplexNode(left, parent, right)

  }
}
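
For comparison, here is a minimal Python sketch of the same stack idea. It is an illustration under assumptions, not a port of the code above: parse_with_stack is a hypothetical name, tokens come from a regular expression instead of whitespace padding, leaves are plain ints, and inner nodes are (left, value, right) tuples. It assumes well-formed input.

```python
import re

def parse_with_stack(text):
    stack = []
    for token in re.findall(r"[()]|\d+", text):
        if token == "(":
            continue                      # opening parentheses carry no data
        if token == ")":
            # The top three items are right child, node value, left child.
            right = stack.pop()
            value = stack.pop()
            left = stack.pop()
            stack.append((left, value, right))
        else:
            stack.append(int(token))      # a number: leaf or node value
    return stack.pop()                    # the finished tree is on top
```

With this representation the result has the same nesting as the input string, which makes it easy to check by eye.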

Answer 2

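Yes, you can write a simple recursive parsing function. When parsing a tree:

If you see '(', you know that next you need to read the left child (i.e. recursively parse a tree), then the node's value, then the right child (recursively parse a tree again).
If you see a number, you know it is a leaf.

This approach takes O(n) time and uses O(treeHeight) additional memory (i.e. apart from the string and the tree) for the recursion call stack.

Here is a code example in Python 3: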

import re
from collections import namedtuple

Leaf = namedtuple("Leaf", ["value"])

Node = namedtuple("Node", ["value", "left", "right"])

def parse(string):
    # iterator, which returns '(', ')' and numbers from the string
    tokens = re.finditer(r"[()]|\d+", string)

    def next_token():
        return next(tokens).group()

    def tree():
        token = next_token()
        if token == '(':
            left, value, right = tree(), element(), tree()
            next_token() # skipping closing bracket
            return Node(value, left, right)
        else:
            return Leaf(int(token))

    def element():
        return int(next_token())

    return tree()

Test:

In [2]: parse("(((1 6 3) 5 (8 1 2)) 10 (1 1 1))")
Out[2]: Node(value=10, left=Node(value=5, left=Node(value=6, left=Leaf(value=1), right=Leaf(value=3)), right=Node(value=1, left=Leaf(value=8), right=Leaf(value=2))), right=Node(value=1, left=Leaf(value=1), right=Leaf(value=1)))

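The easiest and most concise way to get the individual tokens (parentheses and numbers) from the string is to use a regular expression. Most regex libraries support iterating over non-overlapping matches without creating the whole array of matches at once.

For example, in Java the tokens variable can be declared as Matcher tokens = Pattern.compile("[()]|\\d+").matcher(string);, and next_token() then becomes a call to tokens.find() followed by tokens.group(). In C++, you can use std::regex_iterator for the same purpose.

But you can also implement the tokenization manually by maintaining the index of the current character. For example: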

def parse(source):
    index = 0

    def next_number():
        nonlocal index
        start = index
        while index < len(source) and source[index].isdigit():
            index += 1
        return source[start:index]

    def next_token():
        nonlocal index
        while index < len(source):
            current = source[index]
            if current in [')', '(']:
                index += 1
                return current
            if current.isdigit():
                return next_number() 
            index += 1

    # Functions tree() and element() stay the same.
    # ...
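
Putting the pieces together, a self-contained version of the manual tokenizer might look like the sketch below. The name parse_manual and the ValueError checks are additions for illustration and are not part of the answer above; next_number is inlined into next_token to keep the block short.

```python
from collections import namedtuple

Leaf = namedtuple("Leaf", ["value"])
Node = namedtuple("Node", ["value", "left", "right"])

def parse_manual(source):
    index = 0

    def next_token():
        nonlocal index
        while index < len(source):
            current = source[index]
            if current in "()":
                index += 1
                return current
            if current.isdigit():
                start = index
                while index < len(source) and source[index].isdigit():
                    index += 1
                return source[start:index]
            index += 1                    # skip whitespace and anything else
        raise ValueError("unexpected end of input")

    def tree():
        token = next_token()
        if token == "(":
            left, value, right = tree(), int(next_token()), tree()
            if next_token() != ")":
                raise ValueError("expected ')'")
            return Node(value, left, right)
        return Leaf(int(token))

    return tree()
```

Every character is visited once, so the O(n) bound stated earlier still holds.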
