Parse indented text tree in Java

Time: 2014-02-12 17:41:04

Tags: java algorithm

I have an indented file that I need to parse using Java. I need some way to load it into a Section class, as shown below.

    root
     root1
       text1
         text1.1
         text1.2
       text2
         text2.1
         text2.2

     root2
       text1
         text1.1
         text1.2
       text2
         text2.1
         text2.2.2

I have this class to hold the indented content:

import java.util.ArrayList;
import java.util.List;

public class Section
{
    private List<Section> children;
    private String text;
    private int depth;

    public Section(String t)
    {
        text = t;
    }

    // Creates the child list on first access so callers never see null.
    public List<Section> getChildren()
    {
        if (children == null)
        {
            children = new ArrayList<Section>();
        }
        return children;
    }

    public void setChildren(List<Section> newChildren)
    {
        if (newChildren == null) {
            children = newChildren;
        } else {
            if (children == null) {
                children = new ArrayList<Section>();
            }
            for (Section child : newChildren) {
                this.addChild(child);
            }
        }
    }

    // Null children are silently ignored.
    public void addChild(Section child)
    {
        if (children == null) {
            children = new ArrayList<Section>();
        }
        if (child != null) {
            children.add(child);
        }
    }

    public String getText()
    {
        return text;
    }

    public void setText(String newText)
    {
        text = newText;
    }

    public int getDepth()
    {
        return depth;
    }

    public void setDepth(int newDepth)
    {
        depth = newDepth;
    }
}

I need some way to parse the file and produce the expected result, which is a Section object that looks like the following:

Section =

Text = "Root"
Children
    Child1: Text = "root1"
        Child1: "text1"
            Child1 = "Text 1.1"
            Child2 = "Text 1.2"
        Child2: "text2"
            Child1 = "Text 2.1"
            Child2 = "Text 2.2"
    Child2: Text = "root2"
        Child1: "text1"
            Child1 = "Text 1.1"
            Child2 = "Text 1.2"
        Child2: "text2"
            Child1 = "Text 2.1"
            Child2 = "Text 2.2"


Here is some code that I have started:

int indentCount = 0;
String text;
while ((text = reader.readLine()) != null)
{
    indentCount = countLeadingSpaces(text);
    // TODO create the section here
}


public static int countLeadingSpaces(String word)
{
    int length = word.length();
    int count = 0;

    for (int i = 0; i < length; i++)
    {
        char first = word.charAt(i);
        if (Character.isWhitespace(first))
        {
            count++;
        }
        else
        {
            return count;
        }
    }

    return count;
}

4 Answers:

Answer 0 (score: 4)

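I added a parent pointer as well. The text could perhaps be parsed without it, but a parent pointer makes it easier. This assumes Section gains a parent field (for example `private Section parent;`). First, you need a couple more constructors: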

static final int root_depth = 4; // assuming 4 whitespaces precede the tree root

public Section(String text, int depth) {
    this.text     = text;
    this.depth    = depth;
    this.children = new ArrayList<Section>();
    this.parent   = null;
}

public Section(String text, int depth, Section parent) {
    this.text     = text;
    this.depth    = depth;
    this.children = new ArrayList<Section>();
    this.parent   = parent;
}

Then, when you start parsing the file, read it line by line:

Section root = null;
Section prev = null;
for (String line; (line = bufferedReader.readLine()) != null; ) {
    if (prev == null && line begins with root_depth whitespaces) {
        root = new Section(text_of_line, root_depth);
        prev = root;
    }
    else {
        int t_depth = no. of whitespaces at the beginning of this line;
        if (t_depth > prev.getDepth()) {
            // deeper than the previous line: child of prev
            // assuming that empty sections are not allowed
            Section t_section = new Section(text_of_line, t_depth, prev);
            prev.addChild(t_section);
            prev = t_section;
        }
        else if (t_depth == prev.getDepth()) {
            // same depth: sibling of prev
            Section t_section = new Section(text_of_line, t_depth, prev.getParent());
            prev.getParent().addChild(t_section);
            prev = t_section;
        }
        else {
            // shallower: climb until we reach a node at the same depth
            while (t_depth < prev.getDepth()) {
                prev = prev.getParent();
            }
            // at this point, (t_depth == prev.getDepth()) = true
            Section t_section = new Section(text_of_line, t_depth, prev.getParent());
            prev.getParent().addChild(t_section);
            prev = t_section;
        }
    }
}

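I have glossed over some details in this pseudocode, but I think it gives you the overall idea of how to do this kind of parsing. Remember to implement the methods addChild(), getDepth(), getParent(), and so on.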
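For reference, the parent-pointer idea above can be sketched as runnable Java. This is only a sketch under some assumptions: the names `ParentPointerParser`, `Node` and `indentOf` are made up for illustration, blank lines are skipped, the first non-blank line is treated as the single root, and the three depth cases are folded into one "climb until a shallower node is found" loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class and method names are illustrative only.
class ParentPointerParser {

    // Mirrors the question's Section, plus a parent pointer.
    static class Node {
        final String text;
        final int depth;    // number of leading whitespace characters
        final Node parent;  // null for the root
        final List<Node> children = new ArrayList<>();

        Node(String text, int depth, Node parent) {
            this.text = text;
            this.depth = depth;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    // Counts leading whitespace characters, like countLeadingSpaces in the question.
    static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // Returns the root (first non-blank line), or null if there is none.
    static Node parse(List<String> lines) {
        Node root = null;
        Node prev = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines carry no structure
            }
            int depth = indentOf(line);
            String text = line.trim();
            if (prev == null) {
                root = new Node(text, depth, null);
                prev = root;
                continue;
            }
            // Climb from the previous node until we reach one that is
            // strictly shallower than the new line; that node is the parent.
            Node parent = prev;
            while (parent != null && parent.depth >= depth) {
                parent = parent.parent;
            }
            if (parent == null) {
                throw new IllegalArgumentException("More than one root: " + text);
            }
            prev = new Node(text, depth, parent);
        }
        return root;
    }
}
```

Calling `ParentPointerParser.parse(lines)` with the lines of the file returns the root node. Unlike the pseudocode, this version does not require a dedented line to land exactly on an earlier depth; it simply attaches the line to the nearest shallower ancestor.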

Answer 1 (score: 3)

A surprisingly complex problem... but here is some pseudocode:

initialize a stack
push first line to stack
while (there are more lines to read) {
 S1 = top of stack // do not pop off yet
 S2 = read a line
 if depth of S1 < depth of S2 {
  add S2 as child of S1
  push S2 into stack
 }
 else {
  while (depth of S1 >= depth of S2 AND there are at least 2 elements in stack) {
   pop stack
   S1 = top of stack // do not pop
  }
  add S2 as child of S1
  push S2 into stack
 }
}
return bottom element of stack

where depth is the number of leading whitespace characters. You may have to modify or wrap the Section class to store the depth of each line.
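The stack pseudocode above could look roughly like this in Java. It is a sketch, not a definitive implementation: `StackTreeParser`, `Node` and `indentOf` are illustrative names, and skipping blank lines is an added assumption that the pseudocode does not mention.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; class and method names are illustrative only.
class StackTreeParser {

    // A wrapper that stores the depth alongside the text.
    static class Node {
        final String text;
        final int depth; // number of leading whitespace characters
        final List<Node> children = new ArrayList<>();

        Node(String text, int depth) {
            this.text = text;
            this.depth = depth;
        }
    }

    static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // Returns the bottom element of the stack, i.e. the first non-blank line.
    static Node parse(List<String> lines) {
        Deque<Node> stack = new ArrayDeque<>();
        Node root = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            Node node = new Node(line.trim(), indentOf(line));
            if (root == null) {
                root = node;
                stack.push(node);
                continue;
            }
            // Pop until the top of the stack is shallower than the new node,
            // but always keep at least the root on the stack.
            while (stack.size() > 1 && stack.peek().depth >= node.depth) {
                stack.pop();
            }
            stack.peek().children.add(node);
            stack.push(node);
        }
        return root;
    }
}
```

Because the root is never popped, a later line that is indented no deeper than the root still ends up as a child of the root, which matches the "at least 2 elements in stack" condition in the pseudocode.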

Answer 2 (score: 3)

Implementation in C#

Based on this answer, I created a C# solution.

It allows multiple roots and assumes an input structure like:

Test
    A
    B
    C
        C1
        C2
    D
Something
    One
    Two
    Three

An example usage of the code:

var lines = new[]
{
    "Test",
    "\tA",
    "\tB",
    "\tC",
    "\t\tC1",
    "\t\tC2",
    "\tD",
    "Something",
    "\tOne",
    "\tTwo",
    "\tThree"
};

var roots = IndentedTextToTreeParser.Parse(lines, 0, '\t');

var dump = IndentedTextToTreeParser.Dump(roots);
Console.WriteLine(dump);

You can specify the root indentation (zero by default) as well as the indent character (a tab, \t, by default).

Full code:

namespace MyNamespace
{
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Text;

    public static class IndentedTextToTreeParser
    {
        // https://stackoverflow.com/questions/21735468/parse-indented-text-tree-in-java

        public static List<IndentTreeNode> Parse(IEnumerable<string> lines, int rootDepth = 0, char indentChar = '\t')
        {
            var roots = new List<IndentTreeNode>();

            // --

            IndentTreeNode prev = null;

            foreach (var line in lines)
            {
                if (string.IsNullOrEmpty(line.Trim(indentChar)))
                    throw new Exception(@"Empty lines are not allowed.");

                var currentDepth = countWhiteSpacesAtBeginningOfLine(line, indentChar);

                if (currentDepth == rootDepth)
                {
                    var root = new IndentTreeNode(line, rootDepth);
                    prev = root;

                    roots.Add(root);
                }
                else
                {
                    if (prev == null)
                        throw new Exception(@"Unexpected indention.");
                    if (currentDepth > prev.Depth + 1)
                        throw new Exception(@"Unexpected indention (children were skipped).");

                    if (currentDepth > prev.Depth)
                    {
                        var node = new IndentTreeNode(line.Trim(), currentDepth, prev);
                        prev.AddChild(node);

                        prev = node;
                    }
                    else if (currentDepth == prev.Depth)
                    {
                        var node = new IndentTreeNode(line.Trim(), currentDepth, prev.Parent);
                        prev.Parent.AddChild(node);

                        prev = node;
                    }
                    else
                    {
                        while (currentDepth < prev.Depth) prev = prev.Parent;

                        // at this point, (currentDepth == prev.Depth) = true
                        var node = new IndentTreeNode(line.Trim(indentChar), currentDepth, prev.Parent);
                        prev.Parent.AddChild(node);

                        // track the new node so that following lines attach to it
                        prev = node;
                    }
                }
            }

            // --

            return roots;
        }

        public static string Dump(IEnumerable<IndentTreeNode> roots)
        {
            var sb = new StringBuilder();

            foreach (var root in roots)
            {
                doDump(root, sb, @"");
            }

            return sb.ToString();
        }

        private static int countWhiteSpacesAtBeginningOfLine(string line, char indentChar)
        {
            var lengthBefore = line.Length;
            var lengthAfter = line.TrimStart(indentChar).Length;
            return lengthBefore - lengthAfter;
        }

        private static void doDump(IndentTreeNode treeNode, StringBuilder sb, string indent)
        {
            sb.AppendLine(indent + treeNode.Text);
            foreach (var child in treeNode.Children)
            {
                doDump(child, sb, indent + @"    ");
            }
        }
    }

    [DebuggerDisplay(@"{Depth}: {Text} ({Children.Count} children)")]
    public class IndentTreeNode
    {
        public IndentTreeNode(string text, int depth = 0, IndentTreeNode parent = null)
        {
            Text = text;
            Depth = depth;
            Parent = parent;
        }

        public string Text { get; }
        public int Depth { get; }
        public IndentTreeNode Parent { get; }
        public List<IndentTreeNode> Children { get; } = new List<IndentTreeNode>();

        public void AddChild(IndentTreeNode child)
        {
            if (child != null) Children.Add(child);
        }
    }
}

I also added a Dump() method that converts the tree back into a string, which makes it easier to debug the algorithm itself.
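Since the question is about Java, here is a rough Java counterpart of the Dump() idea. It is a sketch with assumed names (`TreeDumper`, `Node`, `add`), it uses four spaces per level as the C# version does, and it ends each line with "\n" rather than a platform-specific line separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class and method names are illustrative only.
class TreeDumper {

    static class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text) {
            this.text = text;
        }

        // Convenience for building small trees by hand; returns this node.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Renders every root and its descendants, one node per line.
    static String dump(List<Node> roots) {
        StringBuilder sb = new StringBuilder();
        for (Node root : roots) {
            dump(root, sb, "");
        }
        return sb.toString();
    }

    // Depth-first walk; each level adds four spaces of indentation.
    private static void dump(Node node, StringBuilder sb, String indent) {
        sb.append(indent).append(node.text).append('\n');
        for (Node child : node.children) {
            dump(child, sb, indent + "    ");
        }
    }
}
```

Comparing the dumped string with the original input is a quick way to check that a parser preserved the nesting.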

Answer 3 (score: 1)

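I implemented an alternative solution using recursive function calls. As far as I can tell, it will perform worse than Max Seo's suggestion, especially on deep hierarchies. However, it is easier to understand (in my opinion), and therefore easier to modify for your specific needs. Please take a look and let me know if you have any suggestions.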

One benefit is that it can handle trees with multiple roots as it is.

Problem description - just to be clear...

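Suppose we have a construct, Node, which can contain data and has zero or more children, which are also Nodes. Based on text input, we want to build a tree of nodes, where each node's data is the content of one line, and the node's position in the tree is given by the line's position and indentation, so that an indented line is a child of the nearest preceding line that is less indented.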

Algorithm description

Assuming we have a list of lines, define a function that does the following:

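  • If the input list has at least two lines:
    • Remove the first line from the list
    • Remove from the list all lines that satisfy both of the following:
      • They have a higher indentation than the first line
      • They occur before the next line whose indentation is less than or equal to that of the first line
    • Pass these lines recursively to the function, and set the result as the children of the first line
    • If there are remaining lines, pass them recursively to the function and combine the result with the first line
    • If there are no remaining lines, return the first line as a single-element list
  • If the input list has one line:
    • Set the children of that line to an empty list
    • Return the list
  • If the input list has no elements:
    • Return an empty list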

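Calling the function with a list of lines produces a list of trees, based on their indentation. If the tree has only one root, the resulting tree is the first element of the result list.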

Pseudocode

List<Node> LinesToTree( List<Line> lines )
{
    if(lines.count >= 2)
    {
        firstLine = lines.shift
        nextLine = lines[0]
        children = new List<Line>

        // collect the run of lines that are indented deeper than firstLine
        while(nextLine != null && firstLine.indent < nextLine.indent)
        {
            children.add(lines.shift)
            nextLine = lines.count > 0 ? lines[0] : null
        }

        firstLineNode = new Node
        firstLineNode.data = firstLine.data
        firstLineNode.children = LinesToTree(children)

        resultNodes = new List<Node>
        resultNodes.add(firstLineNode)

        if(lines.count > 0)
        {
            // the remaining lines are siblings of firstLine
            siblingNodes = LinesToTree(lines)
            resultNodes.addAll(siblingNodes)
            return resultNodes
        }
        else
        {
            return resultNodes
        }
    }
    elseif(lines.count == 1)
    {
        nodes = new List<Node>
        node = new Node
        node.data = lines[0].data
        node.children = new List<Node>
        nodes.add(node)
        return nodes
    }
    else
    {
        return new List<Node>
    }
}
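As a rough illustration, the same idea can be written in Java. This sketch uses assumed names (`RecursiveTreeParser`, `Line`, `Node`, `linesToTree`) and makes one deliberate change: children are still built recursively, but siblings are handled with a loop rather than a second recursive call, so the recursion depth follows the nesting depth and not the number of lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class and method names are illustrative only.
class RecursiveTreeParser {

    // One input line: its indentation and its content.
    static class Line {
        final int indent;
        final String data;

        Line(int indent, String data) {
            this.indent = indent;
            this.data = data;
        }
    }

    static class Node {
        final String data;
        final List<Node> children = new ArrayList<>();

        Node(String data) {
            this.data = data;
        }
    }

    // Builds a list of trees; handles zero, one, or many roots.
    static List<Node> linesToTree(List<Line> lines) {
        List<Node> result = new ArrayList<>();
        int i = 0;
        while (i < lines.size()) {
            Line first = lines.get(i++);
            // The following run of lines indented deeper than `first`
            // holds all of its descendants.
            int start = i;
            while (i < lines.size() && lines.get(i).indent > first.indent) {
                i++;
            }
            Node node = new Node(first.data);
            node.children.addAll(linesToTree(lines.subList(start, i)));
            result.add(node);
        }
        return result;
    }
}
```

An empty input yields an empty list, and a single line yields one node with no children, which covers the two smaller cases of the pseudocode without separate branches.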

PHP implementation using arrays

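The implementation can be customized by passing a delegate that retrieves the indentation, as well as the name of the children field in the output array.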

public static function IndentedLinesToTreeArray(array $lineArrays, callable $getIndent = null, $childrenFieldName = "children")
{
    //Default function to get element indentation
    if($getIndent == null){
        $getIndent = function($line){
            return $line["indent"];
        };
    }

    $lineCount = count($lineArrays);

    if($lineCount >= 2)
    {
        $firstLine = array_shift($lineArrays);
        $children = [];
        $nextLine = $lineArrays[0];

        while($getIndent($firstLine) < $getIndent($nextLine)){
            $children[] = array_shift($lineArrays);
            if(!isset($lineArrays[0])){
                break;
            }
            $nextLine = $lineArrays[0];
        }

        $firstLine[$childrenFieldName] = self::IndentedLinesToTreeArray($children, $getIndent, $childrenFieldName);

        if(count($lineArrays)){
            return array_merge([$firstLine],self::IndentedLinesToTreeArray($lineArrays, $getIndent, $childrenFieldName));
        }else{
            return [$firstLine];
        }
    }
    elseif($lineCount == 1)
    {
        $lineArrays[0][$childrenFieldName] = [];
        return $lineArrays;
    }
    else
    {
        return [];
    }
}