Question

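So I'm trying to write a program that takes a file containing simple HTML syntax and puts it into a tree that shows the hierarchy of the tags. Eventually, each leaf will contain a tag (e.g. p, h, ul, etc.) and its text. That part is simple enough, and I plan to use a JTree to display the final output. What I'm having trouble with is going through the syntax and building the initial tree from the tags without losing the relationships between them. My idea is that the whole file would be one long string. The program would find a '<' whose second character is not '/' and treat it as a new tag/leaf. The code would then continue and check the following characters to see whether there is another '<', which would indicate a child tag. If a '/' is found as the second character after a '<', the code would move on to the next tag at the same level.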
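A minimal sketch of the scanning approach described above, assuming well-formed markup with no attributes, comments, or void elements such as `<br>`. The missing piece is usually a stack of "currently open" tags: an opening tag is added as a child of the tag on top of the stack and then pushed, and a closing tag pops back up one level. `TagTreeSketch` and `buildTagTree` are hypothetical names, and text between tags is ignored here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.swing.tree.DefaultMutableTreeNode;

class TagTreeSketch {

    // Builds a tag hierarchy by scanning for '<'. Assumes well-formed markup:
    // every tag is closed, and there are no attributes, comments or void tags.
    static DefaultMutableTreeNode buildTagTree(String html) {
        DefaultMutableTreeNode root = new DefaultMutableTreeNode("root");
        Deque<DefaultMutableTreeNode> open = new ArrayDeque<>();
        open.push(root); // top of the stack = node that new tags are attached to
        int i = 0;
        while ((i = html.indexOf('<', i)) >= 0) {
            int end = html.indexOf('>', i);
            if (end < 0) {
                break; // unterminated tag: stop scanning
            }
            if (html.charAt(i + 1) == '/') {
                // closing tag: go back up one level (never pop the root)
                if (open.size() > 1) {
                    open.pop();
                }
            } else {
                // opening tag: attach as a child of the current tag, then descend
                DefaultMutableTreeNode node =
                        new DefaultMutableTreeNode(html.substring(i + 1, end));
                open.peek().add(node);
                open.push(node);
            }
            i = end + 1;
        }
        return root;
    }

    public static void main(String[] args) {
        DefaultMutableTreeNode root = buildTagTree(
                "<html><head><title></title><a></a></head><body><p></p></body></html>");
        DefaultMutableTreeNode html = (DefaultMutableTreeNode) root.getChildAt(0);
        System.out.println(html.getChildCount()); // prints 2 (head, body)
    }
}
```

The returned root can be handed straight to `new JTree(root)`. Real-world HTML (unclosed `<p>`, `<br>`, attributes containing `>`) breaks this kind of scanner quickly, which is why a proper parser is usually the better choice.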

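Hopefully you get what I'm trying to do. Unfortunately, my attempt hasn't been very successful, because it only displays the children of the root tag. For now I'm just trying to get the tags working in the tree; I'll figure out the text later. To test the code I used a string "test" containing some basic sample HTML. Every node shows up under the root when the JTree is created, but the children of node2 never appear. I'm confused and can't wrap my head around it. Also, is there a simpler/more efficient way to do this?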

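Edit: So I modified the code to use JSoup, and I managed to get it working. However, I ran into a problem: for some reason, every child of the head tag except the first one gets moved into body. So now body has 3 children instead of 1, and head has only 1 instead of 3. Also, how can I modify the recursive getChildren() function so that it works for every layer of children below the previous child? For example, to get the h3 tag inside the title tag?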

package weboqltree_converter;

import javax.swing.JFrame;
import javax.swing.JTree;
import javax.swing.SwingUtilities;
import javax.swing.tree.DefaultMutableTreeNode;
import java.util.ArrayList;
import java.awt.Dimension;
import java.util.List;
import javax.swing.tree.TreeNode;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

public class GUI extends JFrame
{
    private JTree tree;
    private String test = "<html>"
            +   "<head>"
            +       "<title><h3>First parse<h3></title>"
            +       "<a></a>"
            +       "<h3></h3>"
            +   "</head>"
            +   "<body>"
            +       "<p>Parsed HTML into a doc.</p>"
            +   "</body>"
            + "</html>";

    private int parentNode;

    public static void main(String[] args)
    {
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                new GUI();
            }
        });
    }

    public GUI()
    {
        DefaultMutableTreeNode html = new DefaultMutableTreeNode("html");
        Document doc = Jsoup.parse(test);
        int children = doc.childNodes().get(0).childNodes().size();
        for(int i=0; i < children; i++){
            String tag = doc.childNodes().get(0).childNodes().get(i).nodeName();
            String text = "N/A"; //doc.childNodes().get(0).childNodes().get(i).toString();

            html.add(new DefaultMutableTreeNode("Tag: " + tag+ ", Text: " + text));

            System.out.println(tag+" : "+doc.childNodes().get(0).childNodes().get(i).childNodeSize());

            if(doc.childNodes().get(0).childNodes().get(i).childNodeSize() > 0){
                getChildren(html.getLastLeaf(), doc.childNodes().get(0).childNodes().get(i),0, doc.childNodes().get(0).childNodes().get(i).childNodeSize());
            }
        }
        System.out.println("tag: " + children);           


        //System.out.println(Tree.get(2) +" "+Tree.get(2).getChildCount());
        tree = new JTree(html);
        add(tree);

        this.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        this.setTitle("JTree Example"); 
        this.setMinimumSize(new Dimension(300, 400));
        this.setExtendedState(3);
        this.pack();
        this.setVisible(true);
    }

    public void getChildren(DefaultMutableTreeNode tree, Node doc, int start, int size){

        tree.add(new DefaultMutableTreeNode("Tag: " + doc.childNodes().get(start).nodeName()));
        start++;

        if(start < size){
            getChildren(tree, doc, start, size);
        }

    }
}

Answer 1

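You can use JSoup to do this. It reads a String, a file, or a URL and parses it into a Document object (and it's very fast). After that, you can navigate that object and build a JTree from it.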

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<html><head><title>First parse</title></head><body><p>Parsed HTML into a doc.</p></body></html>";
Document document = Jsoup.parse(html);
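If adding jsoup is not an option and the input is well-formed (XHTML-style) markup, the JDK's built-in DOM parser (`javax.xml.parsers`) can stand in for it. This is a swapped-in technique, not jsoup's API, and `DomTreeSketch`/`toTree` are hypothetical names. The point it illustrates is that the recursion goes one level deeper per call, whereas the question's `getChildren()` recurses over sibling indices and therefore never leaves the first level. Two caveats, stated as likely explanations rather than certainties: jsoup's default HTML parser applies the HTML5 parsing rules, under which `<a>` and `<h3>` are not allowed inside `<head>` and end up in `<body>` (matching the behaviour described in the edit), and the content of `<title>` is treated as plain text, so no `h3` element exists there to find. An XML parser keeps the structure as written, but it rejects the question's `<h3>First parse<h3>`, so the second tag must be `</h3>`. jsoup also offers an XML mode via `Parser.xmlParser()`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTreeSketch {

    // Parses well-formed markup with the JDK's XML parser (not jsoup) and
    // mirrors its element hierarchy as Swing tree nodes under a default root.
    static DefaultMutableTreeNode toTree(String markup) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("markup is not well-formed XML", e);
        }
        DefaultMutableTreeNode root = new DefaultMutableTreeNode("Html Document");
        addChildren(doc, root);
        return root;
    }

    // Depth-first: each element gets a tree node, then the same method is
    // called on that element so its own children land one level deeper.
    // Text nodes are skipped, so only tags appear in the tree.
    private static void addChildren(Node domNode, DefaultMutableTreeNode treeNode) {
        NodeList children = domNode.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                DefaultMutableTreeNode childTreeNode =
                        new DefaultMutableTreeNode(child.getNodeName());
                treeNode.add(childTreeNode);
                addChildren(child, childTreeNode);
            }
        }
    }

    public static void main(String[] args) {
        DefaultMutableTreeNode root = toTree(
                "<html><head><title><h3>First parse</h3></title></head>"
                + "<body><p>Parsed HTML into a doc.</p></body></html>");
        DefaultMutableTreeNode title = (DefaultMutableTreeNode) root
                .getChildAt(0)  // html
                .getChildAt(0)  // head
                .getChildAt(0); // title
        System.out.println(title.getChildAt(0)); // prints h3
    }
}
```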

Update

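I've changed your code to use a recursive method. Because a document can have more than one root node (usually the "document" tag and the "html" tag), it's better to add a default root node. Take a look: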

// Imports needed in addition to the ones in the question:
// import javax.swing.JScrollPane;
// import javax.swing.tree.DefaultTreeModel;

public GUI() {
    // create window
    this.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    this.setTitle("JTree Example");
    this.setMinimumSize(new Dimension(300, 400));
    // MAXIMIZED_BOTH (6) maximizes the window; the literal 3 means ICONIFIED | MAXIMIZED_HORIZ
    this.setExtendedState(JFrame.MAXIMIZED_BOTH);

    // create tree and root node
    this.tree = new JTree();
    final DefaultMutableTreeNode root = new DefaultMutableTreeNode("Html Document");

    // create model
    DefaultTreeModel treeModel = new DefaultTreeModel(root);
    tree.setModel(treeModel);

    // add scrolling tree to window
    this.add(new JScrollPane(tree));

    // parse document (can be cleaned too)
    Document doc = Jsoup.parse(test);
    // Cleaner cleaner = new Cleaner(Whitelist.simpleText());
    // doc = cleaner.clean(doc);

    // walk the document tree recursively, starting at the Document node itself
    traverseRecursively(doc, root);

    this.expandAllNodes(tree);
    this.pack();
    this.setLocationRelativeTo(null);
    this.setVisible(true);
}

private void traverseRecursively(Node docNode, DefaultMutableTreeNode treeNode) {
    // iterate child nodes:
    for (Node nextChildDocNode : docNode.childNodes()) {
        // create a tree node for this child (text nodes appear as "#text"):
        DefaultMutableTreeNode nextChildTreeNode = new DefaultMutableTreeNode(nextChildDocNode.nodeName());
        // add child to tree:
        treeNode.add(nextChildTreeNode);
        // recurse one level deeper, into this child's own child nodes:
        traverseRecursively(nextChildDocNode, nextChildTreeNode);
    }
}

// optional (can be removed): expands every row; the row count is re-read
// after each expansion because expanding a node adds new visible rows
private void expandAllNodes(JTree tree) {
    int j = tree.getRowCount();
    int i = 0;
    while (i < j) {
        tree.expandRow(i);
        i += 1;
        j = tree.getRowCount();
    }
}

Answer 2

Sorry, but this is wrong on many levels.

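First of all, parsing HTML/XML is not easy, and your current code for doing it is far too naive. Instead of doing such things yourself, you'd be better off using an existing library that does the parsing for you. Getting that right is hard enough for you already. (The chance that you get the parsing done in a correct and robust way yourself is close to zero.)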

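Then: instead of focusing on "complicated" tasks... I would rather suggest that you first focus on some of the craftsmanship aspects of programming. For example: your code is almost untestable (because it does everything in that awful constructor). It is also much harder to read than it should be.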

My (personal) recommendations:

Learn about writing testable code (see here)
Learn how to do TDD and unit testing with JUnit
Learn about "clean code" by reading Robert Martin's book of that name

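In other words: you seem to want to spend your energy on solving complex problems. But to do that in an effective, lasting way... you lack some very basic skills. Writing code that solves some problem doesn't help much... when the quality of that code is poor! I know this doesn't sound like much "fun"; but trust me: doing TDD "the right way" is a very rewarding activity!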
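To make the testability point concrete, here is a hedged sketch of one way to keep logic out of the JFrame constructor: a pure static method that turns any `TreeNode` hierarchy into indented text, so the result of tree-building code can be compared with an expected string without opening a window. `TreeOutline` and `outline` are hypothetical names; in a JUnit test the final check would simply be an `assertEquals(expected, outline(html))`.

```java
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.TreeNode;

class TreeOutline {

    // Pure function: renders a TreeNode hierarchy as indented text
    // (two spaces per level), so it can be checked without any GUI.
    static String outline(TreeNode node) {
        StringBuilder sb = new StringBuilder();
        append(node, 0, sb);
        return sb.toString();
    }

    private static void append(TreeNode node, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node).append('\n');
        for (int i = 0; i < node.getChildCount(); i++) {
            append(node.getChildAt(i), depth + 1, sb);
        }
    }

    public static void main(String[] args) {
        DefaultMutableTreeNode html = new DefaultMutableTreeNode("html");
        DefaultMutableTreeNode head = new DefaultMutableTreeNode("head");
        head.add(new DefaultMutableTreeNode("title"));
        html.add(head);
        html.add(new DefaultMutableTreeNode("body"));

        String expected = "html\n  head\n    title\n  body\n";
        // Plain-Java stand-in for a JUnit assertEquals
        if (!expected.equals(outline(html))) {
            throw new AssertionError("unexpected outline:\n" + outline(html));
        }
        System.out.print(outline(html));
    }
}
```

The GUI constructor then only has to call the tree-building method and hand its result to `new JTree(...)`, which leaves very little untested code in the window class.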
