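How to iterate over the text and attributes of HTML in the correct order using JSoup

Time: 2017-05-03 08:14:42

Tags: java jsoup

How do I iterate over the text and attributes of an HTML fragment in document order using JSoup?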

<a href="link1"> text child 1</a>
own text 1
<b> text child 2</b>
own text 2

I want to do some processing for each attribute/text. For example, the final output could look like this:

1) text child 1 (is a link)
2) own text 1 
3) text child 2 (is bold)
4) own text 2

Currently, I can iterate over the child elements:

Elements elements = element.children(); // gives me child 1 and 2
for (Element e : elements) {
    // ... do processing and extract the child's text ...
}

or get the element's own text, but I don't know how to do both at the same time:

String text = element.ownText(); // gives me own text 1 and 2;

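Also, I don't want to use the following, because the per-line information is lost:

String text = element.text();

How can I iterate over the element so that I get

child 1 -> text 1 -> child 2 -> text 2 (where text 1 and 2 are separated)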
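To see why none of these calls alone preserves the order, here is a small sketch that runs each of them on the sample above. The class `CompareApis` and the helper `childNodeNames` are hypothetical names for illustration. `children()` returns only elements, `ownText()` and `text()` merge text into a single string, while `childNodes()` keeps elements and text nodes interleaved in document order.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

import java.util.ArrayList;
import java.util.List;

public class CompareApis {
    // Names of the direct child nodes of <body>, in document order.
    // Text nodes report their nodeName() as "#text".
    static List<String> childNodeNames(String html) {
        List<String> names = new ArrayList<>();
        for (Node node : Jsoup.parse(html).body().childNodes()) {
            names.add(node.nodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<a href=\"link1\"> text child 1</a>\n"
                + "own text 1\n<b> text child 2</b>\nown text 2";
        Element body = Jsoup.parse(html).body();

        // children(): element nodes only; the loose text is skipped.
        System.out.println(body.children().size());   // 2
        // ownText(): the direct text nodes, merged into one string.
        System.out.println(body.ownText());
        // text(): all text, merged and whitespace-normalized.
        System.out.println(body.text());
        // childNodes(): elements AND text nodes, interleaved in document order.
        System.out.println(childNodeNames(html));     // [a, #text, b, #text]
    }
}
```

So the ordering the question asks for is available from `childNodes()`; the answer below builds on it.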

1 Answer:

Answer 0 (score: 1)

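A recursive walk over the node tree visits every text node in document order, including text nested inside child elements: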

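import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class PrintTextNodes {
    public static void main(String[] args) {
        Document document = Jsoup
                .parse("<a href=\"link1\"> text child 1</a>\r\n" + "own text 1\r\n" + "<b> text child 2</b>\r\n" + "own text 2");

        handleElement(document.body());
    }

    // Depth-first (pre-order) walk: text nodes are printed in document order,
    // including text nested inside child elements.
    public static void handleElement(Node parent) {
        if (parent instanceof TextNode) {
            TextNode textNode = (TextNode) parent;
            if (!textNode.isBlank()) { // skip whitespace-only nodes
                System.out.println(textNode.text().trim());
            }
        }
        for (Node node : parent.childNodes()) {
            handleElement(node);
        }
    }
}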
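The same recursion can also be delegated to jsoup's own depth-first traversal. The sketch below is an alternative, not the answerer's code: it assumes a jsoup version where `Node.traverse(NodeVisitor)` and `TextNode.isBlank()` are available, and the class `LabeledWalk` and method `label` are hypothetical names. Each non-blank text node is labeled by the tag of its nearest enclosing element, so nested markup is still handled in order.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

import java.util.ArrayList;
import java.util.List;

public class LabeledWalk {
    // Visits every node depth-first in document order and records each
    // non-blank text node, labeled by the tag of its nearest enclosing element.
    static List<String> label(String html) {
        List<String> out = new ArrayList<>();
        Jsoup.parse(html).body().traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (!(node instanceof TextNode) || ((TextNode) node).isBlank()) {
                    return;
                }
                String text = ((TextNode) node).text().trim();
                Node parent = node.parent();
                String tag = parent instanceof Element ? ((Element) parent).tagName() : "";
                String suffix = "";
                if ("a".equals(tag)) {
                    suffix = " (is a link)";
                } else if ("b".equals(tag)) {
                    suffix = " (is bold)";
                }
                out.add((out.size() + 1) + ") " + text + suffix);
            }

            @Override
            public void tail(Node node, int depth) {
                // nothing to do after a node's children have been visited
            }
        });
        return out;
    }

    public static void main(String[] args) {
        String html = "<a href=\"link1\"> text child 1</a>\n"
                + "own text 1\n<b> text child 2</b>\nown text 2";
        label(html).forEach(System.out::println);
    }
}
```

Only the nearest parent decides the label, so text inside `<a><b>...</b></a>` is reported as bold; walking further up with `parent()` would be needed to collect every enclosing tag.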

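If the HTML is flat (text and inline elements sitting directly under <body>), a plain loop over the body's child nodes is enough and lets you label each element by its tag: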

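// Inside main(), after parsing `document` (also needs: import org.jsoup.nodes.Element;)
int counter = 1;
for (Node node : document.body().childNodes()) {
    if (node instanceof TextNode) {
        TextNode textNode = (TextNode) node;
        if (textNode.isBlank()) {
            continue; // skip whitespace-only text between elements
        }
        System.out.println(counter++ + ") " + textNode.text().trim());
    } else if (node instanceof Element) {
        Element element = (Element) node;
        // Label the element by its tag name
        String suffix = "";
        if ("a".equals(element.tagName())) {
            suffix = " (is a link)";
        } else if ("b".equals(element.tagName())) {
            suffix = " (is bold)";
        }
        System.out.println(counter++ + ") " + element.ownText() + suffix);
    }
}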

This code prints exactly what you described:


1) text child 1 (is a link)
2) own text 1
3) text child 2 (is bold)
4) own text 2
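Since the question also mentions attributes, here is a hedged extension of the flat loop that reports each element's tag and all of its attributes next to its text. It is a sketch, not part of the original answer: `WithAttributes` and `describe` are hypothetical names, and it uses `element.text()` (which includes nested text) instead of `ownText()`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class WithAttributes {
    // Walks the direct children of <body> in document order. Text nodes are
    // emitted as trimmed text; elements as "text (tag key=value ...)".
    static List<String> describe(String html) {
        List<String> out = new ArrayList<>();
        for (Node node : Jsoup.parse(html).body().childNodes()) {
            if (node instanceof TextNode) {
                TextNode textNode = (TextNode) node;
                if (!textNode.isBlank()) { // skip whitespace-only nodes
                    out.add(textNode.text().trim());
                }
            } else if (node instanceof Element) {
                Element element = (Element) node;
                StringBuilder sb = new StringBuilder(element.text())
                        .append(" (").append(element.tagName());
                // Attributes is Iterable<Attribute>, so every key/value pair can be read.
                for (Attribute attr : element.attributes()) {
                    sb.append(' ').append(attr.getKey()).append('=').append(attr.getValue());
                }
                out.add(sb.append(')').toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<a href=\"link1\"> text child 1</a>\n"
                + "own text 1\n<b> text child 2</b>\nown text 2";
        describe(html).forEach(System.out::println);
    }
}
```

For the sample this yields "text child 1 (a href=link1)", "own text 1", "text child 2 (b)" and "own text 2"; a single attribute can also be read directly with `element.attr("href")`.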