How to iterate over HTML text and attributes in the correct order using JSoup
<a href="link1"> text child 1</a>
own text 1
<b> text child 2</b>
own text 2
I want to do some processing for each attribute/text. For example, the final output might look like this:
1) text child 1 (is a link)
2) own text 1
3) text child 2 (is bold)
4) own text 2
Currently, I can iterate over the child elements:
Elements elements = element.children(); // gives me child 1 and 2
for (Element e : elements) {
    // ... do processing plus extract childText ...
}
or get the own text, but I don't know how to do both at once:
String text = element.ownText(); // gives me own text 1 and 2
Also, I don't want to use the following (because the line information is lost):
String text = element.text();
How can I iterate over the element so that I get
child 1 -> text 1 -> child 2 -> text 2 (where text 1 and 2 are separated)
Answer (score: 1):
To visit every text node in document order, however deeply it is nested, you can walk the node tree recursively:
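import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class Main {
    public static void main(String[] args) {
        // Jsoup.parse(String) throws no checked exceptions, so no try/catch is needed
        Document document = Jsoup
                .parse("<a href=\"link1\"> text child 1</a>\r\n" + "own text 1\r\n" + "<b> text child 2</b>\r\n" + "own text 2");
        handleNode(document.body());
    }

    // Depth-first walk: prints each non-blank text node in document order
    public static void handleNode(Node node) {
        if (node instanceof TextNode) {
            TextNode textNode = (TextNode) node;
            if (!textNode.isBlank()) {
                System.out.println(textNode.text().trim());
            }
        }
        for (Node child : node.childNodes()) {
            handleNode(child);
        }
    }
}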
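jsoup also ships a built-in depth-first walker, `NodeVisitor`, which saves writing the recursion by hand. The sketch below is an alternative to the method above, not the answer's own code; `VisitorExample` and `describe` are hypothetical names. It labels each text node by its nearest `<a>` or `<b>` ancestor, so nested markup such as `<b>bold <a href="#">link</a></b>` is handled as well. Both `head` and `tail` are implemented in case the jsoup version in use does not give `tail` a default implementation.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

import java.util.ArrayList;
import java.util.List;

public class VisitorExample {

    // Walks every node under root depth-first (document order) and returns one
    // numbered line per non-blank text node, labeled by its nearest <a> or <b> ancestor.
    public static List<String> describe(Element root) {
        List<String> lines = new ArrayList<>();
        root.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (!(node instanceof TextNode)) {
                    return; // only text nodes produce output
                }
                TextNode textNode = (TextNode) node;
                if (textNode.isBlank()) {
                    return; // skip whitespace-only text between tags
                }
                String suffix = "";
                // Climb towards root; the first <a> or <b> found decides the label
                for (Node p = node.parentNode(); p instanceof Element && p != root; p = p.parentNode()) {
                    String tag = ((Element) p).tagName();
                    if ("a".equals(tag)) {
                        suffix = " (is a link)";
                        break;
                    }
                    if ("b".equals(tag)) {
                        suffix = " (is bold)";
                        break;
                    }
                }
                lines.add((lines.size() + 1) + ") " + textNode.text().trim() + suffix);
            }

            @Override
            public void tail(Node node, int depth) {
                // nothing to do after a node's children have been visited
            }
        });
        return lines;
    }

    public static void main(String[] args) {
        Document document = Jsoup.parse(
                "<a href=\"link1\"> text child 1</a>\n own text 1\n <b> text child 2</b>\n own text 2");
        describe(document.body()).forEach(System.out::println);
    }
}
```

For the sample in the question this produces the same four numbered lines as the flat loop, but it keeps working when the text is nested inside further elements.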
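If your HTML is flat, as in your example, a simple loop over the body's child nodes is enough, and it can also label each element by its tag: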
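// `document` is the Document parsed in the previous snippet; this also needs
// import org.jsoup.nodes.Element. Only direct children of <body> are visited.
int counter = 1;
for (Node node : document.body().childNodes()) {
    if (node instanceof TextNode) {
        String text = ((TextNode) node).text().trim();
        if (text.isEmpty()) {
            continue; // skip whitespace-only text between tags
        }
        System.out.println(counter++ + ") " + text);
    } else if (node instanceof Element) {
        Element element = (Element) node;
        String suffix = "";
        if ("a".equals(element.tagName())) {
            suffix = " (is a link)";
        } else if ("b".equals(element.tagName())) {
            suffix = " (is bold)";
        }
        // ownText() is only the element's direct text, not that of nested elements
        System.out.println(counter++ + ") " + element.ownText() + suffix);
    }
}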
This code prints what you described:
1) text child 1 (is a link)
2) own text 1
3) text child 2 (is bold)
4) own text 2
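The question hinges on the difference between three jsoup accessors. The sketch below, a hypothetical `ChildNodesDemo` class with a hypothetical helper `childNodeKinds`, runs them on the same sample: `children()` returns only the `Element` children, `ownText()` joins all direct text into a single string, and `childNodes()` keeps elements and `TextNode`s interleaved in document order, which is why both snippets above iterate over `childNodes()`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class ChildNodesDemo {

    // Returns a short label for each direct child node, in document order:
    // "element:<tag>" for elements, "text:<trimmed text>" for non-blank text nodes.
    public static List<String> childNodeKinds(Element parent) {
        List<String> kinds = new ArrayList<>();
        for (Node node : parent.childNodes()) {
            if (node instanceof Element) {
                kinds.add("element:" + ((Element) node).tagName());
            } else if (node instanceof TextNode && !((TextNode) node).isBlank()) {
                kinds.add("text:" + ((TextNode) node).text().trim());
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        Element body = Jsoup.parse(
                "<a href=\"link1\"> text child 1</a>\n own text 1\n <b> text child 2</b>\n own text 2").body();

        // children(): Elements only, i.e. the <a> and the <b>
        System.out.println(body.children().size());
        // ownText(): all direct text merged into one string; the boundaries are lost
        System.out.println(body.ownText());
        // childNodes(): elements and text nodes interleaved in document order
        System.out.println(childNodeKinds(body));
    }
}
```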