I'm having trouble extracting text from the following HTML and would appreciate some help.
<div class="itemlist">
<ul>
<li>
<div class="Description">
<h2>Item 1</h2> // GET THIS
<h3 title="Shipping :01-02 Nov">Shipping :01-02Nov</h3> // GET THIS
</div>
<div class="price" style="margin: 0px auto; display: none;">
<span class="arial-12-88" style="display: inline;"></span>
<div class="currency-USD arial-24-26-bold">450 USD</div> // GET THIS
<span class="arial-12-d0" style="display: inline;"></span>
</div>
<div class="button_set" style="display: flex;">
<a href="productDetail.htm?pid=00020170918214914392zGPQW7nE06A2"><button class="learn">Learn More</button></a>
<a href="user/orderDetails.htm?m=add&pid=00020170918214914392zGPQW7nE06A2&count=1&fitting=">
<button class="add">Add To Cart</button></a> // GET THIS
</div>
</li>
next item ...
</ul>
</div>
The output should be:
Item 1
Shipping :01-02Nov
450 USD
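My approach is too static to cope with changes in the item structure, because not every item has, for example, the price at the same child index. The only thing that stays the same is the div class names.
Right now I use the debugger to work out which child I have to call: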
Element content = doc.getElementsByClass("itemlist").first();
Node child1 = content.childNode(1);
for (Node node : child1.childNodes()) {
try {
Node desc = node.childNode(3);
Node price = node.childNode(5);
Node stock = node.childNode(7);
// get description
Node desc_elem = desc.childNode(1);
Node desc_text = desc_elem.childNode(0);
String desc_txt = ((TextNode) desc_text).text().trim();
} catch (Exception e) {
continue;
}
}
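Please help me find a more dynamic approach. Ideally I would get all the list items and loop over them, then query each one for the description div and the price div, and read the text of their children.
Answer 0 (score: 1)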
//select the div with the item list
Element itemlist = doc.select("div.itemlist").first();
// select each li element
Elements items = itemlist.select("li");
// for each li element select the corresponding div with item name, shipping info and price
for(Element e : items){
System.out.println(e.select("div.Description h2").text());
System.out.println(e.select("div.Description h3").text());
System.out.println(e.select("div.currency-USD").text());
}
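
Since the question mentions that not every item has a price, here is a minimal self-contained sketch of the same selector-based idea that makes the missing-element case explicit. The class name `ItemListExample`, the helper `textOrEmpty`, the method `extractItems`, and the sample HTML (including the second item without a price) are made up for illustration. It also assumes that other currencies follow the same `currency-XXX` class naming, which the question does not confirm, and that your jsoup version provides `selectFirst`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ItemListExample {

    // Hypothetical helper: text of the first match inside 'scope',
    // or an empty string when the selector matches nothing.
    static String textOrEmpty(Element scope, String cssQuery) {
        Element match = scope.selectFirst(cssQuery); // null when there is no match
        return match == null ? "" : match.text();
    }

    // Returns one {name, shipping, price} row per <li> in the item list.
    static List<String[]> extractItems(String html) {
        Document doc = Jsoup.parse(html);
        List<String[]> rows = new ArrayList<>();
        for (Element li : doc.select("div.itemlist li")) {
            String name = textOrEmpty(li, "div.Description h2");
            String shipping = textOrEmpty(li, "div.Description h3");
            // Assumption: the price div always carries a class containing "currency-".
            String price = textOrEmpty(li, "div.price div[class*=currency-]");
            rows.add(new String[] {name, shipping, price});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample: the second item deliberately has no price div.
        String html =
            "<div class='itemlist'><ul>"
            + "<li><div class='Description'><h2>Item 1</h2>"
            + "<h3 title='Shipping :01-02 Nov'>Shipping :01-02Nov</h3></div>"
            + "<div class='price'><span></span>"
            + "<div class='currency-USD arial-24-26-bold'>450 USD</div></div></li>"
            + "<li><div class='Description'><h2>Item 2</h2>"
            + "<h3>Shipping :03-04 Nov</h3></div></li>"
            + "</ul></div>";

        for (String[] row : extractItems(html)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Because every lookup is scoped to its own `li` and goes through class names rather than child indexes, an item that lacks a price simply yields an empty string for that field and the loop carries on with the next item.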