Given this HTML:
<div id="cat-product-list" alt1="356623" class="product-list list_all_items_price price_new"><span id="wholesale_11_member_price" class="index-price special_price final_price" price="US$5.25"><strong class="final_price_strong">US$5.25</strong><b class="show_vip">(vip)</b></span><span id="wholesale_12_member_price" class="index-price special_price final_price" price="US$4.90" style="display: none"><strong class="final_price_strong">US$4.90</strong><b class="show_vip">(vip)</b></span><span id="wholesale_13_member_price" class="index-price special_price final_price" price="US$4.55" style="display: none"><strong class="final_price_strong">US$4.55</strong><b class="show_vip">(vip)</b></span><span id="wholesale_14_member_price" class="index-price special_price final_price" price="US$4.20" style="display: none"><strong class="final_price_strong">US$4.20</strong><b class="show_vip">(vip)</b></span><span id="shop_price_member_price_on" class="index-price shop_price" price="US$7.00"><strike>US$7.00</strike></span></div>
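I am trying to select the first span in the div and then get the value of its strong element. So far I have managed to grab everything else, but I cannot get this one to work: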
Document d = Jsoup.connect("http://www.emmacloth.com/Clothing-vc-7061.html?icn=clothing&ici=ec_navbar05").timeout(6000).get();
Elements elements = d.select("div#productsContent1_goods.products_category");
for (Element element: elements.select("div.box-product-list.list_all_items")){
System.out.println("start");
String productImage = element.select("div.goods_aImg a img").attr("src");
String productname = element.select("div.goods_mz a").attr("title");
String productUrl = "http://www.emmacloth.com" + element.select("div.goods_mz a").attr("href");
// String productPrice = element.select("div.product-list.list_all_items_price.price_new > span.index-price.special_price.final_price").toString();
Elements priceElements = element.select(
"div.product-list.list_all_items_price.price_new > span.index-price.special_price.final_price"
);
for (Element priceElement : priceElements) {
System.out.println(priceElement.attr("price"));
}
// System.out.println(productPrice);
}
}
Answer 0 (score: 0)
Within this div you are looking for the span(s) that have the following classes: index-price special_price final_price
and from there (I think) you want to extract the price attribute.
Given the HTML provided in your question, the following code...
String html = "<div id=\"cat-product-list\" alt1=\"356623\" class=\"product-list list_all_items_price price_new\">" +
"<span id=\"wholesale_11_member_price\" class=\"index-price special_price final_price\" price=\"US$5.25\">" +
"<strong class=\"final_price_strong\">US$5.25</strong>" +
"<b class=\"show_vip\">(vip)</b>" +
"</span>" +
"<span id=\"wholesale_12_member_price\" class=\"index-price special_price final_price\" price=\"US$4.90\" style=\"display: none\">" +
"<strong class=\"final_price_strong\">US$4.90</strong>" +
"<b class=\"show_vip\">(vip)</b>" +
"</span>" +
"<span id=\"wholesale_13_member_price\" class=\"index-price special_price final_price\" price=\"US$4.55\" style=\"display: none\">" +
"<strong class=\"final_price_strong\">US$4.55</strong>" +
"<b class=\"show_vip\">(vip)</b>" +
"</span>" +
"<span id=\"wholesale_14_member_price\" class=\"index-price special_price final_price\" price=\"US$4.20\" style=\"display: none\">" +
"<strong class=\"final_price_strong\">US$4.20</strong>" +
"<b class=\"show_vip\">(vip)</b>" +
"</span>" +
"<span id=\"shop_price_member_price_on\" class=\"index-price shop_price\" price=\"US$7.00\"><strike>US$7.00</strike></span>" +
"</div>";
Document doc = Jsoup.parse(html);
// this selector selects the div(s) having classes: "product-list list_all_items_price price_new"
// and within that div, it selects the span(s) having the classes: "index-price special_price final_price"
Elements priceElements = doc.select(
"div.product-list.list_all_items_price.price_new > span.index-price.special_price.final_price"
);
for (Element priceElement : priceElements) {
System.out.println(priceElement.attr("price"));
}
...will print the product prices:
US$5.25
US$4.90
US$4.55
US$4.20
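The question asks for only the first span and the text of its strong element, rather than every price attribute. A minimal sketch of one way to do that follows, assuming jsoup 1.11.1 or later (where selectFirst is available). The class name FirstPrice and the method name firstPriceText are hypothetical, and the HTML is a shortened copy of the snippet from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class FirstPrice {

    // Returns the text of the <strong> inside the first matching price span,
    // or null when the HTML contains no such element.
    static String firstPriceText(String html) {
        Document doc = Jsoup.parse(html);
        // selectFirst returns the first match in document order, or null if nothing matches.
        Element strong = doc.selectFirst(
            "div.product-list > span.index-price.special_price.final_price > strong.final_price_strong");
        return strong == null ? null : strong.text();
    }

    public static void main(String[] args) {
        // Shortened version of the HTML from the question (two of the four price spans).
        String html = "<div class=\"product-list list_all_items_price price_new\">"
            + "<span class=\"index-price special_price final_price\" price=\"US$5.25\">"
            + "<strong class=\"final_price_strong\">US$5.25</strong><b class=\"show_vip\">(vip)</b></span>"
            + "<span class=\"index-price special_price final_price\" price=\"US$4.90\" style=\"display: none\">"
            + "<strong class=\"final_price_strong\">US$4.90</strong><b class=\"show_vip\">(vip)</b></span>"
            + "</div>";
        System.out.println(firstPriceText(html)); // prints US$5.25
    }
}
```

With an older jsoup version, calling select(...) with the same selector and then first() on the result should behave the same way, and first() likewise returns null when nothing matches.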
In response to this comment:
For some reason it does not work on the whole website, can you check my edited question
The following code...
Document d =
Jsoup.connect("http://www.emmacloth.com/Clothing-vc-7061.html?icn=clothing&ici=ec_navbar05").timeout(6000).get();
for (Element element : d.select("div#productsContent1_goods.products_category > div.box-product-list.list_all_items")) {
System.out.println("start");
String productImage = element.select("div.goods_aImg > a > img").attr("src");
String productname = element.select("div.goods_mz > a").attr("title");
String productUrl = "http://www.emmacloth.com" + element.select("div.goods_mz > a").attr("href");
System.out.println(productImage);
System.out.println(productname);
System.out.println(productUrl);
}
...will print:
http://img.ltwebstatic.com/images/pi/201710/3b/15090086488079557831_thumbnail_220x293.jpg
Pearl Embellished Bow Tied Bell Cuff Blouse
http://www.emmacloth.com/Pearl-Embellished-Bow-Tied-Bell-Cuff-Blouse-p-403325-cat-1733.html
... etc
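So far, so good. But what about the price? If you look at the source of this web page, you will see that the price elements are dynamic content supplied by the category_price JS function on that page. These elements therefore do not exist in the static HTML, so Jsoup cannot read them. To read dynamic content you have to use a web driver such as Selenium.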