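This is my first post here, so I'll try to keep it to the point. I've been using Jsoup to extract data from a series of web pages for use in an Android application. I've come across a page that updates its data dynamically based on the user's selection in a drop-down box. When I inspect the HTML in Chrome I can see the data, but I can't seem to extract it. I can extract all the text elements around it, but nothing that is generated dynamically comes out.
The page I'm looking at has the following form class (apologies for the line wrapping; I couldn't get rid of it).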
<form class="variations_form cart" method="post" enctype="multipart/form-data" data-product_id="8044" data-product_variations="[{"variation_id":8047,"variation_is_visible":true,"variation_is_active":true,"is_purchasable":true,"display_price":19.70,"display_regular_price":19.70,"attributes":{"attribute_size":"500g"},"image_src":"http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann-475x652.png","image_link":"http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann.png","image_title":"LABELS_500g-FOOD Vann","image_alt":"","image_srcset":"http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann-746x1024.png 746w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann-475x652.png 475w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann.png 1063w","image_sizes":"(max-width: 475px) 100vw, 475px","price_html":"<span class=\"price\"><span class=\"amount\">$19.70<\/span><\/span>","availability_html":"","sku":"FOOD-Vanilla-500","weight":".5 kg","dimensions":"","min_qty":1,"max_qty":"","backorders_allowed":false,"is_in_stock":true,"is_downloadable":false,"is_virtual":false,"is_sold_individually":"no","variation_description":"<p>500g<\/p>\n"},{"variation_id":8045,"variation_is_visible":true,"variation_is_active":true,"is_purchasable":true,"display_price":13.50,"display_regular_price":13.50,"attributes":{"attribute_size":"1kg"},"image_src":"http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van-475x652.png","image_link":"http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van.png","image_title":"LABELS_1kg-FOOD Van","image_alt":"","image_srcset":"http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van-746x1024.png 746w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van-475x652.png 475w, 
http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van.png 1063w","image_sizes":"(max-width: 475px) 100vw, 475px","price_html":"<span class=\"price\"><span class=\"amount\">$13.50<\/span><\/span>","availability_html":"","sku":"FOOD-Vanilla-1kg","weight":"1 kg","dimensions":"","min_qty":1,"max_qty":"","backorders_allowed":false,"is_in_stock":true,"is_downloadable":false,"is_virtual":false,"is_sold_individually":"no","variation_description":"<p>1kg<\/p>\n"},{"variation_id":8046,"variation_is_visible":true,"variation_is_active":true,"is_purchasable":true,"display_price":199.95,"display_regular_price":199.95,"attributes":{"attribute_size":"3kg"},"image_src":"http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van-475x652.png","image_link":"http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van.png","image_title":"LABELS_3kg-FOOD Van","image_alt":"","image_srcset":"http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van-746x1024.png 746w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van-475x652.png 475w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van.png 1063w","image_sizes":"(max-width: 475px) 100vw, 475px","price_html":"<span class=\"price\"><span class=\"amount\">$199.95<\/span><\/span>","availability_html":"","sku":"FOOD-Vanilla-3kg","weight":"3 kg","dimensions":"","min_qty":1,"max_qty":"","backorders_allowed":false,"is_in_stock":true,"is_downloadable":false,"is_virtual":false,"is_sold_individually":"no","variation_description":"<p>3kg<\/p>\n"}]">
<table class="variations" cellspacing="0">
<tbody>
<tr>
<td class="label">
<label for="size">Size</label>
</td>
<td class="value">
<select id="size" class="" name="attribute_size" data-attribute_name="attribute_size">
<option value="">Choose an option</option>
<option value="500g">500g</option>
<option value="1kg" selected="selected">1kg</option>
<option value="3kg">3kg</option>
</select><a class="reset_variations" href="#" style="visibility: visible; display: block;">Clear selection</a>
</td>
</tr>
</tbody>
</table>
<div class="angelleye_buton_box_relative" style="position: relative;">
<div class="single_variation_wrap">
<div class="woocommerce-variation-description" style="border: 1px solid transparent;">
<p>1kg</p>
</div>
<div class="single_variation"><span class="price"><span class="amount selectorgadget_selected">$13.50</span></span>
</div>
<div class="variations_button">
<div class="quantity">
<input type="number" step="1" name="quantity" value="1" title="Qty" class="input-text qty text" size="4" min="1">
</div>
<button type="submit" class="single_add_to_cart_button button alt">Add to basket</button>
<input type="hidden" name="add-to-cart" value="8044">
<input type="hidden" name="product_id" value="8044">
<input type="hidden" name="variation_id" class="variation_id" value="8045">
</div>
</div>
<div class="blockUI blockOverlay angelleyeOverlay" style="display:none;z-index: 1000; border: none; margin: 0px; padding: 0px; width: 100%; height: 100%; top: 0px; left: 0px; opacity: 0.6; cursor: default; position: absolute; background: url(http://www.sourcewebsite.com/wp-content/plugins/woocommerce/assets/images/select2-spinner.gif) 50% 50% / 16px 16px no-repeat rgb(255, 255, 255);"></div>
</div>
</form>
I'm trying to extract the price "13.50" from the div below.
<div class="single_variation"><span class="price"><span class="amount selectorgadget_selected">$13.50</span></span>
</div>
My code is as follows:
// Requires imports: android.os.AsyncTask, org.jsoup.Jsoup,
// org.jsoup.nodes.Document, org.jsoup.select.Elements
private class ParseFoodPriceURL extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... strings) {
        StringBuilder buffer = new StringBuilder();
        try {
            // Fetch and parse the page at the given URL.
            Document doc = Jsoup.connect(strings[0]).get();
            // Select the div that shows the price of the chosen variation.
            Elements foodPrice = doc.select("div.single_variation_wrap > div.single_variation");
            String priceTextSelection = foodPrice.text();
            buffer.append("Price: $").append(priceTextSelection);
        } catch (Throwable t) {
            t.printStackTrace();
        }
        return buffer.toString();
    }
}
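Answer 0 (score: 1)
JSoup is not a browser, so it does not interpret or execute JavaScript. If a site's content is generated dynamically, you can't get at it directly with JSoup. Two options come to mind: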
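1. Identify the AJAX calls directly and fetch the information through them. The response is usually JSON rather than HTML, so you may need a separate parsing library. This option is fast, but you need to investigate and understand how the web page works.
2. Use Selenium WebDriver with a real browser engine (e.g., PhantomJS). This loads the site as a real browser would, and you can access the content in much the same way as with JSoup. It's relatively easy to program, but it is slow and uses a lot of resources. If you're running on Android, this may be too heavy. In any case, the right tool for Android seems to be Selendroid.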
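As a variation on the first option: in the markup quoted in the question, the per-size prices appear to be embedded already as JSON in the form's data-product_variations attribute, so it may be possible to read them from the static HTML without executing any JavaScript. With Jsoup, the attribute text could probably be obtained with doc.select("form.variations_form").attr("data-product_variations"). The sketch below then pulls a size-to-price map out of that text. VariationPrices and pricesBySize are hypothetical names, and the regular expression is only a dependency-free stand-in for a real JSON parser (on Android, the bundled org.json classes would be the more robust choice). It assumes that each variation lists display_price before attribute_size, as in the question's markup.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Jsoup or WooCommerce. */
final class VariationPrices {

    // Captures the display_price number and the attribute_size string that
    // follows it. Assumes every variation has both keys, in this order.
    private static final Pattern VARIATION = Pattern.compile(
            "\"display_price\":\\s*([0-9]+(?:\\.[0-9]+)?)"
                    + ".*?\"attribute_size\":\\s*\"([^\"]*)\"",
            Pattern.DOTALL);

    private VariationPrices() {
    }

    /**
     * Maps each size (e.g. "1kg") to its display price.
     *
     * @param variationsJson text of the data-product_variations attribute
     */
    static Map<String, BigDecimal> pricesBySize(String variationsJson) {
        Map<String, BigDecimal> prices = new LinkedHashMap<>();
        Matcher matcher = VARIATION.matcher(variationsJson);
        while (matcher.find()) {
            // group(1) is the price, group(2) is the size label.
            prices.put(matcher.group(2), new BigDecimal(matcher.group(1)));
        }
        return prices;
    }
}
```

Inside doInBackground, the result of pricesBySize could be queried with the size of interest, for example prices.get("1kg"). BigDecimal is used rather than double so that a price written as 13.50 keeps its trailing zero when printed.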