WebClient (HtmlUnit) does not see some elements

Time: 2017-08-16 17:34:22

Tags: java html web-scraping htmlunit

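I am trying to parse a Steam Market page using "page.asText()", but it does not work. This is probably because the items are not loaded yet when the text is read: they only appear about 1 second after the HTML itself has loaded.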

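import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class SteamMarketScraper {
    public static void main(String[] args) throws Exception {
        // Silence HtmlUnit and Apache HttpClient logging
        java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.OFF);
        java.util.logging.Logger.getLogger("org.apache.http").setLevel(java.util.logging.Level.OFF);
        String link = "http://steamcommunity.com/market/search?appid=730#p6_price_asc";
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        HtmlPage page = webClient.getPage(link);
        // Prints the page text as it stands immediately after the initial load
        System.out.println(page.asText());
    }
}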

In the console I see:

Show advanced options...






 < 1 2 3 4 5 6 ... 939 >
 Showing 1-10 of 9389 results

Expected:

Show advanced options...
PRICE
QUANTITY
NAME
31,218
 Starting at:
 $0.35 USD
Operation Hydra Case 
 Counter-Strike: Global Offensive
 276,582
 Starting at:
 $0.23 USD
.
.
.

M4A1-S | Decimator (Field-Tested) 
 Counter-Strike: Global Offensive


 232
 Starting at:
 $27.06 USD

AWP | Asiimov (Battle-Scarred) 
 Counter-Strike: Global Offensive


 28,068
 Starting at:
 $0.75 USD

Krakow 2017 Legends Autograph Capsule 
 Counter-Strike: Global Offensive


 < 1 2 3 4 5 6 ... 940 >
 Showing 1-10 of 9392 results

1 answer:

Answer 0 (score: 0)

First, make sure JavaScript is enabled.

webClient.getOptions().setJavaScriptEnabled(true);

What I usually do is wait for the remaining elements to load:

Thread.sleep(3000);

This pauses for 3 seconds, which gives the page time to load the rest of its content.
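A fixed sleep either waits too long or not long enough. As a rough alternative, the sketch below polls a condition until it becomes true or a timeout expires. It uses only the Java standard library; the class name `PollingWait` and the method `waitUntil` are hypothetical helpers made up for this illustration, not part of HtmlUnit.

```java
import java.util.function.BooleanSupplier;

public final class PollingWait {

    private PollingWait() {}

    /**
     * Checks the condition every pollMillis milliseconds until it returns true
     * or timeoutMillis milliseconds have elapsed.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 200 ms after start.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 200, 3000, 50);
        System.out.println("ready = " + ready);
    }
}
```

In the scraping case, the condition could be something like `() -> page.asText().contains("Starting at:")`, assuming that string only appears once the listings have been rendered. HtmlUnit also offers `webClient.waitForBackgroundJavaScript(timeoutMillis)`, which may be enough on its own.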

You can also try any of the other approaches that users have listed here:

HTMLUnit doesn't wait for Javascript