Question

I am trying to scrape this website:

http://www.davidsassoonlibrary.com/index.php?action=book_details

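But no matter which book I search for, the URL stays the same. I am completely new to web scraping. I have already scraped two pages with Jsoup and am trying to do the same for this site.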

Does anyone have any ideas? Please explain in as much detail as possible. Thanks.

Answer 1

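The search is submitted as a POST request, so the search terms travel in the request body rather than in the URL. Send a POST request to the page with the form parameters search and title. Try the code below: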

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document doc = Jsoup.connect("http://www.davidsassoonlibrary.com/index.php?action=book_details")
  // form fields passed in the body of the POST request
  .data("search", "search")
  .data("title", "Test Cricket Lists")
  .userAgent("Mozilla")
  .post(); // throws IOException if the request fails
System.out.println(doc); // prints the HTML source of the result page

Running this prints the HTML source of the search results page to the console.

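You can use Firebug (or the Network panel of your browser's developer tools) to see which HTTP method the URL is called with, GET or POST, and which parameters are sent.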
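To make it concrete why the URL never changes, here is a minimal sketch that builds the same request using only the JDK's own HTTP client (Java 11 or later) instead of Jsoup. The class name FormPostSketch and its helper methods are hypothetical names chosen for illustration; the field names search and title are the ones used in the answer above, and whether the site accepts exactly this request is an assumption. By default the program only prints what would be sent; pass --send to actually perform the request.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class FormPostSketch {

    // Encode form fields as application/x-www-form-urlencoded,
    // the format a browser uses when it submits an HTML form via POST.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Build the POST request. The search terms go into the body,
    // so the URL is identical for every search.
    static HttpRequest buildRequest(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("User-Agent", "Mozilla")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Field names taken from the answer above.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("search", "search");
        fields.put("title", "Test Cricket Lists");

        String url = "http://www.davidsassoonlibrary.com/index.php?action=book_details";
        HttpRequest request = buildRequest(url, fields);

        // Show the method, the unchanged URL, and the body carrying the search terms.
        System.out.println(request.method() + " " + request.uri());
        System.out.println(encodeForm(fields));

        // Only touch the network when explicitly asked to.
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }
}
```

Searching for a different book changes only the value of title in the body. Jsoup's data(...) and post() calls do this encoding for you and also parse the response into a Document, which is why the Jsoup version is shorter.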

Scraping a website whose URL does not change
