Parsing a website's HTML with Jsoup in Java

Date: 2014-01-30 18:33:46

Tags: java web-scraping jsoup

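I'm stuck at a point where I need to parse this website and display the top PlayStation 3 games by Metascore along with their scores. I have only just started working with Jsoup, so I haven't managed to parse the page well.

This is how I'm getting the scores and titles. Is there a better way?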

Document doc = Jsoup.connect(URL).userAgent("Mozilla").get();
// To get the scores
Elements scores = doc.select("span.metascore_w.medium.game");
// To get the titles
Elements titles = doc.select("h3.product_title");
for (Element title : titles) {
    System.out.println("text : " + title.text());
}

2 Answers:

Answer 0 (score: 2)

Another approach is to find the repeating parent of the two tags you need (e.g. div.main_stats) and iterate over it to collect the elements:

Elements parents = doc.select("div.main_stats");
for (Element child : parents) {
    Element label = child.select("h3.product_title").first();
    Element score = child.select("span.metascore_w.medium.game").first();
    System.out.println("Game **" + label.text() + "** has a Metascore of ->> " + score.text());
}

Output:

Game **XCOM: Enemy Within** has a Metascore of ->> 88
Game **Minecraft: PlayStation 3 Edition** has a Metascore of ->> 86
Game **Gran Turismo 6** has a Metascore of ->> 81
Game **Need for Speed: Rivals** has a Metascore of ->> 80
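
One caveat with the loop above: first() returns null when nothing matches, so a div.main_stats block without a score would throw a NullPointerException. Below is a minimal sketch of a null-safe variant. It parses an inline HTML fragment so it runs without network access. The fragment, the class name MetascoreSketch, and the method extractScores are hypothetical and only mimic the structure the selectors assume; the real page's markup may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MetascoreSketch {

    // Hypothetical markup: two entries, the second one has no score.
    static final String SAMPLE_HTML =
          "<ol class=\"list_products list_product_summaries\">"
        + "<li><div class=\"main_stats\">"
        + "<h3 class=\"product_title\"><a href=\"#\">Gran Turismo 6</a></h3>"
        + "<span class=\"metascore_w medium game\">81</span>"
        + "</div></li>"
        + "<li><div class=\"main_stats\">"
        + "<h3 class=\"product_title\"><a href=\"#\">Untitled Game</a></h3>"
        + "</div></li>"
        + "</ol>";

    // Maps each title to its score, skipping blocks where either is missing.
    static Map<String, String> extractScores(Document doc) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Element block : doc.select("div.main_stats")) {
            Element title = block.select("h3.product_title").first();
            Element score = block.select("span.metascore_w.medium.game").first();
            if (title == null || score == null) {
                continue; // incomplete entry: skip instead of throwing
            }
            result.put(title.text(), score.text());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        extractScores(doc).forEach((title, score) ->
            System.out.println("Game **" + title + "** has a Metascore of ->> " + score));
    }
}
```

A LinkedHashMap is used here so the entries keep the order in which they appear on the page. To run it against the live site, replace Jsoup.parse(SAMPLE_HTML) with the Jsoup.connect(URL) call from the question.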

Answer 1 (score: 0)

I came up with this code:

Element list = doc.select("ol.list_products.list_product_summaries").first();
for (Element element : list.children()) {
    System.out.println(element.select("span.metascore_w.medium.game").text());
    System.out.println(element.select("h3.product_title").text());
}