Scraping with Cheerio

Date: 2014-06-19 23:59:46

Tags: jquery node.js web-scraping screen-scraping cheerio

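I'm writing a scraper to fetch PSP ISO files so that I can download them based on their rating. I'm having trouble targeting each rating. How can I grab that element? I've included a screenshot for reference. The rating element sits inside a tr td tag.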

var request = require('request'),
  cheerio = require('cheerio'),
  fs = require('fs');

var url = 'http://goo.gl/cc4HRc',
  pspGames = [];

request(url, function (error, response, html) {
  if (!error && response.statusCode === 200) {
    var $ = cheerio.load(html);
    $('.gamelist', 'td').each(function () {
      var links = $(this).attr('href');
      pspGames.push(links);
    });
  }
});


2 Answers:

Answer 0 (score: 1)

Looking at the link, the markup looks like this:

<tr>
  <td>
    <a class="index gamelist" title="Corpse Party - Book of Shadows (Japan) ISO Info and Download" href="/Sony_Playstation_Portable_ISOs/Corpse_Party_-_Book_of_Shadows_(Japan)/158702">Corpse Party - Book of Shadows (Japan)</a>
  </td>
  <td align="center">4.9504</td>
</tr>

You should do this: $('.gamelist').each(

Answer 1 (score: 1)

I'm not sure how you plan to store the ratings, but maybe something like this will help:

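$('.gamelist').each(function () {
    // href of the game's anchor element
    var link = $(this).attr('href');
    // The anchor's parent is its <td>; the rating is in the sibling <td>
    var rating = $(this).parent().siblings().first().text();
    pspGames.push({"link": link, "rating": rating});
});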
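
Since the goal is to download based on rating, one possible follow-up is to turn the rating text into a number and keep only the games above a threshold. This is only a sketch in plain JavaScript: the `filterByRating` helper, the `minRating` threshold, and the two `/hypothetical/...` entries are assumptions made up for illustration, and the sample array simply mimics the shape of the objects pushed above.

```javascript
// Sample entries in the same shape as the objects pushed above;
// the rating is still a string because it came from .text().
var pspGames = [
  { link: '/Sony_Playstation_Portable_ISOs/Corpse_Party_-_Book_of_Shadows_(Japan)/158702', rating: '4.9504' },
  { link: '/hypothetical/low-rated-game', rating: ' 2.5 ' }, // made-up entry
  { link: '/hypothetical/unrated-game', rating: '' }         // made-up entry
];

// Hypothetical helper: parse each rating, drop entries that have no numeric
// rating or fall below minRating, and sort the rest from highest to lowest.
function filterByRating(games, minRating) {
  return games
    .map(function (g) {
      return { link: g.link, rating: parseFloat(g.rating) };
    })
    .filter(function (g) {
      return !isNaN(g.rating) && g.rating >= minRating;
    })
    .sort(function (a, b) {
      return b.rating - a.rating;
    });
}

console.log(filterByRating(pspGames, 4.5));
```

Storing the rating as a number up front would work just as well; parsing at filter time simply keeps the scraping step unchanged.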