Parsing an h2 with an href and a span using cheerio

Time: 2016-04-21 08:33:31

Tags: javascript jquery html node.js

This is the HTML I want to parse:

      <h2 class="offer-header">

        <a class="offer-title" href="http://address.com/id/2">Item name</a>
    </h2> 

        <div class="offer-price">


        <span class="offer-buy-now   buy-now">
            <span class="statement">
                1 999,00 $


                    <span class="label">buy now</span>

            </span>
        </span>
    </div>
// many more elements like this one

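Parsing the href and the link text is no problem, but I have trouble parsing the price. The output contains a lot of whitespace and \n characters. I would like the price displayed on the same line as "buy now".

My sample price output: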

    2 497,00 $


        buy now




    2 379,00 $


        buy now

Code:

 request(task.url, function(err, resp, body){

                  if(body) {
                    $ = cheerio.load(body);
                    links = $('a.offer-title');
                    $(links).each(function (i, link) {

                      //console.log($(link).attr('href'));
                      var price = $('span.offer-buy-now').text();
                      console.log(price);
                      //items[k] = items[k] || [];
                      //items[k] = new itemParam($(link).text(), 12, k);
                      k++;

                    });

                  }
                  callback();
 });

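How can I fix this?

Edit:

I corrected the each loop and it now works properly. But I have another problem: I don't always get data back. Only the 3rd, 4th, or 5th call returns results. Maybe something is wrong with my request?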

router.route('/send')
  .post(function(req, res){

      var url = req.body.url;
      var items = [];
      var k=0;
      var q = async.queue(function(task, callback){

            console.log(task.url);
            if(task.url.length>=1) {

              if (isURL(task.url)) {
                console.log('OK');


                request(task.url, function(err, resp, body){

                  if(body) {
                    $ = cheerio.load(body);
                    links = $('div.offer-info');

                    $(links).each(function (i, link) {

                      console.log($(link).find('a.offer-title').attr('href'));
                      var price = $(link).find('span.offer-buy-now').text().replace(/[^0-9.]/g, "");
                      console.log(price);
                      items[k] = items[k] || [];
                      items[k] = new itemParam($(link).find('a.offer-title').text(),
                        price,$(link).find('a.offer-title').attr('href'), k);
                      k++;

                    });

                  }
                  callback();
                });

              } else {
                errorHandling(res, 401,"Invalid url");
              }
            }else{
                errorHandling(res, 401,"Invalid url");
            }

        }
      );


      q.push({url: url+'&p=1'});

      q.drain = function(errr, p) {
        console.log('all items have been processed' + items.length);
        for (var i=0; i<items.length; i++) {

          console.log(items[i].name + ' |  ' + items[i].id + ' | ' + items[i].price);

        }
        res.sendStatus(200);
      };
  });

2 answers:

Answer 0: (score: 1)

You can strip everything except digits (and dots) with:

var price = $('span.offer-buy-now').text().replace(/[^0-9.]/g, "");

Sample:

var str = "2 497,00 $         buy now";
var strreplaced = str.replace(/[^0-9.]/g, "");
alert(strreplaced);
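Note that the character class in that regex keeps dots but not commas, so for a price written as "2 497,00" the decimal separator is dropped along with the spaces. If a numeric value is wanted, a minimal sketch could look like the following. It assumes the comma is the decimal separator and that the label text contains no digits or commas; `parsePrice` is a hypothetical helper name, not something from the question or from cheerio.

```javascript
// Hypothetical helper (not part of the original answer): turn scraped price
// text such as "2 497,00 $ buy now" into a number.
function parsePrice(text) {
  // Keep only digits and the decimal comma; this drops spaces,
  // the currency sign, and the "buy now" label.
  var cleaned = text.replace(/[^0-9,]/g, "");
  // Swap the decimal comma for a dot so parseFloat can read the value.
  return parseFloat(cleaned.replace(",", "."));
}

var str = "2 497,00 $         buy now";
console.log(str.replace(/[^0-9.]/g, "")); // "249700": the decimal comma is lost
console.log(parsePrice(str));             // 2497
```

In the loop from the question, this would be applied to the text of the price span for each offer, for example `parsePrice($(link).find('span.offer-buy-now').text())`.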

Answer 1: (score: 1)

Just use the replace method to remove "buy now", then use trim() to strip the surrounding whitespace.

links = $('a.offer-title');
$(links).each(function(i, link) {

  //console.log($(link).attr('href'));
  var price = $('span.offer-buy-now').text().replace('buy now', '').trim();
  console.log(price);
  //items[k] = items[k] || [];
  //items[k] = new itemParam($(link).text(), 12, k);
  k++;

});

Other solution

Or you can call $('span.statement *').remove(); to remove all elements inside the .statement span, and then you can get the text.

Demo:

links = $('a.offer-title');
$(links).each(function(i, link) {

  //console.log($(link).attr('href'));
  $('span.statement *').remove();
  var price = $('span.statement').text().trim();
  console.log(price);
  //items[k] = items[k] || [];
  //items[k] = new itemParam($(link).text(), 12, k);
  k++;

});
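
If the goal is to keep both the price and the "buy now" label but on a single line, as the question describes, another option is to collapse the whitespace rather than remove the label. This is only a sketch operating on a plain string; `collapseWhitespace` is a hypothetical helper name, and the `raw` value imitates the kind of text that `.text()` returns for the markup in the question.

```javascript
// Hypothetical helper: collapse every run of whitespace (spaces, tabs,
// newlines) into a single space and trim both ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Imitates the raw text of the price span, with newlines and indentation.
var raw = "\n    2 497,00 $\n\n\n        buy now\n\n";
console.log(collapseWhitespace(raw)); // "2 497,00 $ buy now"
```

Unlike removing the child elements, this leaves the parsed document untouched, so later lookups on the same page still see the label span.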