Web scraping with cheerio in Node.js?

Time: 2016-01-23 06:45:13

Tags: javascript node.js http web-scraping cheerio

I am trying to scrape a web page using cheerio and http in Node.js.

Part of the HTML:

<tr>
    <td id="priceblock_saleprice_lbl" class="a-color-price a-size-base a-text-right a-nowrap">Sale:</td>

    <td class="a-span12">
    <span id="priceblock_saleprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 585.00</span>

    </td>
</tr>

Node.js code:

var sale_price = '#priceblock_saleprice';
scraper(sale_price).filter(function(){
   var data_price = scraper(this);
   console.log(data_price.text());
   scraped = scraped + data_price.text()+';';
});

This code gives 585 as output.

But when I do the same for this:

Part of the HTML page:

<tr id="priceblock_ourprice_row">
    <td id="priceblock_ourprice_lbl" class="a-color-secondary a-size-base a-text-right a-nowrap">Price:</td>
    <td class="a-span12">
        <span id="priceblock_ourprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 329.00</span>
    </td>
</tr>

Node.js code:

var mrp  = '#priceblock_ourprice_lbl';
scraper(mrp).filter(function(){
   var data_mrp = scraper(this);
   console.log(data_mrp.text());
   scraped = scraped + data_mrp.text()+';';
});

It gives no output.

2 Answers:

Answer 0: (score: 0)

You are using the wrong ID ... it should be priceblock_ourprice

var mrp  = '#priceblock_ourprice';
scraper(mrp).filter(function(){
   var data_mrp = scraper(this);
   console.log(data_mrp.text());
   scraped = scraped + data_mrp.text()+';';
});
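
Once the selector matches, the text will likely still carry the non-breaking spaces from the nested currencyINR span. Here is a minimal sketch of turning that text into a number. The helper name parsePrice is hypothetical, and it assumes the price uses "." as the decimal separator and "," only as a thousands separator.

```javascript
// Hypothetical helper: convert scraped price text such as
// "\u00a0\u00a0 329.00" into a number.
function parsePrice(text) {
  // Keep only digits and the decimal point; this drops spaces,
  // non-breaking spaces, currency symbols, and commas.
  const cleaned = String(text).replace(/[^0-9.]/g, '');
  const value = parseFloat(cleaned);
  // Return null when no number could be read from the text.
  return Number.isNaN(value) ? null : value;
}

console.log(parsePrice('\u00a0\u00a0 329.00')); // 329
console.log(parsePrice('Price:'));              // null
```

In the snippet above you could call parsePrice(data_mrp.text()) before appending to scraped. A null result is a quick hint that the selector matched a label cell rather than a price.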

Answer 1: (score: 0)

The id used in the second code snippet points to the first <td> element, but you need to target the second <td> element, so use "#priceblock_ourprice"