Pagination problem when scraping with Node.js

Date: 2019-12-15 20:57:10

Tags: javascript arrays node.js cheerio

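I am writing a small script that scrapes some information from a public directory. Saving the results to CSV already works, but I cannot get the pagination to run automatically.

My source is: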

const rp = require('request-promise');
const request = ('request');
const otcsv = require('objects-to-csv');
const cheerio = require('cheerio');

// URL To scrape
const baseURL = 'xx';
const searchURL = 'xxx';

// scrape info
const getCompanies = async () => {
    // Pagination test 

    for(let index = 0; index <= 2; index = index + 1) {
        const html = await request.get("xxx" + index);
        const $ = await cheerio.load(html);
        console.log("Loading Pages....");
        // console.log("At page number" + index);
        // end pagination test
        const htmls = await rp(baseURL + searchURL);
        const businessMap = cheerio('a.business-name', htmls).map(async (i, e) => {
            const link = baseURL + e.attribs.href;
            const innerHtml = await rp(link);
            const emailAddress = cheerio('a.email-business', innerHtml).prop('href');
            const name = e.children[0].data || cheerio('h1', innerHtml).text();
            const phone = cheerio('p.phone', innerHtml).text();

            return {
                emailAddress: emailAddress ? emailAddress.replace('mailto:', '') : '',
                //  link,
                name,
                phone,
            }

        }).get();
        return Promise.all(businessMap);
    }
};

// save to CSV
getCompanies()
  .then(result => {
    const transformed = new otcsv(result);
    return transformed.toDisk('./output.csv');
  })
  .then(() => console.log('SUCCESSFULLY COMPLETED THE WEB SCRAPING SAMPLE'));

The error I get is: request.get is not a function.

Edit

The second part of this question is at: Nodejs Scraper isn't moving to next page(s)

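1 Answer:

Answer 0 (score: 1)

request.get should be rp.get, because the request module does not return a Promise.

You would get an error either way, because you are not calling require for request; you are only assigning the string 'request' to the request variable: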

const request = ('request');

Change it to:

const request = require('request');
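
The original failure can be reproduced in isolation, without installing anything. The following is a minimal sketch (the URL string 'xxx0' is just the question's placeholder with a page index appended): parentheses around a string literal are plain grouping, so the variable holds a string, and strings have no get method.

```javascript
// Parentheses around a string literal do nothing special:
// `request` is the string 'request', not the request module.
const request = ('request');

console.log(typeof request);     // the variable holds a string
console.log(typeof request.get); // strings have no `get` property

// Calling the missing method throws the TypeError from the question.
let message = '';
try {
  request.get('xxx0');
} catch (err) {
  message = err.message;
}
console.log(message);
```
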

Since you are using Promises, I suggest requiring only request-promise:

const request = require('request-promise');
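
Separately from the require fix, note that the question's loop contains a return statement, so the function exits during the first iteration and later pages are never requested. Below is a minimal sketch of collecting results across pages instead, under stated assumptions: scrapePages, fetchPage, parsePage, fakeFetch and fakeParse are hypothetical names that do not appear in the question, and the two fake functions stand in for the real HTTP request and cheerio parsing so the example runs without network access.

```javascript
// Hypothetical helper, not part of any library.
// fetchPage(page) -> Promise of HTML for that page
//   (in the question this could be: page => rp.get(someUrl + page)).
// parsePage(html) -> Promise of an array of records found on that page.
const scrapePages = async (fetchPage, parsePage, firstPage, lastPage) => {
  const results = [];
  for (let page = firstPage; page <= lastPage; page += 1) {
    const html = await fetchPage(page);
    const records = await parsePage(html);
    // Accumulate instead of returning, so the loop reaches every page.
    results.push(...records);
  }
  return results;
};

// Stand-ins (assumptions) so the sketch runs offline.
const fakeFetch = async page => `<a class="business-name">Company ${page}</a>`;
const fakeParse = async html => [{ name: html.replace(/<[^>]+>/g, '') }];

// Pages 0 through 2, matching the loop bounds in the question.
const done = scrapePages(fakeFetch, fakeParse, 0, 2).then(rows => {
  console.log(rows);
  return rows;
});
```

With the real libraries, fetchPage would wrap the request-promise call and parsePage would wrap the cheerio selection and the Promise.all over the detail pages; the single return after the loop then hands the combined array to the CSV step.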