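I'm writing a small script that scrapes some information from a public directory. Saving the results to CSV already works, but I can't get the pagination to run automatically.
My source is: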
const rp = require('request-promise');
const request = ('request');
const otcsv = require('objects-to-csv');
const cheerio = require('cheerio');
// URL To scrape
const baseURL = 'xx';
const searchURL = 'xxx';
// scrape info
const getCompanies = async () => {
  // Pagination test
  for (let index = 0; index <= 2; index = index + 1) {
    const html = await request.get("xxx" + index);
    const $ = await cheerio.load(html);
    console.log("Loading Pages....");
    // console.log("At page number" + index);
    // end pagination test
    const htmls = await rp(baseURL + searchURL);
    const businessMap = cheerio('a.business-name', htmls).map(async (i, e) => {
      const link = baseURL + e.attribs.href;
      const innerHtml = await rp(link);
      const emailAddress = cheerio('a.email-business', innerHtml).prop('href');
      const name = e.children[0].data || cheerio('h1', innerHtml).text();
      const phone = cheerio('p.phone', innerHtml).text();
      return {
        emailAddress: emailAddress ? emailAddress.replace('mailto:', '') : '',
        // link,
        name,
        phone,
      }
    }).get();
    return Promise.all(businessMap);
  }
};
// save to CSV
getCompanies()
  .then(result => {
    const transformed = new otcsv(result);
    return transformed.toDisk('./output.csv');
  })
  .then(() => console.log('SUCCESSFULLY COMPLETED THE WEB SCRAPING SAMPLE'));
The error I get is: request.get is not a function.
Edit
The second part of this question is at: Nodejs Scraper isn't moving to next page(s)
Answer 0 (score: 1)
request.get should be rp.get, because the request module is callback-based and does not return a Promise.
In any case, you get this error because you never actually require request; you only assign the string 'request' to the request variable:
const request = ('request');
Change it to:
const request = require('request');
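To see why the original line fails, here is a minimal sketch that needs no packages installed. Parentheses around a string literal only group the expression, so the variable ends up holding a plain string, and strings have no get method. The message variable is only there for illustration.

```javascript
// Parentheses around a string literal are just grouping; no module is loaded.
const request = ('request');

console.log(typeof request);      // the variable holds a plain string
console.log(typeof request.get);  // strings have no get method, so this is undefined

// Calling the missing method throws a TypeError, as in the question.
let message = '';
try {
  request.get('xxx');
} catch (err) {
  message = err.message;
}
console.log(message);
```

The logged message should match the error reported in the question.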
Since you are using Promises, I'd suggest requiring only request-promise:
const request = require('request-promise');
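Separately from the error above, note that the return statement in the question sits inside the for loop, so getCompanies would return after the first page even once request.get works. The following is only a rough sketch of one way to accumulate results across pages. The names collectPages and fetchPage are hypothetical: fetchPage stands in for whatever promise-returning call loads and parses one page (for example a request-promise call followed by cheerio), and the stub data is made up so the loop can be tried offline.

```javascript
// Hypothetical helper: fetchPage(index) is assumed to resolve to an array
// of records for one page. It is injected so the loop can run without a network.
async function collectPages(fetchPage, firstPage, lastPage) {
  const results = [];
  for (let index = firstPage; index <= lastPage; index += 1) {
    const items = await fetchPage(index); // wait for this page before moving on
    results.push(...items);               // keep the records from every page
  }
  return results; // return once, after the loop has visited all pages
}

// Stub fetcher with made-up data, standing in for the real scraping call.
const fakeFetch = async (page) => [`company-${page}-a`, `company-${page}-b`];

const done = collectPages(fakeFetch, 0, 2).then((all) => {
  console.log(all.length); // number of records gathered from pages 0 to 2
  return all;
});
```

In the real script, the stub would be replaced by a function that requests the page URL for the given index and maps the matching elements to records, under the assumption that the site exposes the page number in the URL as the question's loop suggests.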