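I'm ultimately trying to produce a JSON file containing all the Google Maps reviews for a place, but I can only output one review (the latest one)...
Can anyone help me collect them into an array so I get all of the reviews?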
const puppeteer = require('puppeteer');

let scrape = async () => {
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  await page.goto('https://www.google.com/maps/place/Microsoft/@36.1275216,-115.1728651,17z/data=!3m2!4b1!5s0x80c8c416a26be787:0x4392ab27a0ae83e0!4m7!3m6!1s0x80c8c4141f4642c5:0x764c3f951cfc6355!8m2!3d36.1275216!4d-115.1706764!9m1!1b1');
  // page.waitFor() is removed in newer Puppeteer versions; wait for a review element instead
  await page.waitForSelector('.section-review-title');

  const result = await page.evaluate(async () => {
    let fullName = document.querySelector('.section-review-title').innerText;
    let postedDate = document.querySelector('.section-review-publish-date').innerText;
    let starRating = document.querySelector('.section-review-stars').getAttribute("aria-label");
    let review = document.querySelector('.section-review-text').innerText;
    return {
      fullName,
      postedDate,
      starRating,
      review
    };
  });

  await browser.close();
  return result;
};

scrape().then((value) => {
  console.log(value); // Success!
});
Thanks!
Answer (score: 3)
document.querySelector returns only the first matching element; document.querySelectorAll returns all of them.
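Applied to the question's `page.evaluate` call, that means selecting every review container at once and reading the four fields inside each one, so the function returns an array instead of a single object. This is a minimal sketch, assuming the class names from the question still match the page (Google Maps changes its markup, so they may need updating); `extractReviews` is a hypothetical helper name, and a missing field becomes `null` instead of throwing.

```javascript
// Hypothetical helper: meant to run inside the page via page.evaluate and
// return one plain object per review. `root` defaults to `document` in the
// browser; any object with querySelectorAll/querySelector works.
// The selectors come from the question and may be outdated.
function extractReviews(root = document) {
  // querySelectorAll returns every match, not just the first one.
  const containers = Array.from(root.querySelectorAll('.section-review-content'));
  return containers.map((container) => {
    // Text of a child element inside this review, or null if it is missing.
    const text = (selector) => container.querySelector(selector)?.innerText ?? null;
    return {
      fullName: text('.section-review-title'),
      postedDate: text('.section-review-publish-date'),
      starRating:
        container.querySelector('.section-review-stars')?.getAttribute('aria-label') ?? null,
      review: text('.section-review-text'),
    };
  });
}
```

Inside `scrape()`, `const result = await page.evaluate(extractReviews);` would then resolve to that array: Puppeteer serializes the function, calls it in the page with no arguments, and `root` falls back to `document` there. The result can be passed straight to `JSON.stringify`.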
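For your use case, the first step is to grab all of the reviews (before processing any of them).
I checked the URL you provided, and this is how I would start (Puppeteer style):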
await page.$$('.section-review-content')
returns a promise that resolves to an array containing every review as an ElementHandle.
Then you iterate over that array and, on each ElementHandle, do something like:
await elementHandle.$eval('.section-review-title', el => el.innerText)
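The same per-handle step can be extended from the title to all four fields the question reads. This is a sketch assuming the question's selectors; `readReview` is a hypothetical helper. `ElementHandle.$eval` rejects when nothing inside the handle matches the selector, so each lookup is caught and turned into `null`, which keeps one incomplete review from aborting the whole loop.

```javascript
// Hypothetical helper: turns one review ElementHandle (from page.$$) into a
// plain object. A failed $eval (no matching element) yields null for that field.
async function readReview(handle) {
  const pick = (selector, fn) => handle.$eval(selector, fn).catch(() => null);
  // The callbacks run in the browser, on the element matched inside `handle`.
  const [fullName, postedDate, starRating, review] = await Promise.all([
    pick('.section-review-title', (el) => el.innerText),
    pick('.section-review-publish-date', (el) => el.innerText),
    pick('.section-review-stars', (el) => el.getAttribute('aria-label')),
    pick('.section-review-text', (el) => el.innerText),
  ]);
  return { fullName, postedDate, starRating, review };
}
```

Collecting everything is then a loop such as `for (const handle of await page.$$('.section-review-content')) reviews.push(await readReview(handle));` with `reviews` starting as an empty array.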
For example, your scrape function would contain something like this (I shortened it a bit):
...
await page.goto('https://www.google.com/maps/place/Microsoft/@36.1275216,-115.1728651,17z/data=!3m2!4b1!5s0x80c8c416a26be787:0x4392ab27a0ae83e0!4m7!3m6!1s0x80c8c4141f4642c5:0x764c3f951cfc6355!8m2!3d36.1275216!4d-115.1706764!9m1!1b1');
// Wait until at least one review container is in the DOM (page.waitFor() is gone in newer Puppeteer)
await page.waitForSelector('.section-review-content');

// All review containers, as an array of ElementHandles
const reviews = await page.$$('.section-review-content');
for (const review of reviews) {
  // Read the title inside this particular review
  const reviewTitle = await review.$eval(
    '.section-review-title',
    div => div.innerText
  );
  console.log('\n' + reviewTitle);
}
...
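To end up with the JSON file the question asks for, the loop above can push each title into an array and write that array out with Node's fs module. This is a sketch under a few assumptions: `page` is a Puppeteer Page that has already loaded the reviews, `saveReviewTitles` and `filePath` are hypothetical names, and only titles are collected, as in the answer's loop.

```javascript
const fs = require('fs');

// Hypothetical helper: the answer's loop, but collecting titles into an array
// instead of logging them, then writing the array to a pretty-printed JSON file.
async function saveReviewTitles(page, filePath) {
  const reviews = await page.$$('.section-review-content');
  const titles = [];
  for (const review of reviews) {
    titles.push(await review.$eval('.section-review-title', (div) => div.innerText));
  }
  fs.writeFileSync(filePath, JSON.stringify(titles, null, 2), 'utf8');
  return titles;
}
```

Swapping the single `$eval` for a helper that reads all four fields would give an array of review objects instead of plain strings; `JSON.stringify` handles either.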