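I am trying to create a basic script that simply scrolls down to the bottom of the Hacker News site. The scrolling implementation is taken from this question (kimbaudi's second answer, first method).
The implementation decides whether the browser has reached the bottom element of a list by repeatedly counting, while scrolling, the elements that match a selector (the count comes from .length).
For my selector I chose tr.athing, the HTML element that holds each article on Hacker News, with the goal of scrolling down to the bottom-most article link. Instead, even though tr.athing works as a selector and its contents can be printed (as shown in the code below), I get the following error:
Error: Error: failed to find element matching selector "tr.athing:last-child"
Can someone help me understand what is going wrong?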
const puppeteer = require("puppeteer");
const cheerio = require('cheerio');
const link = 'https://news.ycombinator.com/';

// 2 functions used in scrolling
async function getCount(page) {
  await console.log(page.$$eval("tr.athing", a => a.length));
  return await page.$$eval("tr.athing", a => a.length);
}

async function scrollDown(page) {
  await page.$eval("tr.athing:last-child", e => {
    e.scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
  });
}

// puppeteer usage as normal
puppeteer.launch({ headless: false }).then(async browser => {
  const page = await browser.newPage();
  const navigationPromise = page.waitForNavigation();
  await page.setViewport({ width: 1500, height: 800 });

  // Loading page
  await page.goto(link);
  await navigationPromise;
  await page.waitFor(1000);

  // Using cheerio to inject jquery into page.
  const html = await page.content();
  const $ = await cheerio.load(html);

  // This works
  var selection = $('tr.athing').text();
  await console.log('\n');
  await console.log(selection);
  await console.log('\n');

  // Error, this does not work for some reason;
  // scrolling code starts here.
  const delay = 10000;
  let preCount = 0;
  let postCount = 0;
  do {
    preCount = getCount(page);
    scrollDown(page);
    page.waitFor(delay);
    postCount = getCount(page);
  } while (postCount > preCount);
  page.waitFor(delay);

  // await browser.close();
})
Answer 0 (score: 0)
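The :last-child selector does not give you the last element of the selection. It gives you the elements that are the last child of their parent:
"The :last-child selector matches every element that is the last child of its parent."
Since tr.athing:last-child matched nothing, none of the tr.athing rows is the last child of its parent; other rows come after them in the table.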
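To make the difference concrete, here is a small browser-free sketch. The rows array is a simplified, assumed stand-in for the listing table (not the real Hacker News markup), and matchLastChild and lastOfSelection are hypothetical helpers that only imitate the two ways of picking an element.

```javascript
// Simplified, assumed stand-in for the rows of the listing table.
// Each story row (class "athing") is followed by other rows.
const rows = [
  { tag: 'tr', className: 'athing' }, // story 1
  { tag: 'tr', className: '' },       // row that follows story 1
  { tag: 'tr', className: 'spacer' },
  { tag: 'tr', className: 'athing' }, // story 2
  { tag: 'tr', className: '' },       // row that follows story 2
  { tag: 'tr', className: 'spacer' },
];

// Imitates "tr.<className>:last-child": the element must have the class
// AND be the last child of its parent.
function matchLastChild(children, className) {
  const last = children[children.length - 1];
  return last && last.className === className ? [last] : [];
}

// Imitates selecting every "tr.<className>" and taking the last match.
function lastOfSelection(children, className) {
  const matches = children.filter(c => c.className === className);
  return matches.length ? matches[matches.length - 1] : null;
}

const viaLastChild = matchLastChild(rows, 'athing');
const viaSelection = lastOfSelection(rows, 'athing');

console.log(viaLastChild.length);        // 0: the last child is a spacer row
console.log(rows.indexOf(viaSelection)); // 3: the second story row
```

The first approach comes back empty because the final row in the parent is not a story row, which is the situation that makes page.$eval throw.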
You can do the following instead:
async function scrollDown(page) {
  await page.$$eval("tr.athing", els => {
    els[els.length - 1].scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
  });
}
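Also note that your code is missing a number of awaits. Without them, preCount and postCount hold Promises rather than numbers, so postCount > preCount is always false and nothing waits for the scroll or the delay to finish. The loop should be: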
do {
  preCount = await getCount(page);
  await scrollDown(page);
  await page.waitFor(delay);
  postCount = await getCount(page);
} while (postCount > preCount);
await page.waitFor(delay);
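If it helps, the awaited loop can be pulled into a small helper that takes the counting and scrolling functions as arguments, which makes it possible to try the logic without launching a browser. This is only a sketch: scrollUntilStable, sleep, makeFakePage, delayMs and maxRounds are hypothetical names that are not part of Puppeteer, and a plain timer stands in for page.waitFor, which later Puppeteer releases deprecated.

```javascript
// Hypothetical helper: timer-based pause used in place of page.waitFor.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Keeps scrolling until the element count stops growing.
// getCount and scrollDown are supplied by the caller; maxRounds is an
// assumed safety cap so the loop cannot run forever.
async function scrollUntilStable({ getCount, scrollDown, delayMs = 1000, maxRounds = 50 }) {
  let rounds = 0;
  let preCount;
  let postCount;
  do {
    preCount = await getCount();
    await scrollDown();
    await sleep(delayMs);
    postCount = await getCount();
    rounds += 1;
  } while (postCount > preCount && rounds < maxRounds);
  return { count: postCount, rounds };
}

// Hypothetical stub that mimics a page loading more items on each scroll.
function makeFakePage(batches) {
  let i = 0;
  return {
    getCount: async () => batches[i],
    scrollDown: async () => { if (i < batches.length - 1) i += 1; },
  };
}

// Example run against the stub: the count grows 30 -> 60 -> 90, then stops.
scrollUntilStable({ ...makeFakePage([30, 60, 90]), delayMs: 0 })
  .then(result => console.log(result));
```

With Puppeteer, the same helper could be called as scrollUntilStable({ getCount: () => getCount(page), scrollDown: () => scrollDown(page) }). On a page that does not load more items as you scroll, the count stays the same and the loop ends after a single round.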