I'm using Puppeteer to evaluate the HTML of JavaScript-based web pages in my test application.
This is the code I use to make sure all the data has loaded:
await page.setRequestInterception(true);
page.on("request", (request) => {
  if (request.resourceType() === "image" || request.resourceType() === "font" || request.resourceType() === "media") {
    console.log("Request intercepted! ", request.url(), request.resourceType());
    request.abort();
  } else {
    request.continue();
  }
});
try {
  await page.goto(url, { waitUntil: ['networkidle0', 'load'], timeout: requestCounterMaxWaitMs });
} catch (e) {
}
Is this the best way to wait for the ajax requests to complete?
It feels wrong, but I'm not sure whether I should be using networkidle0, networkidle1, etc.
Answer 0 (score: 2)
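By their nature, XHRs can appear late in an application's lifetime. If the app sends an XHR after, say, 1 second and you want to wait for it, networkidle0 will not help you. If you want to do this "properly", you should know which requests you are waiting for and await them.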
Here is an example where the XHRs happen later in the application, and the script waits for all of them:
const puppeteer = require('puppeteer');

const html = `
<html>
  <body>
    <script>
      setTimeout(() => {
        fetch('https://swapi.co/api/people/1/');
      }, 1000);
      setTimeout(() => {
        fetch('https://www.metaweather.com/api/location/search/?query=san');
      }, 2000);
      setTimeout(() => {
        fetch('https://api.fda.gov/drug/event.json?limit=1');
      }, 3000);
    </script>
  </body>
</html>`;

// you can listen to only some of the requests;
// in this example I'm waiting for all of them
const requests = [
  'https://swapi.co/api/people/1/',
  'https://www.metaweather.com/api/location/search/?query=san',
  'https://api.fda.gov/drug/event.json?limit=1'
];

const waitForRequests = (page, names) => {
  const requestsList = [...names];
  return new Promise(resolve =>
    page.on('request', request => {
      // fetch() calls are reported as "fetch" and XMLHttpRequest calls as "xhr",
      // so accept both resource types
      if (["xhr", "fetch"].includes(request.resourceType())) {
        // check if the request is in the observed list
        const index = requestsList.indexOf(request.url());
        if (index > -1) {
          requestsList.splice(index, 1);
        }
        // resolve once every observed request has been seen
        if (!requestsList.length) {
          resolve();
        }
      }
      request.continue();
    })
  );
};

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  // register the page.on('request') observer
  const observedRequests = waitForRequests(page, requests);
  // goto is not awaited here because only the XHR (ajax) requests matter,
  // but awaiting it would not hurt
  page.goto(`data:text/html,${html}`);
  console.log('before xhr');
  // wait for all observed requests
  await observedRequests;
  console.log('after all xhr');
  await browser.close();
})();
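One caveat with the example above: it waits indefinitely if one of the listed URLs is never requested. Below is a minimal sketch of a variant that gives up after a timeout and removes its listener when done. The function name, the default timeout, and the decision to listen without request interception (so no `request.continue()` call is needed) are assumptions for illustration, not part of the answer.

```javascript
// Sketch: resolve once every URL in `urls` has been requested, or reject after
// `timeoutMs`. `page` only needs on()/off(), which Puppeteer's Page provides.
// Assumes request interception is NOT enabled, so requests proceed on their own.
function waitForRequestsWithTimeout(page, urls, timeoutMs = 10000) {
  const pending = new Set(urls);
  if (pending.size === 0) {
    return Promise.resolve();
  }
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      clearTimeout(timer);
      page.off('request', onRequest);
    };
    const onRequest = (request) => {
      // Matching is by exact URL, as in the example above.
      pending.delete(request.url());
      if (pending.size === 0) {
        cleanup();
        resolve();
      }
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`Timed out waiting for: ${[...pending].join(', ')}`));
    }, timeoutMs);
    page.on('request', onRequest);
  });
}

// Possible usage inside an async function (the 5000 ms limit is arbitrary):
//   const observed = waitForRequestsWithTimeout(page, requests, 5000);
//   page.goto(`data:text/html,${html}`);
//   await observed;
```

Rejecting with the list of URLs that never showed up makes a hanging scrape easier to diagnose than a silent stall.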
Answer 1 (score: 2)
You can use pending-xhr-puppeteer, a library that exposes a promise which resolves once all pending XHR requests have finished.
Use it like this:
const puppeteer = require('puppeteer');
const { PendingXHR } = require('pending-xhr-puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: [], // placeholder: add Chromium launch flags here if needed
  });
  const page = await browser.newPage();
  const pendingXHR = new PendingXHR(page);
  await page.goto(`http://page-with-xhr`);
  // Here, not all XHR requests have finished yet
  await pendingXHR.waitForAllXhrFinished();
  // Here, all XHR requests have finished
  await browser.close();
})();
Disclaimer: I am the maintainer of pending-xhr-puppeteer.
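To illustrate the general idea of tracking in-flight requests, here is a minimal sketch built on Puppeteer's `request`, `requestfinished`, and `requestfailed` page events. It is not the source of pending-xhr-puppeteer and may differ from how that library works; `trackPendingXhr` and its method names are hypothetical.

```javascript
// Sketch only: track XHR/fetch requests that have started but not yet settled.
// `page` needs on(), which Puppeteer's Page provides.
function trackPendingXhr(page) {
  const pending = new Set();
  let waiters = [];

  const isAjax = (request) => ['xhr', 'fetch'].includes(request.resourceType());

  // Called when a request finishes or fails; wakes waiters once nothing is left.
  const settle = (request) => {
    if (!pending.delete(request)) return; // not a tracked request
    if (pending.size === 0) {
      waiters.forEach((resolve) => resolve());
      waiters = [];
    }
  };

  page.on('request', (request) => {
    if (isAjax(request)) pending.add(request);
  });
  page.on('requestfinished', settle);
  page.on('requestfailed', settle);

  return {
    pendingCount: () => pending.size,
    // Resolves immediately when nothing is in flight at call time.
    waitForAllFinished: () =>
      pending.size === 0
        ? Promise.resolve()
        : new Promise((resolve) => waiters.push(resolve)),
  };
}
```

Note that a tracker like this only knows about requests that have already started, so a request scheduled for later (as in Answer 0) is not waited for.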
Answer 2 (score: 1)
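I agree with the sentiment in this answer that waiting for all network activity to cease ("all data has loaded") is a rather ambiguous concept that depends entirely on the behavior of the website you're scraping.
Options for detecting responses include waiting for a fixed duration, a fixed duration after network traffic goes idle, a specific response (or set of responses), an element to appear on the page, a predicate to return true, and so on, all of which Puppeteer supports.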
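As a rough sketch, the options listed above map onto Puppeteer `Page` calls roughly as follows. The selector, the URL fragment, the `window.dataLoaded` flag, and all timings are placeholders chosen for illustration, and `waitOptions` is a hypothetical helper name.

```javascript
// Plain timer for the "fixed duration" option.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Sketch: one thunk per waiting strategy. `page` is a Puppeteer Page.
function waitOptions(page) {
  return {
    // Fixed duration.
    fixedDelay: (ms = 1000) => sleep(ms),
    // Fixed duration after network traffic goes idle.
    networkIdle: () => page.waitForNetworkIdle({ idleTime: 500, timeout: 30000 }),
    // A specific response, matched here by a URL fragment (placeholder).
    specificResponse: () =>
      page.waitForResponse((response) => response.url().includes('/api/'), { timeout: 30000 }),
    // An element appearing on the page (placeholder selector).
    elementAppears: () => page.waitForSelector('#results', { timeout: 30000 }),
    // A predicate evaluated in the page returning true (placeholder flag).
    predicateIsTrue: () =>
      page.waitForFunction(() => window.dataLoaded === true, { timeout: 30000 }),
  };
}
```

Which of these fits depends on what the target site actually does once its data arrives.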
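With this in mind, the most typical case is that you're waiting for a specific response, or a set of responses, from known (or partially known, via a pattern or prefix) resource URLs that deliver a payload you want to read and/or trigger a DOM interaction you need to detect. Puppeteer offers page.waitForResponse for exactly this.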
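A minimal sketch of that pattern, assuming the endpoint returns JSON with status 200 and that some page action triggers the request; `waitForJsonResponse`, the URL prefix, and the selector in the usage comment are hypothetical names chosen for illustration.

```javascript
// Sketch: start waiting for a matching response *before* triggering the action
// that causes it, then read the JSON payload. `page` is a Puppeteer Page.
async function waitForJsonResponse(page, urlPrefix, trigger, timeoutMs = 30000) {
  const [response] = await Promise.all([
    page.waitForResponse(
      (res) => res.url().startsWith(urlPrefix) && res.status() === 200,
      { timeout: timeoutMs }
    ),
    trigger(),
  ]);
  return response.json();
}

// Possible usage inside an async function (URL and selector are placeholders):
//   const data = await waitForJsonResponse(
//     page,
//     'https://example.com/api/items',
//     () => page.click('#load-more')
//   );
```

Registering the wait before triggering the action avoids the race where the response arrives before anything is listening for it.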
Here's an example that builds on an existing post (and also shows how to retrieve the data from the response while we're at it):
{{1}}