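I am using puppeteer to fetch the resources of a page, but one of the requests fails with a connection timeout and blocks the page.goto('url') call for a long time. I want to skip that request and move on to the next one. What I need is a timeout for each individual request, not an overall timeout option for page.goto.

Here is my code, test.js: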
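    const puppeteer = require('puppeteer');

    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        // Log the URL of every request the page issues.
        page.on('request', request => {
            console.log(request.url());
        });
        // Overall navigation timeout of 10 s; any rejection is logged as "timeout".
        await page.goto(process.argv[2], {timeout: 10000}).then(() => {
        }, () => {
            console.log("timeout");
        });
        await browser.close();
    })();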
Output:

    $ node test.js http://ipv6ready.wanwuyunlian.com:8080/
    http://ipv6ready.wanwuyunlian.com:8080/
    http://ipv6ready.wanwuyunlian.com:8080/js/bootstrap.min.js
    http://ipv6ready.wanwuyunlian.com:8080/js/echarts/echarts.min.js
    https://www.google-analytics.com/analytics.js
    http://ipv6ready.wanwuyunlian.com:8080/js/echarts/macarons.js
    https://www.google-analytics.com/analytics.js
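Because the connection times out, the analytics.js request is very slow. It blocks page.goto for a long time and the remaining resources are never requested. I want to abort this request and continue with the remaining resources.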
Answer 0 (score: 2)
There are two ways to solve this problem. The first is to pass waitUntil: "networkidle2" ("consider navigation to be finished when there are no more than 2 network connections for at least 500 ms") rather than networkidle0, so that up to two slow requests can stay pending without holding up your code:

    const puppeteer = require('puppeteer');

    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        // Resolve once no more than 2 connections have been open for 500 ms.
        await page.goto(process.argv[2], {waitUntil: "networkidle2"}).then(() => {
        }, (e) => {
            console.error("Error", e);
        });
        await browser.close();
    })();

Alternatively, to implement a timeout for individual page requests, I would suggest using a timeout module such as p-timeout:
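    const puppeteer = require("puppeteer");
    // Assumes a CommonJS release of p-timeout that accepts (promise, milliseconds).
    const pTimeout = require("p-timeout");

    const shorterTimeout = 10000;

    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.setRequestInterception(true);
        page.on('request', async (request) => {
            if (!shouldImplementTimeout(request.url())) {
                await request.continue();
                return; // do not fall through and continue the same request twice
            }
            try {
                // Rejects if request.continue() has not settled within shorterTimeout.
                await pTimeout(request.continue(), shorterTimeout);
            } catch (e) {
                console.error(request.url(), "failed:", e);
                await request.abort("timedout");
            }
        });
        await page.goto(process.argv[2]).then(() => {
        }, (e) => {
            console.error("Error", e);
        });
        await browser.close();
    })();

You will need to write shouldImplementTimeout, which should return true if the request needs the shorter timeout.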
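For illustration, here is one way shouldImplementTimeout could look. This is only a sketch: the SLOW_HOSTS list and the rule of matching on the hostname are my assumptions, not something the question specifies, so adapt them to the pages you actually load.

```javascript
// Hypothetical list of hosts that are known to respond slowly.
const SLOW_HOSTS = ["google-analytics.com"];

// Returns true when the request URL points at one of the slow hosts
// (or a subdomain of one), false for everything else.
function shouldImplementTimeout(url) {
    let hostname;
    try {
        hostname = new URL(url).hostname;
    } catch (e) {
        return false; // not a parseable absolute URL, so leave it alone
    }
    return SLOW_HOSTS.some(
        (host) => hostname === host || hostname.endsWith("." + host)
    );
}
```

Matching on the parsed hostname rather than on the raw string avoids treating a URL as slow merely because the host name shows up in its path or query string.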
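Answer 1 (score: 1)
If you only want to cancel requests based on their URL, puppeteer has a mechanism for that: page.setRequestInterception. Here is the sample from the docs, adapted to your use case: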
    const puppeteer = require('puppeteer');

    puppeteer.launch().then(async browser => {
        const page = await browser.newPage();
        // turn on requests intercepting and cancellation capability
        await page.setRequestInterception(true);
        page.on('request', interceptedRequest => {
            console.log(interceptedRequest.url());
            if (interceptedRequest.url().includes("google-analytics.com")) {
                console.log("cancelled!");
                interceptedRequest.abort();
            } else {
                interceptedRequest.continue();
            }
        });
        await page.goto('http://ipv6ready.wanwuyunlian.com:8080/');
        await browser.close();
    });
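If you end up blocking more than one host, the check can be pulled out into a small factory. The following is a rough sketch rather than anything from the puppeteer docs: makeBlockingHandler and its blockedHosts parameter are names I made up, and it relies only on the url(), abort() and continue() methods used above.

```javascript
// Builds a handler for page.on('request', ...) that aborts requests
// whose hostname is in blockedHosts (or is a subdomain of one of them)
// and continues every other request.
function makeBlockingHandler(blockedHosts) {
    return (interceptedRequest) => {
        let hostname = "";
        try {
            hostname = new URL(interceptedRequest.url()).hostname;
        } catch (e) {
            // Unparseable URL: keep hostname empty so the request is continued.
        }
        const blocked = blockedHosts.some(
            (host) => hostname === host || hostname.endsWith("." + host)
        );
        return blocked
            ? interceptedRequest.abort()
            : interceptedRequest.continue();
    };
}
```

After await page.setRequestInterception(true), it would be registered with page.on('request', makeBlockingHandler(["google-analytics.com"])). Unlike includes(), comparing hostnames does not abort a request just because the blocked name appears somewhere else in the URL.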