How to set a timeout for each request in the page.goto function

Time: 2018-06-30 02:26:28

Tags: javascript node.js chromium puppeteer

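I am using puppeteer to fetch the resources of a page, but one of the requests fails to complete because of a connection timeout, and it blocks the page.goto('url') call for a long time. I want to skip that request and move on to the next one. What I need is a timeout for each individual request, not the overall timeout option of page.goto.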

Here is my code, test.js:

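const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    // Log the URL of every request the page issues.
    page.on('request', request => {
        console.log(request.url());
    });
    // The 10-second timeout applies to the whole navigation, not to each request.
    await page.goto(process.argv[2], {timeout: 10000}).then(() => {
    }, () => {
        console.log("timeout");
    });
    await browser.close();
})();

Output of node test.js http://ipv6ready.wanwuyunlian.com:8080/: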

http://ipv6ready.wanwuyunlian.com:8080/  
http://ipv6ready.wanwuyunlian.com:8080/js/bootstrap.min.js
http://ipv6ready.wanwuyunlian.com:8080/js/echarts/echarts.min.js
https://www.google-analytics.com/analytics.js    
http://ipv6ready.wanwuyunlian.com:8080/js/echarts/macarons.js
https://www.google-analytics.com/analytics.js 

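Because the connection times out, the analytics.js request is very slow. It blocks page.goto for a long time, and the remaining resources are never requested. I want to abort this request and continue requesting the remaining resources.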

2 Answers:

Answer 0: (score: 2)

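There are two ways to solve this problem. The first is to use networkidle2 ("consider navigation to be finished when there are no more than 2 network connections for at least 500 ms") rather than networkidle0, which allows up to two requests to be slow without affecting your code:

const puppeteer = require("puppeteer");

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    // Treat navigation as finished once at most 2 connections remain open.
    await page.goto(process.argv[2], {waitUntil: "networkidle2"}).then(() => {
    }, (e) => {
        console.error("Error", e);
    });
    await browser.close();
})();

Alternatively, to implement a timeout on individual page requests, I would suggest using a timeout module such as p-timeout:

const puppeteer = require("puppeteer");
const pTimeout = require("p-timeout");

const shorterTimeout = 10000;

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setRequestInterception(true);
    page.on('request', async (request) => {
        // Requests that do not need the shorter timeout pass straight through.
        if (!shouldImplementTimeout(request.url())) {
            await request.continue();
            return;
        }
        // Uses the pTimeout(promise, milliseconds) call form from this answer;
        // newer p-timeout releases may expect a different signature.
        try {
            await pTimeout(request.continue(), shorterTimeout);
        } catch (e) {
            console.error(request.url(), "failed:", e);
            await request.abort("timedout");
        }
    });
    await page.goto(process.argv[2]).then(() => {
    }, (e) => {
        console.error("Error", e);
    });
    await browser.close();
})();

You would need to write shouldImplementTimeout, which should return true if the request needs the shorter timeout.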
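
The answer leaves shouldImplementTimeout unspecified. One possible sketch, assuming the slow requests can be recognised by their hostname; the SLOW_HOSTS list and its single entry are assumptions taken from the log output in the question, not part of the original answer:

```javascript
// Hypothetical list of hosts known to respond slowly; adjust for your site.
const SLOW_HOSTS = ["google-analytics.com"];

// Returns true when the request URL belongs to a host in SLOW_HOSTS
// (or to one of its subdomains), i.e. when the shorter timeout should apply.
function shouldImplementTimeout(requestUrl) {
    let hostname;
    try {
        hostname = new URL(requestUrl).hostname;
    } catch (e) {
        // Not a parseable absolute URL: do not apply the shorter timeout.
        return false;
    }
    return SLOW_HOSTS.some(
        (host) => hostname === host || hostname.endsWith("." + host)
    );
}

console.log(shouldImplementTimeout("https://www.google-analytics.com/analytics.js"));
console.log(shouldImplementTimeout("http://ipv6ready.wanwuyunlian.com:8080/js/bootstrap.min.js"));
```

Matching on the parsed hostname rather than on a substring of the URL avoids false positives when the host name merely appears in a path or query string.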

Answer 1: (score: 1)

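If you just want to cancel requests based on their URL, puppeteer has a mode for that: page.setRequestInterception. Here is the sample from the documentation, adapted to your use case: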

const puppeteer = require('puppeteer');

puppeteer.launch().then(async browser => {
  const page = await browser.newPage();

  // turn on requests intercepting and cancellation capability
  await page.setRequestInterception(true);

  page.on('request', interceptedRequest => {

    console.log(interceptedRequest.url());

    if (interceptedRequest.url().includes("google-analytics.com"))
    {
      console.log("cancelled!");
      interceptedRequest.abort();
    }
    else
    {
      interceptedRequest.continue();
    }
  });
  await page.goto('http://ipv6ready.wanwuyunlian.com:8080/');
  await browser.close();
});
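
If several hosts need to be blocked, the same interception pattern could be wrapped in a small helper. This is only a sketch: abortRequestsToHosts and isBlockedHost are hypothetical names introduced here, and the only Puppeteer calls it relies on are page.setRequestInterception, page.on('request'), request.url(), request.abort() and request.continue():

```javascript
// Hypothetical helper: true if the URL's hostname equals one of blockedHosts
// or is a subdomain of one of them.
function isBlockedHost(requestUrl, blockedHosts) {
    let hostname;
    try {
        hostname = new URL(requestUrl).hostname;
    } catch (e) {
        return false; // unparseable URL: never block
    }
    return blockedHosts.some(
        (host) => hostname === host || hostname.endsWith("." + host)
    );
}

// Hypothetical helper: turns on request interception for a Puppeteer page
// and aborts every request whose host is in blockedHosts.
async function abortRequestsToHosts(page, blockedHosts) {
    await page.setRequestInterception(true);
    page.on('request', (interceptedRequest) => {
        if (isBlockedHost(interceptedRequest.url(), blockedHosts)) {
            interceptedRequest.abort();
        } else {
            interceptedRequest.continue();
        }
    });
}
```

With a real page it would be called once before navigating, for example await abortRequestsToHosts(page, ["google-analytics.com"]) followed by await page.goto(...).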