Is there a way to make Puppeteer's waitUntil "networkidle" consider only XHR (ajax) requests?

Time: 2018-03-28 15:27:08

Tags: node.js puppeteer

I am using puppeteer to evaluate the HTML of JavaScript-based web pages in my test app.

This is the code I am using to make sure all the data is loaded:

await page.setRequestInterception(true);
page.on("request", (request) => {
  if (request.resourceType() === "image" || request.resourceType() === "font" || request.resourceType() === "media") {
    console.log("Request intercepted! ", request.url(), request.resourceType());
    request.abort();
  } else {
    request.continue();
  }
});
try {
  await page.goto(url, { waitUntil: ['networkidle0', 'load'], timeout: requestCounterMaxWaitMs });
} catch (e) {

}

Is this the best way to wait for ajax requests to complete?

It feels wrong, but I'm not sure whether I should use networkidle0, networkidle1, etc.

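3 answers:

Answer 0: (score: 2)

XHRs by their nature can appear later in the app's lifetime. No networkidle0 will help you if the app sends an XHR after, for example, 1 second and you want to wait for it. I think if you want to do this "properly", you should know which requests you are waiting for and await them.

Here is an example where the XHRs occur later in the app, and it waits for all of them: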

const puppeteer = require('puppeteer');

const html = `
<html>
  <body>
    <script>
      setTimeout(() => {
        fetch('https://swapi.co/api/people/1/');
      }, 1000);

      setTimeout(() => {
        fetch('https://www.metaweather.com/api/location/search/?query=san');
      }, 2000);

      setTimeout(() => {
        fetch('https://api.fda.gov/drug/event.json?limit=1');
      }, 3000);
    </script>
  </body>
</html>`;

// you can observe just a subset of the requests;
// in this example I'm waiting for all of them
const requests = [
    'https://swapi.co/api/people/1/',
    'https://www.metaweather.com/api/location/search/?query=san',
    'https://api.fda.gov/drug/event.json?limit=1'
];

const waitForRequests = (page, names) => {
  const requestsList = [...names];
  return new Promise(resolve =>
     page.on('request', request => {
       // fetch() calls are reported as "fetch", XMLHttpRequest calls as "xhr"
       if (["xhr", "fetch"].includes(request.resourceType())) {
         // check if request is in observed list
         const index = requestsList.indexOf(request.url());
         if (index > -1) {
           requestsList.splice(index, 1);
         }

         // resolve once every observed request has been issued
         if (!requestsList.length) {
           resolve();
         }
       }
       // interception is enabled, so every request must be continued
       request.continue();
     })
  );
};


(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);

  // register page.on('request') observables
  const observedRequests = waitForRequests(page, requests);

  // goto is deliberately not awaited here, because only the XHR (ajax)
  // requests matter; awaiting it would work too
  page.goto(`data:text/html,${html}`);

  console.log('before xhr');
  // await for all observed requests
  await observedRequests;
  console.log('after all xhr');
  await browser.close();
})();
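
When the request URLs are not known in advance, the same event-based idea can be turned into a "network idle" check that counts only XHR/fetch traffic. The following is a minimal sketch under stated assumptions, not a Puppeteer feature: `waitForXhrIdle`, `idleMs` and `timeoutMs` are hypothetical names, and it assumes the page object provides `on`/`off` and emits the `request`, `requestfinished` and `requestfailed` events.

```javascript
// Sketch: resolve once no XHR/fetch request has been in flight for `idleMs`.
// `waitForXhrIdle` and its option names are hypothetical, not Puppeteer APIs.
const waitForXhrIdle = (page, { idleMs = 500, timeoutMs = 30000 } = {}) =>
  new Promise((resolve, reject) => {
    const pending = new Set();
    let idleTimer = null;

    const isAjax = (request) => ['xhr', 'fetch'].includes(request.resourceType());

    // Remove all timers and listeners so nothing leaks after settling.
    const cleanup = () => {
      clearTimeout(idleTimer);
      clearTimeout(timeoutTimer);
      page.off('request', onRequest);
      page.off('requestfinished', onDone);
      page.off('requestfailed', onDone);
    };

    // Start the idle countdown only when nothing is in flight.
    const armIdleTimer = () => {
      clearTimeout(idleTimer);
      if (pending.size === 0) {
        idleTimer = setTimeout(() => { cleanup(); resolve(); }, idleMs);
      }
    };

    const onRequest = (request) => {
      if (!isAjax(request)) return; // images, fonts, scripts etc. are ignored
      pending.add(request);
      clearTimeout(idleTimer); // traffic resumed, so cancel the countdown
    };

    const onDone = (request) => {
      if (pending.delete(request)) armIdleTimer();
    };

    const timeoutTimer = setTimeout(() => {
      cleanup();
      reject(new Error(`XHR traffic still active after ${timeoutMs} ms`));
    }, timeoutMs);

    page.on('request', onRequest);
    page.on('requestfinished', onDone);
    page.on('requestfailed', onDone);
    armIdleTimer(); // nothing pending yet: start the countdown immediately
  });

// Possible usage: start listening before navigating, so early requests are seen.
// await Promise.all([
//   waitForXhrIdle(page, { idleMs: 1500 }),
//   page.goto(url, { waitUntil: 'load' }),
// ]);
```

The countdown starts immediately, so `idleMs` has to be longer than the longest quiet gap you expect between requests (the timers in the example page fire one second apart). The sketch never calls `request.continue()`, so if request interception is enabled, some other handler still has to continue or abort each request.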

Answer 1: (score: 2)

You can use pending-xhr-puppeteer, a library that exposes a promise which resolves once all pending XHR requests have finished.

Use it like this:

const puppeteer = require('puppeteer');
const { PendingXHR } = require('pending-xhr-puppeteer');

const browser = await puppeteer.launch({
  headless: true,
  args,
});

const page = await browser.newPage();
const pendingXHR = new PendingXHR(page);
await page.goto(`http://page-with-xhr`);
// At this point some XHR requests may still be pending
await pendingXHR.waitForAllXhrFinished();
// At this point all XHR requests have finished

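Disclaimer: I am the maintainer of pending-xhr-puppeteer.

Answer 2: (score: 1)

I agree with the sentiment in this answer that waiting for all network activity to cease ("all data is loaded") is a rather ambiguous concept that depends entirely on the behavior of the website you are scraping.

Options for detecting responses include waiting a fixed duration, waiting a fixed duration after network traffic goes idle, waiting for a specific response (or set of responses), waiting for an element to appear on the page, waiting for a predicate to return true, and so on, all of which Puppeteer supports.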
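
Three of these options can be sketched as thin wrappers, each around a single call. This is only an illustration: `waitStrategies` and its method names are hypothetical, while `page.waitForSelector` and `page.waitForFunction` are Puppeteer methods.

```javascript
// Sketch: three waiting strategies. The helper names are hypothetical.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const waitStrategies = {
  // Wait a fixed duration, regardless of what the page is doing.
  fixedDelay: (page, ms) => sleep(ms),

  // Wait until an element matching `selector` is present and visible.
  elementVisible: (page, selector, timeout = 30000) =>
    page.waitForSelector(selector, { visible: true, timeout }),

  // Wait until `pageFunction`, evaluated inside the page, returns a truthy value.
  predicateTrue: (page, pageFunction, timeout = 30000) =>
    page.waitForFunction(pageFunction, { timeout }),
};

// Possible usage ('#results li' and window.appReady are made-up examples):
// await waitStrategies.elementVisible(page, '#results li');
// await waitStrategies.predicateTrue(page, () => window.appReady === true);
```

A fixed delay is the simplest but also the least reliable of the three, since it neither reacts to slow responses nor returns early on fast ones.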

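With this in mind, the most typical scenario is that you are waiting for a particular response, or set of responses, from known (or partially known, via some pattern or prefix) resource URLs. Those responses deliver a payload you want to read and/or trigger a DOM interaction you need to detect. Puppeteer offers page.waitForResponse for exactly this.

Here is an example based on an existing post (it also shows how to retrieve the data from the responses along the way):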

{{1}}
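
A minimal sketch of that pattern, not the answer's original snippet, might look like the following. It assumes the responses carry JSON bodies. `waitForJsonResponses` and the example URL are hypothetical, while `page.waitForResponse`, `response.url()`, `response.ok()` and `response.json()` are Puppeteer APIs.

```javascript
// Sketch: wait for one successful response per URL prefix and return the
// parsed JSON bodies. `waitForJsonResponses` is a hypothetical helper.
const waitForJsonResponses = async (page, urlPrefixes, timeout = 30000) => {
  const responses = await Promise.all(
    urlPrefixes.map((prefix) =>
      page.waitForResponse(
        // The predicate runs for every response the page receives.
        (response) => response.url().startsWith(prefix) && response.ok(),
        { timeout }
      )
    )
  );
  // Read the payloads in the same order as the prefixes.
  return Promise.all(responses.map((response) => response.json()));
};

// Possible usage (the API URL is a placeholder):
// const payloads = waitForJsonResponses(page, ['https://example.com/api/']);
// await page.goto(url);
// console.log(await payloads);
```

Start waiting before the action that triggers the requests, as in the usage comment; otherwise a fast response can arrive before the listener is registered and the wait times out.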