How to get all request headers in Puppeteer

Posted: 2020-03-19 15:32:58

Tags: node.js puppeteer

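I'm trying to get all request headers so I can inspect requests properly, but Puppeteer only returns headers such as User-Agent and Origin, while the original request contains many more.

Is there a way to actually get all of the headers?

For reference, here is the code: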

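const puppeteer = require('puppeteer');

// Top-level await is not available in a CommonJS script, so wrap it in an async IIFE
(async () => {
    const browser = await puppeteer.launch({
        headless: false
    });

    const page = await browser.newPage();
    // Log the headers Puppeteer reports for every outgoing request
    page.on('request', req => {
        console.log(req.headers());
    });
    await page.goto('https://reddit.com');
})();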

Thanks in advance, iLinked

2 answers:

Answer 0 (score: 0)

You can open https://headers.cloxy.net/request.php, which echoes back the headers your browser sent:

await page.goto('https://headers.cloxy.net/request.php');

You can also print them to the log:

  console.log((await page.goto('https://example.org/')).request().headers());
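The request().headers() call above is the same one used in the question, so it likely returns the same reduced set. In Chromium, the DevTools Protocol reports the raw headers the network stack actually sends (Cookie, Accept-Encoding and so on) in a separate, experimental Network.requestWillBeSentExtraInfo event, alongside the usual Network.requestWillBeSent. Below is a minimal sketch that pairs the two events by requestId. It is not a Puppeteer API: mergeHeaders, RequestHeaderTracker and trackPageHeaders are hypothetical names, and trackPageHeaders assumes a Puppeteer version that exposes page.target().createCDPSession().

```javascript
// Hypothetical helper: merge header objects case-insensitively; later sources win.
function mergeHeaders(...sources) {
  const merged = {};
  for (const source of sources) {
    for (const [name, value] of Object.entries(source || {})) {
      merged[name.toLowerCase()] = value;
    }
  }
  return merged;
}

// Hypothetical tracker: pairs Network.requestWillBeSent with
// Network.requestWillBeSentExtraInfo by requestId. CDP does not guarantee
// which of the two arrives first, so both handlers create the entry on demand.
class RequestHeaderTracker {
  constructor() {
    this.entries = new Map(); // requestId -> { url, base, extra }
  }

  entry(requestId) {
    if (!this.entries.has(requestId)) {
      this.entries.set(requestId, { url: undefined, base: {}, extra: {} });
    }
    return this.entries.get(requestId);
  }

  onWillBeSent(event) {
    // Redirect hops reuse the same requestId, so a later hop overwrites an earlier one.
    const entry = this.entry(event.requestId);
    entry.url = event.request.url;
    entry.base = event.request.headers;
  }

  onExtraInfo(event) {
    // Raw headers as reported by the network stack (experimental CDP event).
    this.entry(event.requestId).extra = event.headers;
  }

  headersFor(requestId) {
    const entry = this.entries.get(requestId);
    return entry ? mergeHeaders(entry.base, entry.extra) : undefined;
  }
}

// Wiring for a Puppeteer page; call it before page.goto().
// Assumes page.target().createCDPSession() exists in your Puppeteer version.
async function trackPageHeaders(page) {
  const tracker = new RequestHeaderTracker();
  const client = await page.target().createCDPSession();
  await client.send('Network.enable');
  client.on('Network.requestWillBeSent', event => tracker.onWillBeSent(event));
  client.on('Network.requestWillBeSentExtraInfo', event => tracker.onExtraInfo(event));
  return tracker;
}
```

Usage sketch: call const tracker = await trackPageHeaders(page); before await page.goto('https://reddit.com');, then loop over tracker.entries.keys() and print tracker.headersFor(id). Not every request is guaranteed an extra-info event, so some entries may still contain only the renderer-side headers.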

Answer 1 (score: 0)

You can switch from Puppeteer to Playwright; with Firefox (but not with Chromium or WebKit) you get more headers:

import playwright from 'playwright';

(async () => {
    const browser = await playwright['firefox'].launch();
    const page = await browser.newPage();

    page.on('request', req => {
        console.log(req.headers());
    });
    await page.goto("https://example.com/");

    await browser.close();
})();
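Newer Playwright releases (1.15 and later, as far as I know) also provide request.allHeaders(), an async method documented as returning all request headers, including the cookie-related and other security-related headers that request.headers() leaves out. A minimal sketch, assuming such a version; collectAllHeaders is a hypothetical helper, and page is any object with Playwright's page.on('request', ...) interface:

```javascript
// Hypothetical helper: records { url, headers } for every request on a Playwright page.
// Assumes Playwright >= 1.15, where Request.allHeaders() is available.
function collectAllHeaders(page) {
  const pending = [];
  page.on('request', req => {
    // allHeaders() returns a promise, so keep it and resolve everything later.
    pending.push(req.allHeaders().then(headers => ({ url: req.url(), headers })));
  });
  // Call the returned function after navigation to get every captured entry.
  return () => Promise.all(pending);
}
```

With a real page this would be used as const getHeaders = collectAllHeaders(page); await page.goto('https://example.com/'); console.log(await getHeaders());.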

playwright['firefox'] output (on other sites I have also seen cookies):

{
  host: 'example.com',
  'user-agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0',
  accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
  'accept-language': 'en-US,en;q=0.5',
  'accept-encoding': 'gzip, deflate, br',
  connection: 'keep-alive',
  'upgrade-insecure-requests': '1'
}

Compare this with the playwright['chromium'] output:

{
  'upgrade-insecure-requests': '1',
  'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/90.0.4421.0 Safari/537.36'
}
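To see exactly which headers the Chromium run leaves out, the two outputs above can be compared by header name. A small plain Node.js sketch; the objects are copied from the outputs above and missingHeaders is a hypothetical helper:

```javascript
// Header objects copied from the two outputs above.
const firefoxOutput = {
  host: 'example.com',
  'user-agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0',
  accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
  'accept-language': 'en-US,en;q=0.5',
  'accept-encoding': 'gzip, deflate, br',
  connection: 'keep-alive',
  'upgrade-insecure-requests': '1'
};
const chromiumOutput = {
  'upgrade-insecure-requests': '1',
  'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/90.0.4421.0 Safari/537.36'
};

// Returns header names present in `reference` but absent (case-insensitively) from `observed`.
function missingHeaders(reference, observed) {
  const seen = new Set(Object.keys(observed).map(name => name.toLowerCase()));
  return Object.keys(reference).filter(name => !seen.has(name.toLowerCase()));
}

// Returns ['host', 'accept', 'accept-language', 'accept-encoding', 'connection']
console.log(missingHeaders(firefoxOutput, chromiumOutput));
```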