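How do I download a PDF that opens in a new tab with Puppeteer?

Time: 2018-06-11 19:39:29

Tags: javascript node.js web-scraping puppeteer

I have a page with a button. Clicking the button opens a PDF in a new tab.

How can I use Puppeteer to download that PDF as a file?

Perhaps I could write the file from a buffer obtained from the new tab, but I am not sure how.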

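2 answers:

Answer 0 (score: 2)

A simple solution is to perform a GET request with the fetch API inside the page. That way you can read the response in the browser, pass it back to the Node.js side, and save it to disk.

Use the following sample code as a reference: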

import fs from 'fs';

// Fetches `url` from inside the page (so the page's cookies are sent) and
// writes the response body to `fullpath`.
async function downloadImage(page: any, url: string, fullpath: string) {
  const data = await page.evaluate(
    // tslint:disable-next-line no-shadowed-variable
    async ({ url }: { url: string }) => {
      // page.evaluate can only return serializable values, so the Blob is
      // converted to a binary string before it leaves the browser.
      function readAsBinaryStringAsync(blob: Blob): Promise<string> {
        return new Promise((resolve, reject) => {
          const fr = new FileReader();
          fr.onload = () => resolve(fr.result as string);
          fr.onerror = () => reject(fr.error);
          fr.readAsBinaryString(blob);
        });
      }

      const r = await fetch(url, {
        credentials: 'include',
        headers: {
          accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp, */*;q=0.8',
          'cache-control': 'no-cache',
          pragma: 'no-cache',
          'upgrade-insecure-requests': '1'
        },
        referrerPolicy: 'no-referrer-when-downgrade',
        body: null,
        method: 'GET',
        mode: 'cors'
      });
      if (!r.ok) {
        throw new Error(`Request for ${url} failed with status ${r.status}`);
      }

      return await readAsBinaryStringAsync(await r.blob());
    },
    { url }
  );

  // 'binary' (latin1) maps each character of the string back to one byte.
  fs.writeFileSync(fullpath, data, { encoding: 'binary' });
}
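
The function above expects the PDF's URL, which the question does not have up front. Below is a minimal sketch of one way to obtain it, assuming the button opens the PDF in a tab whose opener is the current page. `waitForNewTabUrl` and the `#open-pdf` selector are hypothetical names, and whether the URL is already set when the new target is first reported may depend on how the site opens the tab:

```javascript
// Hypothetical helper (not part of Puppeteer): resolves with the URL of the
// next tab opened by `page`.
async function waitForNewTabUrl(browser, page, timeoutMs = 30000) {
  const openerTarget = page.target();
  // waitForTarget resolves with the first target for which the predicate is true.
  const newTarget = await browser.waitForTarget(
    (target) => target.opener() === openerTarget,
    { timeout: timeoutMs }
  );
  return newTarget.url();
}

// Usage sketch (the selector is an assumption): start waiting before the
// click so the new tab cannot be missed.
// const [pdfUrl] = await Promise.all([
//   waitForNewTabUrl(browser, page),
//   page.click('#open-pdf'),
// ]);
```

The resulting URL can then be passed as the `url` argument of the download function in this answer.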

Answer 1 (score: 0)

Use the puppeteer-extra Node module.

Puppeteer-extra

const puppeteer = require('puppeteer-extra');
...
...
// Register the user-preferences plugin before launching the browser.
puppeteer.use(require('puppeteer-extra-plugin-user-preferences')({
  userPrefs: {
    download: {
      prompt_for_download: false,
      open_pdf_in_system_reader: true
    },
    plugins: {
      always_open_pdf_externally: true // this should do the trick
    }
  }
}));

const browser = await puppeteer.launch();

// Whenever a new tab opens, allow downloads on every open page.
browser.on('targetcreated', async (target) => {
  console.log('targetcreated');
  if (target.type() !== 'page') {
    return;
  }
  try {
    const pageList = await browser.pages();
    // Await the sends so that a rejected call is caught by the catch below.
    await Promise.all(pageList.map((page) =>
      // page._client is a private Puppeteer property and may differ between versions.
      page._client.send('Page.setDownloadBehavior', {
        behavior: 'allow',
        downloadPath: './pdfDownloaded/'
      })
    ));
  } catch (e) {
    console.log('targetcreated', e);
  }
});
...
...

However, Chrome crashes when I set always_open_pdf_externally: true.

Try it and see whether it works for you, and please reply if you find a solution.
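
The listener in this answer relies on `page._client`, a private property. Below is a minimal sketch of the same `Page.setDownloadBehavior` call made through `target.createCDPSession()`, assuming a Puppeteer version that exposes that method. `allowDownloads` is a hypothetical helper, and resolving the directory to an absolute path is an assumption, not something the answer states:

```javascript
const path = require('path');

// Hypothetical helper: allow downloads for a single page through a CDP
// session obtained via the public API instead of page._client.
async function allowDownloads(page, downloadDir) {
  const client = await page.target().createCDPSession();
  // Assumption: an absolute path is less ambiguous than a relative one.
  const downloadPath = path.resolve(downloadDir);
  await client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath,
  });
  return downloadPath;
}

// Usage sketch inside the 'targetcreated' handler:
// const pages = await browser.pages();
// await Promise.all(pages.map((p) => allowDownloads(p, './pdfDownloaded/')));
```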