Puppeteer: Save a PDF file in headless mode

Posted: 2018-08-08 01:08:27

Tags: node.js puppeteer google-chrome-headless

What I want to achieve with headless Chrome and Puppeteer:

  1. Log in to a website
  2. Navigate to a PDF file
  3. Download it to the server

According to this bug, headless Chrome cannot navigate to a PDF file: https://bugs.chromium.org/p/chromium/issues/detail?id=761295

So I tried to take the cookies from the current Puppeteer session and pass them along with an https.get request, but unfortunately without success.

My code:

const fs = require('fs');
const https = require('https');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://login-page', { waitUntil: 'networkidle0' });
  await page.type('#email', 'email');
  await page.type('#password', 'password');
  // Start waiting for the navigation before clicking, so it cannot be missed
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0' }),
    page.click('input[type="submit"]'),
  ]);

  // The following line throws an error in headless mode
  // await page.goto('https://url-with-pdf-accessible-only-after-login');

  // I'm trying to convert the cookie objects to a cookie string to pass it in the headers
  const cookies = await page.cookies();
  let cookieString = '';
  for (const cookie of cookies) {
    for (const key in cookie) {
      cookieString += key + '=' + cookie[key] + '; ';
    }
  }

  // The following code saves an empty file (0 bytes)
  const file = fs.createWriteStream('file.pdf');
  https.get({
    hostname: 'host-with-pdf-file',
    path: '/path-to-pdf-accessible-only-after-login',
    headers: {
      'Cookie': cookieString,
    }
  }, res => {
    res.pipe(file);
  });
})();
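Two things in the code above are worth checking. The loop appends every property of each cookie object returned by `page.cookies()` (`name`, `value`, `domain`, `path`, `expires`, `httpOnly`, and so on), whereas a `Cookie` request header carries only `name=value` pairs separated by `; `. And `res.pipe(file)` writes whatever body comes back, so a redirect to the login page or an error status with an empty body still ends up as a 0-byte file. A minimal sketch addressing both, assuming the PDF is served over HTTPS and that the session cookies alone authenticate the request; `toCookieHeader` and `downloadWithCookies` are hypothetical helper names:

```javascript
const fs = require('fs');
const https = require('https');
const { URL } = require('url');

// Build a Cookie request header from the objects returned by page.cookies().
// Only name=value pairs belong in the header; domain, path, expires, etc. are
// attributes the browser uses to decide whether to send a cookie at all.
function toCookieHeader(cookies) {
  return cookies.map(c => `${c.name}=${c.value}`).join('; ');
}

// Hypothetical helper: GET `url` with the given cookies and save the body to
// `dest`. Any non-200 status (e.g. a 302 back to the login page) rejects
// instead of silently producing an empty file.
function downloadWithCookies(url, cookies, dest) {
  const { hostname, port, pathname, search } = new URL(url);
  return new Promise((resolve, reject) => {
    const req = https.get(
      {
        hostname,
        port: port || 443,
        path: pathname + search,
        headers: { Cookie: toCookieHeader(cookies) },
      },
      res => {
        if (res.statusCode !== 200) {
          res.resume(); // drain the response so the socket is freed
          reject(new Error(`HTTP ${res.statusCode}, location: ${res.headers.location || 'none'}`));
          return;
        }
        const file = fs.createWriteStream(dest);
        file.on('finish', () => resolve(dest));
        file.on('error', reject);
        res.pipe(file);
      }
    );
    req.on('error', reject);
  });
}
```

A possible call site, inside the async function after login: `const cookies = await page.cookies('https://host-with-pdf-file/');` followed by `await downloadWithCookies('https://host-with-pdf-file/path-to-pdf-accessible-only-after-login', cookies, 'file.pdf');`. Passing the PDF's URL to `page.cookies()` matters when the file lives on another host: called with no arguments, `page.cookies()` returns only the cookies for the current page URL.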

Am I doing something wrong?

Is there another way to save a PDF file from a URL that requires authentication to the server?
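Another option, not taken from the answers below: fetch the PDF from inside the logged-in page with `page.evaluate()`, so the browser attaches its own session cookies and nothing has to be copied. A sketch, assuming the PDF is on the same origin as the page that is open after login (a cross-origin `fetch` would be subject to CORS); `savePdfViaPageFetch` is a hypothetical name. Because `page.evaluate()` can only hand back serializable values, the body is returned as a base64 string:

```javascript
const fs = require('fs');

// Hypothetical helper: fetch `url` from inside the logged-in page, so the
// browser sends its own session cookies, then save the bytes to `dest`.
async function savePdfViaPageFetch(page, url, dest) {
  // This function runs in the browser, not in Node.
  const base64 = await page.evaluate(async pdfUrl => {
    const res = await fetch(pdfUrl, { credentials: 'include' });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const bytes = new Uint8Array(await res.arrayBuffer());
    // Convert the bytes to a binary string so btoa() can encode them.
    let binary = '';
    for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
    return btoa(binary);
  }, url);
  // Back in Node: decode the base64 string and write the raw bytes.
  fs.writeFileSync(dest, Buffer.from(base64, 'base64'));
  return dest;
}
```

In the question's flow it would be called right after the login navigation, e.g. `await savePdfViaPageFetch(page, 'https://url-with-pdf-accessible-only-after-login', 'file.pdf');`.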

2 answers:

Answer 0 (score: 2)

I ran into almost the same problem.

Info: I'm running this on Windows 10 64-bit, Node v8.9.4, Puppeteer 1.12.2.

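More important info: it does not work with the bundled "local-chromium" (73.0.3679.0 (64-bit), installed by Puppeteer), but it does work with the installed Chrome (72.0.3626.119)! So I set a custom executablePath option on the launch method :) and it works!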

I searched for hours, so I hope this solution is useful...

const puppeteer = require('puppeteer');

(async () => {
  // Custom browser; the headless option is omitted, so it defaults to true
  const browser = await puppeteer.launch({
    executablePath: 'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe',
  });
  const page = await browser.newPage();
  // URL
  await page.goto('https://www.theUrl', { waitUntil: 'networkidle2' });
  await page.waitFor('input[name=NameOfTheLoginHtmlField]');
  await page.$eval('input[name=NameOfTheLoginHtmlField]', el => { el.value = 'InputValueOfTheLoginHtmlField'; });
  await page.waitFor('input[name=NameOfThePasswordHtmlField]');
  await page.$eval('input[name=NameOfThePasswordHtmlField]', el => { el.value = 'InputValueOfThePasswordHtmlField'; });
  // The submit button has been replaced by an "a" element with a JS function behind it, so...
  await page.click('#login-submit > a');

  // Allows defining the download path ('' = current directory: C:\Program Files (x86)\Google\Chrome\Application\72.0.3626.119)
  function setDownloadBehavior(downloadPath = '') {
    return page._client.send('Page.setDownloadBehavior', {
      behavior: 'allow',
      downloadPath,
    });
  }
  await setDownloadBehavior();
  await page.waitFor(5000);
  await browser.close();
})();
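The fixed `page.waitFor(5000)` assumes the download completes within five seconds. One alternative, sketched under the assumption that `setDownloadBehavior()` above is given a dedicated, initially empty directory (for example `setDownloadBehavior(path.resolve('downloads'))`): poll that directory until a finished file appears. Chrome writes in-progress downloads with a `.crdownload` extension, so the helper also waits until none remain. `waitForDownload` is a hypothetical name:

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: poll `dir` until a file ending in `ext` exists and no
// Chrome partial download (*.crdownload) remains, or reject after timeoutMs.
// Assumes `dir` contained no matching file before the download started.
function waitForDownload(dir, ext = '.pdf', timeoutMs = 30000, intervalMs = 250) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const check = () => {
      let names;
      try {
        names = fs.readdirSync(dir);
      } catch (err) {
        return reject(err); // e.g. the directory does not exist
      }
      const done = names.find(n => n.endsWith(ext));
      const pending = names.some(n => n.endsWith('.crdownload'));
      if (done && !pending) return resolve(path.join(dir, done));
      if (Date.now() > deadline) {
        return reject(new Error(`No ${ext} file in ${dir} after ${timeoutMs} ms`));
      }
      setTimeout(check, intervalMs);
    };
    check();
  });
}
```

It would take the place of `await page.waitFor(5000);`, e.g. `const pdfPath = await waitForDownload(path.resolve('downloads'));` before `browser.close()`.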

Answer 1 (score: -1)

Could you use Express.js to respond with the PDF file?

res.sendFile(path.join(__dirname, 'example.pdf'));

where example.pdf is a file generated on your server.
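For comparison, the core of what `res.sendFile()` does here can be approximated without Express using Node's built-in `http` module: stream the file from disk with a `Content-Type: application/pdf` header. This is a swapped-in sketch, not Express's implementation (which also handles range requests, caching headers, and more); `createPdfServer` is a hypothetical name:

```javascript
const fs = require('fs');
const http = require('http');
const path = require('path');

// Hypothetical sketch: serve a single PDF from disk with Node's built-in http
// module (no Express), setting the headers a browser needs to treat it as PDF.
function createPdfServer(pdfPath) {
  return http.createServer((req, res) => {
    fs.stat(pdfPath, (err, stats) => {
      if (err) {
        res.writeHead(404, { 'Content-Type': 'text/plain' });
        res.end('Not found');
        return;
      }
      res.writeHead(200, {
        'Content-Type': 'application/pdf',
        'Content-Length': stats.size,
        'Content-Disposition': `inline; filename="${path.basename(pdfPath)}"`,
      });
      fs.createReadStream(pdfPath).pipe(res);
    });
  });
}
```

For example, `createPdfServer(path.join(__dirname, 'example.pdf')).listen(3000);` would serve the file on port 3000 for any request path.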