Getting a PDF blob from a URL and inserting it directly into Google Drive with Puppeteer and fetch

Date: 2019-01-15 18:05:40

Tags: javascript node.js google-drive-api puppeteer

I'm trying to use Puppeteer to log in to a website and "download" a PDF directly to my Google Drive. I've managed to reach the PDF page with Puppeteer and, among other attempts, I tried using fetch with the page's cookies to get the PDF as a blob and send it to Drive. I can't post the login information here, but if you could help me find the error (or errors) in the code, that would be great! Right now the code goes to the page before the PDF, gets the link, fetches it with the cookies and creates a PDF in Drive, but the PDF is corrupted and 0 KB in size.

I also tried setRequestInterception, getPdf (from Puppeteer), and a Buffer-based approach I found while researching.

// Page before pdfPage. Here I got the link: urlPdf
// await page.goto(urlPdf);
// await page.waitForNavigation();
// const htmlPdf = await page.content();

const cookies = await page.cookies();
const opts = {
  headers: {
    cookie: cookies
  }
};

let blob = await fetch(urlPdf, opts).then(r => r.blob());
console.log("pegou o blob"); // "got the blob"
// upload file in specific folder

var file;
console.log("driveApi upload reached");
function blobToFile(req) {
  file = req.body.blob;
  // A Blob() is almost a File() - it's just missing the two properties below which we will add
  file.lastModifiedDate = new Date();
  file.name = teste.pdf; // req.body.word;
  return file;
}


var folderId = myFolderId;
var fileMetadata = {
  'name': 'teste.pdf',
  parents: [folderId]
};
var media = {
  mimeType: 'application/pdf',
  body: file
};
drive.files.create({
  auth: jwToken,
  resource: fileMetadata,
  media: media,
  fields: 'id'
}, function(err, file) {
  if (err) {
    // Handle error
    console.error(err);
  } else {
    console.log('File Id: ', file.data.id);
  }
});
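Two things in the question's code would explain an empty upload. First, file is never assigned: blobToFile is never called, and it reads req.body.blob although there is no req in this code (the unquoted teste.pdf would also throw a ReferenceError), so media.body ends up undefined. Second, page.cookies() returns an array of cookie objects, while the cookie header has to be a single "name=value; name2=value2" string; without a valid session cookie the server may answer with a login page instead of the PDF. The sketch below is one possible fix for the download step, assuming Node 18+ (global fetch) and the logged-in page and urlPdf from the question; toCookieHeader, looksLikePdf and downloadPdf are hypothetical helper names. It returns a Node Buffer instead of a Blob and rejects any response that does not start with the %PDF- header that PDF files normally begin with.

```javascript
// Assumes Node 18+ (global fetch) and a Puppeteer `page` that is already logged in.
// Helper names are hypothetical; they are not part of Puppeteer or googleapis.

// page.cookies() returns objects like { name, value, domain, ... };
// the Cookie header needs them joined as "name=value; name2=value2".
function toCookieHeader(cookies) {
  return cookies.map(ck => `${ck.name}=${ck.value}`).join('; ');
}

// PDF files normally start with "%PDF-"; an empty body or an
// HTML login page fails this check.
function looksLikePdf(buffer) {
  return buffer.length >= 5 && buffer.subarray(0, 5).toString('latin1') === '%PDF-';
}

// Download urlPdf with the browser session's cookies and return a Buffer.
async function downloadPdf(page, urlPdf) {
  // Passing the URL returns the cookies that apply to that URL.
  const cookies = await page.cookies(urlPdf);
  const res = await fetch(urlPdf, { headers: { cookie: toCookieHeader(cookies) } });
  if (!res.ok) throw new Error(`Download failed: HTTP ${res.status}`);

  // Read the body as raw bytes and wrap them in a Node Buffer.
  const buffer = Buffer.from(await res.arrayBuffer());
  if (!looksLikePdf(buffer)) {
    throw new Error('Response is not a PDF (is the session still logged in?)');
  }
  return buffer;
}
```

For example, const buffer = await downloadPdf(page, urlPdf); the Buffer can then be wrapped in a readable stream (for instance a stream.PassThrough) and passed as media.body to drive.files.create.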

1 answer:

Answer 0 (score: 0)

I tried a lot of things, but in the end I went with the solution posted here:

Puppeteer - How can I get the current page (application/pdf) as a buffer or file?

const fs = require('fs');

await page.setRequestInterception(true);

page.on('request', async request => {
    if (request.url().includes('exibirFat.do')) { // This condition is true only on the PDF page (in my case, of course)
        /* add the cookies */
        const cookies = await page.cookies(request.url());
        const options = {
            encoding: null, // makes request-promise resolve with a Buffer
            method: request.method(),
            uri: request.url(),
            body: request.postData(),
            headers: {
                ...request.headers(),
                cookie: cookies.map(ck => ck.name + '=' + ck.value).join('; ')
            }
        };
        /* resend the request */
        const buffer = await request_client(options); // PDF Buffer
        fs.writeFileSync('file.pdf', buffer); // Save file
        // Answer the intercepted request too; otherwise the page keeps waiting for it
        await request.respond({ status: 200, contentType: 'application/pdf', body: buffer });
    } else {
        request.continue();
    }
});

This solution requires: const request_client = require('request-promise-native'); (the request and request-promise-native packages have since been deprecated).
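The answer saves the PDF to disk, while the question's goal was to put it in Drive. A minimal sketch of that last step, assuming drive was created with the googleapis package as google.drive({ version: 'v3', auth: jwToken }); pdfUploadRequest and uploadPdfBuffer are hypothetical names. It wraps the Buffer captured in the request handler in a stream, since the googleapis upload examples pass media.body as a readable stream.

```javascript
const { PassThrough } = require('stream');

// Hypothetical helper: build the argument object for drive.files.create()
// that uploads an in-memory PDF into one folder. media.body is a stream
// that yields the whole Buffer and then ends.
function pdfUploadRequest(buffer, name, folderId) {
  const body = new PassThrough();
  body.end(buffer);
  return {
    requestBody: { name, parents: [folderId] }, // older code often uses `resource` here
    media: { mimeType: 'application/pdf', body },
    fields: 'id', // only return the new file's id
  };
}

// Hypothetical helper: upload the Buffer and resolve with the Drive file id.
// Assumes `drive` = google.drive({ version: 'v3', auth: jwToken }).
async function uploadPdfBuffer(drive, buffer, name, folderId) {
  const res = await drive.files.create(pdfUploadRequest(buffer, name, folderId));
  return res.data.id;
}
```

Inside the handler, the fs.writeFileSync call could then be replaced with const id = await uploadPdfBuffer(drive, buffer, 'teste.pdf', myFolderId);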