How do I download a PDF and convert it to a txt file in node.js? (JavaScript)

Asked: 2016-06-10 18:54:50

Tags: javascript node.js pdf text pdftotext

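I'm trying to get a PDF by requesting its download link, saving the file, and then converting the text in it to a txt file. However, I get this error:

"(while reading XRef): Error: Invalid XRef stream XRefParseException"

when the pdf is loaded into the parser. This triggers the error handler, which only prints the error message. Here is my current code: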

import fs from 'fs';
import request from 'superagent';
import PDFParser from 'pdf2json';

// the second argument (1) asks pdf2json to keep raw text so getRawTextContent() works
const pdfParser = new PDFParser(null, 1);

// a download link (indicated by the dl=1) for some dropbox example.pdf
const link = 'https://www.dropbox.com/s/22nvxasry8zpwbg/example%20(3).pdf?dl=1';

// sending a request to this download link
request.get(link).end((err, res) => {
    if (err) {
      console.error(err);
      return;
    }
    if (res.headers['content-type'] === 'application/pdf') {
      // creates a new file and pipes the response into the stream
      const pdfId = 'search-' + Date.now();
      const file = fs.createWriteStream('./tmp/pdf/' + pdfId + '.pdf');
      res.pipe(file);

      // api for pdfParser setting handlers
      pdfParser.on("pdfParser_dataError", errData => {
        console.error(errData.parserError);
      });
      pdfParser.on("pdfParser_dataReady", pdfData => {
        console.log('got data, writing to txt file');
        console.log("./tmp/txt/" + pdfId + ".txt");
        fs.writeFile("./tmp/txt/" + pdfId + ".txt", pdfParser.getRawTextContent(), writeErr => {
          if (writeErr) console.error(writeErr);
        });
      });

      // load the pdf file into the pdfParser
      // I think the error happens here
      pdfParser.loadPDF('./tmp/pdf/' + pdfId + '.pdf');
    }
});

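I think the error happens when I try to load the pdf into the parser, but I'm not 100% sure, and I don't know how to handle it. Any help is appreciated. Thanks!

Here is the API guide for superagent: https://visionmedia.github.io/superagent/

And the API guide for pdf2json: https://github.com/modesty/pdf2json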
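One possible cause, offered as an assumption rather than a confirmed diagnosis: the code calls loadPDF immediately after res.pipe(file), so the parser may open the PDF before the write stream has flushed everything to disk, and a truncated file could plausibly produce an XRef error. A minimal sketch of waiting for the write to complete is below. It uses only Node's built-in modules; saveStreamToFile is a hypothetical helper name, not part of superagent or pdf2json, and it assumes the source is a readable stream of the PDF bytes.

```javascript
const fs = require('fs');
const { pipeline } = require('stream');

// Hypothetical helper (not from superagent or pdf2json): writes a readable
// stream to destPath and resolves only once the write side has finished.
function saveStreamToFile(readable, destPath) {
  return new Promise((resolve, reject) => {
    const file = fs.createWriteStream(destPath);
    // pipeline reports errors from either stream and invokes the callback
    // after the destination stream has finished writing.
    pipeline(readable, file, (err) => (err ? reject(err) : resolve(destPath)));
  });
}

// Usage sketch (assumes `source` is a readable stream of the PDF and
// `pdfParser` is a pdf2json parser instance as in the question):
// saveStreamToFile(source, './tmp/pdf/' + pdfId + '.pdf')
//   .then((pdfPath) => pdfParser.loadPDF(pdfPath))
//   .catch((err) => console.error(err));
```

With this ordering, the parser only sees the file after it is complete. If the error persists, the downloaded bytes themselves may not be a valid PDF (for example, an HTML page served in its place), which would need to be checked separately.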

0 answers:

No answers yet