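I'm trying to fetch a PDF file by requesting its download link, saving the file, and then converting the text in it to a txt file. However, I get this error when loading the PDF into the parser:
"(while reading XRef): Error: Invalid XRef stream XRefParseException"
This triggers the error handler, which only prints the error message. Here is my current code: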
import fs from 'fs';
import request from 'superagent';
import PDFParser from 'pdf2json';

// second constructor argument enables raw text output,
// which getRawTextContent() relies on (per the pdf2json README)
const pdfParser = new PDFParser(null, 1);

// a download link (indicated by the dl=1) for some Dropbox example.pdf
const link = 'https://www.dropbox.com/s/22nvxasry8zpwbg/example%20(3).pdf?dl=1';

// send a request to this download link
request.get(link).end((err, res) => {
  if (res.headers['content-type'] === 'application/pdf') {
    // create a new file and pipe the response into the stream
    const pdfId = 'search-' + Date.now();
    const file = fs.createWriteStream('./tmp/pdf/' + pdfId + '.pdf');
    res.pipe(file);

    // pdfParser API: set the handlers
    pdfParser.on('pdfParser_dataError', errData => {
      console.error(errData.parserError);
    });
    pdfParser.on('pdfParser_dataReady', pdfData => {
      console.log('got data, writing to txt file');
      console.log('./tmp/txt/' + pdfId + '.txt');
      fs.writeFile('./tmp/txt/' + pdfId + '.txt', pdfParser.getRawTextContent(), writeErr => {
        if (writeErr) console.error(writeErr);
      });
    });

    // load the pdf file into the pdfParser
    // I think the error happens here
    pdfParser.loadPDF('./tmp/pdf/' + pdfId + '.pdf');
  }
});
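I think the error occurs when I try to load the PDF into the parser, but I'm not 100% sure, and I don't know how to handle it. Any help is appreciated. Thanks!
Here is the API guide for superagent: https://visionmedia.github.io/superagent/
And the API guide for pdf2json: https://github.com/modesty/pdf2json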
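
One thing that may be worth ruling out (a guess, not a confirmed diagnosis): the parser is asked to load the file right after the pipe is set up, so the PDF on disk might still be incomplete when it is read, and a truncated PDF could plausibly produce an invalid XRef error. The sketch below uses only Node's standard library to show the general idea of waiting until a stream has been fully written before touching the file. The helper name saveStream, the demo function, and the fake two-chunk "response body" are all made up for illustration; in the real code the readable would be the download stream (the superagent docs describe piping the request itself instead of calling .end()).

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Hypothetical helper: write a readable stream to filePath and resolve
// only after the write stream has finished, so the file is complete.
async function saveStream(readable, filePath) {
  await fs.promises.mkdir(path.dirname(filePath), { recursive: true });
  await pipeline(readable, fs.createWriteStream(filePath));
  return filePath;
}

// Demo with a stand-in for the HTTP response body (assumption: two chunks).
async function demo() {
  const body = Readable.from([Buffer.from('%PDF-1.4\n'), Buffer.from('%%EOF\n')]);
  const target = path.join(os.tmpdir(), 'search-' + Date.now() + '.pdf');
  await saveStream(body, target);
  // Only at this point would it be safe to call something like
  // pdfParser.loadPDF(target); here we just read the file back.
  return fs.promises.readFile(target, 'utf8');
}
```

Calling demo() resolves with the full file contents, which shows that both chunks were flushed to disk before the file was read. The same ordering could be had by listening for the write stream's 'finish' event and calling loadPDF inside that handler.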