PDFJS: Invalid PDF structure

Date: 2019-11-14 11:17:44

Tags: javascript pdf pdf.js

I'm trying to use pdf.js to extract plain text from a PDF document, but for some reason I can't get past an Invalid PDF structure error.

My code is as follows:

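const pdfjslib = require('pdfjs-dist');

const pdfPath = 'https://www.corenet.gov.sg/media/2268607/dc19-07.pdf';

// getDocument returns a loading task; its promise resolves with the parsed document.
const loadingTask = pdfjslib.getDocument(pdfPath);
loadingTask.promise
    .then((doc) => {
        console.log(doc);
        return null;
    })
    .catch((err) => {
        // For the URL above, this is where the InvalidPDFException is logged.
        console.log(err);
    });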

I tried other PDF documents from the same domain, and they all raise the same error:

...
Warning: Ignoring invalid character "34" in hex string
Warning: Ignoring invalid character "104" in hex string
Warning: Indexing all PDF objects
{ Error
    at InvalidPDFExceptionClosure (.../pdf_test/node_modules/pdfjs-dist/build/pdf.js:658:35)
    at Object.<anonymous> (...pdf_test/node_modules/pdfjs-dist/build/pdf.js:661:2)
    at __w_pdfjs_require__ (.../pdf_test/node_modules/pdfjs-dist/build/pdf.js:52:30)
    at Object.defineProperty.value (...pdf_test/node_modules/pdfjs-dist/build/pdf.js:129:23)
    at __w_pdfjs_require__ (.../pdf_test/node_modules/pdfjs-dist/build/pdf.js:52:30)
    at pdfjsVersion (...pdf_test/node_modules/pdfjs-dist/build/pdf.js:116:18)
    at .../pdf_test/node_modules/pdfjs-dist/build/pdf.js:119:10
    at webpackUniversalModuleDefinition (.../pdf_test/node_modules/pdfjs-dist/build/pdf.js:25:20)
    at Object.<anonymous> (.../pdf_test/node_modules/pdfjs-dist/build/pdf.js:32:3)
    at Module._compile (internal/modules/cjs/loader.js:776:30)
  name: 'InvalidPDFException',
  message: 'Invalid PDF structure' }

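PDFs from other domains seem to work. Note that downloading the PDF from the domain above works fine, and the file can be viewed in Chrome. I suspect the PDF documents are corrupted. I haven't implemented any front-end code, because the code above is intended to be hosted in the cloud.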
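
One way to test the corruption suspicion is to look at the bytes Node actually receives for that URL before pdf.js parses them. The sketch below is only a diagnostic under stated assumptions: `looksLikePdf`, `fetchBytes`, and `diagnose` are hypothetical helpers, not part of pdf.js, and the check simply looks for the `%PDF-` marker in the first 1024 bytes. The character codes 34 and 104 in the warnings are `"` and `h`, which might mean the server returned something other than a PDF (for example an HTML page) to a non-browser client, but that is a guess the check can confirm or rule out.

```javascript
const https = require('https');

// Hypothetical helper (not part of pdf.js): returns true if the "%PDF-"
// marker appears within the first 1024 bytes of the payload.
function looksLikePdf(bytes) {
  const head = Buffer.from(bytes).subarray(0, 1024).toString('latin1');
  return head.includes('%PDF-');
}

// Hypothetical helper: download a URL with Node's https module and collect
// the status code, content type, and raw body. Redirects are not followed.
function fetchBytes(url) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => resolve({
        status: res.statusCode,
        contentType: res.headers['content-type'],
        body: Buffer.concat(chunks),
      }));
    }).on('error', reject);
  });
}

// Not invoked here; call diagnose(pdfPath) to run the check.
async function diagnose(url) {
  const { status, contentType, body } = await fetchBytes(url);
  console.log(status, contentType, looksLikePdf(body));
  // Print the first bytes so an HTML or error page is easy to spot.
  console.log(body.subarray(0, 64).toString('latin1'));
  return looksLikePdf(body);
}
```

If `diagnose` reports a real PDF header, the same bytes could be handed to pdf.js directly with `pdfjslib.getDocument({ data: new Uint8Array(body) })`, which sidesteps pdf.js's own network loading. If it reports HTML or an error status, the files are probably not corrupted and the difference lies in how the server answers this request compared with Chrome.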

0 answers:

No answers yet