Question

I'm trying to pass image data from a Jimp image object to Tesseract (an OCR library) through a buffer:

image.getBufferAsync('image/png').then((buffer) => {
  // Buffer here is <Buffer 12 34 56 ...
  const worker = new TesseractWorker();
  worker.recognize(buffer)
      .then((result) => { console.log('result', result.text); });

});

Tesseract throws an error saying it wants a Uint8Array rather than a Buffer:

TypeError [ERR_INVALID_ARG_VALUE]: The argument 'path' must be a string or Uint8Array without null bytes. Received <Buffer 89 50 4e 47...

So I tried converting the buffer to a Uint8Array:

buffer = new Uint8Array(buffer);

But then I get another error:

TypeError [ERR_INVALID_ARG_VALUE]: The argument 'path' must be a string or Uint8Array without null bytes. Received Uint8Array [
  137,
  80,
  ...

Where is the mistake?

If I save the image file to disk and then have Tesseract read it by its path, it works, so the problem should not be the image itself.

Answer 1

The documentation states that in Node.js the image argument should be a path to a local image.

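In the browser, an image can be:


an img, video, or canvas element

a File object (from a file input)

a path or URL to an accessible image


In Node.js, an image can be:


a path to a local image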

https://github.com/naptha/tesseract.js/blob/master/docs/image-format.md

This means the library expects to read the file itself rather than be handed a stream of bytes to analyze.
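The 'path' in the error message suggests the bytes are being handed to a file-system call as if they were a file name, which would explain the complaint about null bytes. If the library really does only accept a path in Node.js, one workaround is to write the buffer to a temporary file and pass that path instead. Below is a minimal sketch using only Node's built-in modules; `withTempImage` is a hypothetical helper name, not part of Jimp or tesseract.js, and the `.png` default extension is an assumption.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not part of Jimp or tesseract.js):
// writes `buffer` to a temporary file, passes the file path to `useFile`,
// and removes the temporary directory once `useFile` has settled.
async function withTempImage(buffer, useFile, ext = '.png') {
  // mkdtemp creates a uniquely named directory under the OS temp dir.
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'ocr-'));
  const file = path.join(dir, 'image' + ext);
  try {
    await fs.promises.writeFile(file, buffer);
    // Return whatever the callback resolves to (e.g. the OCR result).
    return await useFile(file);
  } finally {
    // Clean up even if writing or the callback failed.
    await fs.promises.rm(dir, { recursive: true, force: true });
  }
}
```

With the code from the question, this could be used as `withTempImage(buffer, (file) => worker.recognize(file))`, so that Tesseract only ever sees a local path. The cost is an extra write to disk per image.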

Answer 2

The following worked for me. I think we may be using different versions of tesseract (v.3.05), but I don't think it has changed much.

    var Jimp = require('jimp');
    var Tesseract = require('tesseract.js');
    var file = 'YourFile.png';  // Or .jpg etc...

    Jimp.read(file, async (err, image) => {
        if (err) throw err;

        // Do your Jimp stuff here to 'image', then...

        // Create a buffer using Jimp.AUTO (the original
        // file format of your variable 'file').
        const buffer = await image.getBufferAsync(Jimp.AUTO);

        Tesseract.recognize(buffer, 'eng')
            .then(data => {
                console.log(data);
            });
    });

Passing a buffer as a Uint8Array without null bytes
