Question

My question is whether it is possible to have tesseract.js recognize an image from an axios response stream.

const axios = require('axios');
const { TesseractWorker } = require('tesseract.js');
const worker = new TesseractWorker();

axios({
  method: 'get',
  url: 'https://lh3.googleusercontent.com/iXmJ9aWblkGDpg-_jpcqaY10KmA8HthjZ7F15U7mJ9PQK6vZEStMlathz1FfQQWV5XeeF-A1tZ0UpDjx3q6vEm2BWZn5k1btVSuBk9ad=s660',
  responseType: 'stream'
})
  .then(function (response) {
    //this doesn't work
    worker.recognize(response.data).then(result => {
      console.log(result);
    });
  });

I have looked at some examples: https://ourcodeworld.com/articles/read/580/how-to-convert-images-to-text-with-pure-javascript-using-tesseract-js and https://ourcodeworld.com/articles/read/348/getting-started-with-optical-character-recognition-ocr-with-tesseract-in-node-js.

But I could not figure it out from these examples.

------ Update ------

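After debugging, I found that tesseract.js itself is fine, because it simply calls the native Node.js fs readFile function: https://github.com/naptha/tesseract.js/blob/master/src/node/index.js#L37

So the real problem is how readFile could read a file from the axios response, and that does not seem possible either, because readFile accepts only a path, not data. This probably needs to be raised as an issue with tesseract.js, so that recognize can bypass readFile.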
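If recognize really does end up in readFile and needs a path, one possible workaround is to write the stream to a temporary file first and pass that path instead. The following is only a minimal sketch under that assumption: `saveStreamToTempFile` and the `ocr-input.png` file name are hypothetical and not part of tesseract.js or axios.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pipeline } = require('stream');
const { promisify } = require('util');

const pipelineAsync = promisify(pipeline);

// Hypothetical helper: writes a readable stream to a file in a fresh
// temporary directory and resolves with the path of that file.
async function saveStreamToTempFile(stream, fileName = 'ocr-input.png') {
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'ocr-'));
  const filePath = path.join(dir, fileName);
  // pipeline waits until the file is fully written and propagates stream errors
  await pipelineAsync(stream, fs.createWriteStream(filePath));
  return filePath;
}

// Usage sketch (assumes `response` came from axios with responseType: 'stream'
// and `worker` is the TesseractWorker created above):
// saveStreamToTempFile(response.data)
//   .then(filePath => worker.recognize(filePath))
//   .then(console.log)
//   .catch(console.error);
```

The temporary file is not cleaned up here, so a real script would want to delete it after recognition finishes.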

Answer 1

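I have never used this library, but judging from the examples given and a quick look at its source code, worker.recognize does not appear to accept a stream as an argument. Instead it expects an image URL or the actual image, and handles the network call internally if one is needed.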

https://github.com/naptha/tesseract.js/blob/master/src/common/TesseractWorker.js#L74

const { TesseractWorker } = require( 'tesseract.js' );
const worker = new TesseractWorker();


worker.recognize('https://lh3.googleusercontent.com/iXmJ9aWblkGDpg-_jpcqaY10KmA8HthjZ7F15U7mJ9PQK6vZEStMlathz1FfQQWV5XeeF-A1tZ0UpDjx3q6vEm2BWZn5k1btVSuBk9ad=s660')
.then(console.log)
.catch(console.error)
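If the stream is already in hand and a second download is undesirable, the stream could be collected into a single Buffer before calling recognize. This is a rough sketch only: `streamToBuffer` is a hypothetical helper, and whether worker.recognize accepts a Buffer depends on the tesseract.js version in use, so treat that part as an assumption.

```javascript
// Hypothetical helper (not part of tesseract.js or axios):
// reads a readable stream to the end and returns its contents as one Buffer.
async function streamToBuffer(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    // Streams may emit strings or Buffers; normalize to Buffer
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Usage sketch (assumes `response` came from axios with responseType: 'stream'
// and that this version of recognize accepts a Buffer):
// streamToBuffer(response.data)
//   .then(buffer => worker.recognize(buffer))
//   .then(console.log)
//   .catch(console.error);
```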

Answer 2

In axios, you can change responseType to arraybuffer if you are using Node.js, or to blob in the browser, and then pass the result to Tesseract.recognize.

For example:

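const axios = require('axios');
const Tesseract = require('tesseract.js');

async function recognizeFromUrl(url) {
  // Download the image as binary data instead of a stream
  const img = await axios({
    method: 'get',
    url, // e.g. 'your img url'
    responseType: 'arraybuffer' // for me it's Node.js
  });
  // Pass the downloaded data straight to Tesseract and extract the text
  const { data: { text } } = await Tesseract.recognize(
    img.data,
    'eng',
    { logger: m => console.log(m) }
  );
  return text;
}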

You can refer to the axios documentation for the available responseType values.
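For code that has to run in both environments, the choice between the two values could be made at runtime. This is just an illustrative sketch: `pickResponseType` is a hypothetical helper, and checking for `window` is only one common way to guess the environment.

```javascript
// Hypothetical helper: picks an axios responseType for binary image data,
// 'blob' in a browser and 'arraybuffer' in Node.js.
function pickResponseType() {
  const isBrowser =
    typeof window !== 'undefined' && typeof window.document !== 'undefined';
  return isBrowser ? 'blob' : 'arraybuffer';
}

// Usage sketch:
// axios({ method: 'get', url: 'your img url', responseType: pickResponseType() })
```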

tesseract.js recognize from axios stream response