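I tried to OCR my image using both the Google Vision API text detection feature and Google's web demo. The two results are different.
First, I tried the demo at https://cloud.google.com/vision/docs/drag-and-drop. Then I tried the Google API from Python code. The two results differ, and I don't know why. Can you help me solve this?
My API result: "SAMSUNG Galaxy M20Siêu Pin vô doi, sac nhanh tuc thiMoiSAMSUNG4.990.000dTrà gop 0%Mua ngay"
My web demo result: https://imge.to/i/q4gRw Thank you very much.
My Python code is here: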
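import cv2
from google.cloud import vision
from google.cloud.vision import types  # google-cloud-vision 1.x module layout
client = vision.ImageAnnotatorClient()
# `image` is assumed to be a NumPy array already loaded with OpenCV
raw_byte = cv2.imencode('.jpg', image)[1].tobytes()  # tobytes() replaces the deprecated tostring()
post_image = types.Image(content=raw_byte)
image_context = vision.types.ImageContext()
response = client.text_detection(image=post_image, image_context=image_context)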
Answer 0 (score: 1)
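This is TypeScript code, but the idea is to use something like document_text_detection instead of text_detection (I'm not sure exactly what the Python API provides).
Using documentTextDetection() instead of textDetection() solved exactly the same problem for me.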
const fs = require("fs");
const path = require("path");
const vision = require("@google-cloud/vision");

async function quickstart() {
  let text = '';
  const fileName = "j056vt-_800w_800h_sb.jpg";
  const imageFile = fs.readFileSync(fileName);
  const image = Buffer.from(imageFile).toString("base64");
  const client = new vision.ImageAnnotatorClient();
  const request = {
    image: {
      content: image
    },
    imageContext: {
      languageHints: ["vi-VN"]
    }
  };
  const [result] = await client.documentTextDetection(request);

  // OUTPUT METHOD A
  for (const tmp of result.textAnnotations) {
    text += tmp.description + "\n";
  }
  console.log(text);
  const out = path.basename(fileName, path.extname(fileName)) + ".txt";
  fs.writeFileSync(out, text);

  // OUTPUT METHOD B
  const fullTextAnnotation = result.fullTextAnnotation;
  console.log(`Full text: ${fullTextAnnotation.text}`);
  fullTextAnnotation.pages.forEach(page => {
    page.blocks.forEach(block => {
      console.log(`Block confidence: ${block.confidence}`);
      block.paragraphs.forEach(paragraph => {
        console.log(`Paragraph confidence: ${paragraph.confidence}`);
        paragraph.words.forEach(word => {
          const wordText = word.symbols.map(s => s.text).join("");
          console.log(`Word text: ${wordText}`);
          console.log(`Word confidence: ${word.confidence}`);
          word.symbols.forEach(symbol => {
            console.log(`Symbol text: ${symbol.text}`);
            console.log(`Symbol confidence: ${symbol.confidence}`);
          });
        });
      });
    });
  });
}

quickstart();
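For readers following the Python code in the question, a rough Python counterpart of the snippet above might look like the following. It is only a sketch, assuming google-cloud-vision 2.x or later, where `vision.Image` and `vision.ImageContext` are available at the top level (the 1.x layout used in the question would use `vision.types.*` instead). The function names `detect_document_text` and `iter_words` and the `"vi"` language hint are illustrative choices, not part of the original answer.

```python
def detect_document_text(image_bytes, language_hints=("vi",)):
    """Send raw image bytes to document text detection and return the response."""
    # Imported lazily so iter_words() below works without the library installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=image_bytes)
    # The hint value is an assumption mirroring the "vi-VN" hint in the answer.
    context = vision.ImageContext(language_hints=list(language_hints))
    return client.document_text_detection(image=image, image_context=context)


def iter_words(full_text_annotation):
    """Yield (word_text, confidence) pairs, mirroring OUTPUT METHOD B above."""
    for page in full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    # A word is stored as a list of symbols; join them back together.
                    text = "".join(symbol.text for symbol in word.symbols)
                    yield text, word.confidence
```

With credentials configured, something like `response = detect_document_text(open("image.jpg", "rb").read())` followed by `print(response.full_text_annotation.text)` should print the full text, and `iter_words(response.full_text_annotation)` walks the individual words.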
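Answer 1 (score: 0)
Actually, comparing the two results, the only difference I see is how the results are displayed. The Google Cloud drag-and-drop page shows the results with bounding boxes and tries to find text regions.
The response you get from your Python script contains the same information. Some examples: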
texts = response.text_annotations
print([i.description for i in texts])
# prints all the words that were found in the image
print([i.bounding_poly.vertices for i in texts])
# prints all boxes around detected words
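To make the comparison with the web demo more concrete, here is a small sketch that separates the two kinds of entries in `text_annotations`: the first entry carries the whole detected text (line breaks included), and the remaining entries are the individual words with their bounding polygons. The helper names `full_text` and `word_boxes` are hypothetical; they only read the `description` and `bounding_poly.vertices` attributes already used above.

```python
def full_text(text_annotations):
    """Return the entire detected text, or "" if nothing was detected."""
    annotations = list(text_annotations)
    # The first annotation holds the complete text, including line breaks.
    return annotations[0].description if annotations else ""


def word_boxes(text_annotations):
    """Return (word, (x_min, y_min, x_max, y_max)) for each individual word."""
    boxes = []
    # Skip the first entry, which is the full text rather than a single word.
    for annotation in list(text_annotations)[1:]:
        xs = [vertex.x for vertex in annotation.bounding_poly.vertices]
        ys = [vertex.y for vertex in annotation.bounding_poly.vertices]
        boxes.append((annotation.description, (min(xs), min(ys), max(xs), max(ys))))
    return boxes
```

Printing `full_text(response.text_annotations)` instead of joining the word descriptions by hand keeps the line breaks that the API returns, which may explain run-together output such as "M20Siêu" in the question.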
Feel free to ask further questions for clarification.
Some other ideas: