Question

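I am trying to read semiconductor wafer IDs with Tesseract OCR in Python, but it is not very successful, and the -c tessedit_char_whitelist=0123456789XL config has no effect either. The chip ID is read as po4>1.

Part of the code: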

import pytesseract

# Run OCR on the wafer image (--psm 6: assume a single uniform block of text)
optCode = pytesseract.image_to_string("c:/opencv/ID_fine_out22.jpg", lang="eng", config="--psm 6 -c tessedit_char_whitelist=0123456789XL")
# Print the chip ID
print("ChipID:", optCode)

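Are there any ideas for improving the OCR? I would also like to read digits only.

I also think ML could be an option, since I have a large number of sample images.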

Answer 1

tessedit_char_whitelist = 0123456789XL

The whitelist is not yet supported in Tesseract 4; see https://github.com/tesseract-ocr/tesseract/issues/751
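While the whitelist is ignored, one possible workaround is to filter the OCR output afterwards. The sketch below is only an illustration: filter_to_whitelist is a hypothetical helper, not part of pytesseract or Tesseract, and the default character set is simply the one from the question.

```python
def filter_to_whitelist(text, allowed="0123456789XL"):
    """Keep only characters found in `allowed` (hypothetical helper, not a Tesseract feature)."""
    allowed_set = set(allowed)
    return "".join(ch for ch in text if ch in allowed_set)


if __name__ == "__main__":
    # The misread result from the question; only the whitelisted characters survive.
    print(filter_to_whitelist("po4>1"))  # prints 41
```

In the question's code this could be applied as filter_to_whitelist(optCode). Note that it only discards unwanted characters; it cannot repair characters that Tesseract misread, so the result may still be incomplete.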

By the way, using the free OCR API, I get "13431". Perhaps you could use that API.

Tesseract OCR for semiconductor wafer ID detection (Python)
