Question

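I'm using pytesseract to parse digits extracted from browser screenshots. This is my first time using OCR, so please correct me if I'm going about this the wrong way. I get low accuracy on images that look very easy to read to me. Sometimes I get an empty string; rarely, I also get a wrong number.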

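Elsewhere, people suggested filtering and enlarging the image. I did that and it helped, going from almost 0 accuracy to around 50%, but that is still poor. I'm working on screenshots taken with Selenium; some code is shown below. Sorry if it's messy: I included the image loading and processing parts to show what I'm doing, but I didn't want to give away the page I'm loading.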

Here is an image showing the processed images, along with the results of parsing and converting to float.

from selenium import webdriver
from PIL import Image
from IPython.display import display  # display() below assumes a Jupyter/IPython session
import pytesseract, cv2, time, numpy as np

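# load the page, enlarge, save as png, load as usable image
driver = webdriver.Firefox()  # assumed: driver creation was omitted; MozTransform below implies Firefox
driver.get("https://a-page-I-wont-tell-you-sorry")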
time.sleep(5) # wait for loading
driver.execute_script('document.body.style.MozTransform = "scale(1.50)";') # enlarge
enlarged_screenshot = driver.get_screenshot_as_png()
with open("enlarged_screenshot.png", "wb") as file:
    file.write(enlarged_screenshot)
enlarged_screenshot = Image.open("enlarged_screenshot.png")

# method for cropping and filtering
def crop_and_filter(image, coordinates, filter_level):
    width, height = image.size
    x0, y0, x1, y1 = coordinates
    cropped_image = image.crop((width*x0, height*y0, width*x1, height*y1))
    image_l = cropped_image.convert("L")
    image_array = np.array(image_l)
    _, filtered_image_array = cv2.threshold(image_array, filter_level, 255, cv2.THRESH_BINARY)    

    print("*"*100); print("Filtered image:")
    display(Image.fromarray(filtered_image_array))

    return filtered_image_array

# example of how I call and parse it
x0 = 0.51; y0 = 0.43; delta_x = 0.05; delta_y = 0.025
filtered_image_array = crop_and_filter(enlarged_screenshot, (x0, y0, x0+delta_x, y0+delta_y), 125)
number = pytesseract.image_to_string(filtered_image_array, config="-c tessedit_char_whitelist=0123456789.\t%")

Answer 1

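This started as a comment, but it got too long:

Your question is a little unclear, but I think that ultimately you want to run Tesseract on the actual image you posted at https://i.stack.imgur.com/m5WJQ.png.

The command I used was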

tesseract --oem 1 -l eng --psm 11 m5WJQ.png stdout

This produced the following output:

ek ok ek ok ok ok ok ok ok ok ok ok

Filtered image:

65

HAA

Filtered image:

3

HAA

Filtered image:

3.5

HAA

Filtered image:

2.64

HAA

Filtered image:

75

HAA

Filtered image:

3.1

HAA

Filtered image:

3.6

HAA

Filtered image:

2.68

EARSED NUMBERS:

[nan, nan,

3.5, 2.64, nan,

3.1, 3.6, 2.68]

Judging by your comments on the original question, this looks good to you.
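For the float-conversion step mentioned in the question, here is a minimal sketch, assuming that anything Tesseract returns without a recognizable number should become nan, as in the parsed list above. `parse_ocr_number` is a hypothetical helper of mine, not the asker's actual code.

```python
import math
import re

# One run of digits with an optional decimal part, e.g. "65" or "2.64".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_ocr_number(text):
    """Return the first number found in an OCR string, or nan if there is none.

    Hypothetical helper: it ignores surrounding whitespace, newlines and
    stray symbols such as "%".
    """
    match = _NUMBER.search(text)
    return float(match.group()) if match else math.nan
```

Returning nan instead of raising keeps a batch of crops going when one of them fails, and makes the failures easy to count afterwards.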

I'm running Tesseract, built from source, on macOS 10.13.6 High Sierra (but you shouldn't have to do that).

tesseract --version
tesseract 5.0.0-alpha-371-ga9227
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6

See if you can reproduce this as well, and leave a comment if you can't. I'll see whether I can get the equivalent output from pytesseract.
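As a starting point, this is roughly what the pytesseract version of the command above might look like. It is an untested sketch: `tesseract_config` and `ocr_image` are hypothetical helper names, and I'm assuming the `--oem` and `--psm` flags pass through pytesseract's `config` string unchanged, so the output may still differ from the command-line run.

```python
def tesseract_config(oem=1, psm=11):
    """Build the config string matching `tesseract --oem 1 --psm 11`."""
    return f"--oem {oem} --psm {psm}"


def ocr_image(path, lang="eng", oem=1, psm=11):
    """Run Tesseract on an image file through pytesseract (hypothetical wrapper)."""
    # Imported here so the config helper can be used without Tesseract installed.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(
        Image.open(path), lang=lang, config=tesseract_config(oem, psm)
    )


# Usage, assuming the image has been saved locally as m5WJQ.png:
# print(ocr_image("m5WJQ.png"))
```

Note that this runs on the whole image with sparse-text page segmentation (`--psm 11`) rather than on individual crops with a character whitelist, which is the main difference from the code in the question.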

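Also, since you (sometimes) know what the numbers should be, you can use a tool like ocreval (https://github.com/eddieantonio/ocreval, I'm not affiliated with it) to see how well your runs do compared to the known input, i.e. the "ground truth".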
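If you only want a quick number before setting up a dedicated tool, a plain exact-match check is enough to track progress. The sketch below is not ocreval and does not use its metrics; `ocr_accuracy` is a hypothetical helper, and the sample values in the usage comment are made up for illustration.

```python
import math


def ocr_accuracy(parsed, truth, tol=1e-9):
    """Fraction of parsed values equal to the known values.

    A nan in `parsed` (a failed parse) always counts as a miss.
    """
    if not truth or len(parsed) != len(truth):
        raise ValueError("parsed and truth must be non-empty and the same length")
    hits = sum(
        1
        for p, t in zip(parsed, truth)
        if not math.isnan(p) and abs(p - t) <= tol
    )
    return hits / len(truth)


# Illustrative values only, not taken from the real page:
# ocr_accuracy([math.nan, 3.5, 2.64, 3.0], [6.5, 3.5, 2.64, 3.1]) gives 0.5
```

Running this after each change to the threshold or scaling tells you whether the change actually helped, rather than relying on eyeballing a few crops.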

HTH

The accuracy of OCR with Tesseract on simple images is surprisingly low. How can I improve it?
