Question

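I'm fairly new to the world of OCR, OpenCV, Tesseract, etc., and I'm hoping to get some advice or a nudge in the right direction for a project I'm working on. For context, I practice golf on an indoor simulator powered by Full Swing Golf. My goal is to build an app (preferably for iPhone, but desktop would work too) that can capture the data the simulator provides and process it. The overall workflow would look something like this: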

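1- Set up the iPhone or laptop camera to watch the simulator screen.
2- Hit a shot.
3- The "Stats" screen that is displayed looks more or less like:

4- Detect that the "Stats" screen has been displayed and grab all the relevant data: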

| Distance | Launch | Back Spin | Club Speed | Carry | To Pin | Direction | Ball Speed | Side Spin | Club Face | Club Path |
|----------|--------|-----------|------------|-------|--------|-----------|------------|-----------|-----------|-----------|
| 345      | 13     | 3350      | 135        | 335   | 80     | 2.4       | 190        | 350       | 4.3       | 1.6       |

5- ?: Save the data to my app, track it over time, etc...

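What I've tried so far:

It seems that OpenCV's matchTemplate is a simple way to find all the headings (Distance, Launch, etc...) in the image, and it does work when both the image and the templates are at a perfect resolution. However, since this will be an iPhone app, I can't really guarantee the quality (within reason). In addition, the screen will almost never be viewed head-on as in the image above. The camera will most likely sit off to the side, so the image has to be de-skewed accordingly. I've tried to work out the de-skew logic using the image below, to no avail:

Because of the template-matching problems above, finding the reference points needed to de-skew with getPerspectiveTransform and warpPerspective has proven very difficult.

I've also tried adjusting the scale dynamically, using code similar to the following: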

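import cv2
import imutils
import numpy as np

# Assumes image_gray (the grayscale scene to search) is defined at module level.
def findTemplateLocation(image_path):
    template = cv2.imread(image_path)
    template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)

    w, h = template.shape[::-1]
    threshold = 0.65
    loc = []

    # Try scales from largest to smallest and stop at the first one that matches
    for scale in np.linspace(0.1, 2, 20)[::-1]:
        resized = imutils.resize(template, width=int(template.shape[1] * scale))
        w, h = resized.shape[::-1]
        # matchTemplate raises an error if the template is larger than the image
        if h > image_gray.shape[0] or w > image_gray.shape[1]:
            continue
        res = cv2.matchTemplate(image_gray, resized, cv2.TM_CCOEFF_NORMED)

        loc = np.where(res >= threshold)
        if len(list(zip(*loc[::-1]))) > 0:
            break

    if loc and len(list(zip(*loc[::-1]))) > 0:
        # Size of the unscaled template, printed for debugging
        adjusted_w = int(w/scale)
        adjusted_h = int(h/scale)
        print(str(adjusted_w) + " " + str(adjusted_h) + " " + str(scale))

        # One entry per location whose score reached the threshold
        ret = []
        for pt in zip(*loc[::-1]):
            ret.append({'width': w, 'height': h, 'location': pt})

        return ret

    return None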

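This still returns a lot of false positives.

I'm hoping to get some advice on how to approach this problem. I'm open to any language/workflow.

If it looks like I'm on the right track, my current code is at https://gist.github.com/naderhen/9ec8d45f13d92507131d5bce0e84fad8. I would greatly appreciate suggestions for the best next steps.

Thanks for any help you can provide!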

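Edit: Additional resources

I've uploaded a lot of videos and still photos from my time at the indoor simulator this weekend: https://www.dropbox.com/sh/5vub2mi4rvunyaw/AAAY1_7Q_WBV4JvmDD0dEiTDa?dl=0

I tried to capture many different angles under different lighting conditions. Please let me know if I can provide any other resources that would be helpful.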

Answer 1

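So, I tried two different approaches:

Contour detection: This seemed like the most obvious approach, since the stats screen is the major part of the image and is present in all the images. Although it does work for two of the three images, it is probably not very robust with respect to the parameters. Here are the steps I tried for the contours:

First, get a grayscale image, or use a single channel such as Value from HSV. Then threshold the image using Otsu or adaptive thresholding. After playing with the many parameters involved, I got satisfactory results, which basically means the whole stats screen shows up cleanly in white on a black background. After that, sort the contours as below: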
```
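# thresh is the binary image from the thresholding step; sorted_image is a
# BGR copy of the input to draw on.
# [-2] picks the contour list on both OpenCV 3.x and 4.x ([1] only works on 3.x).
contours = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2]
# Sort the contours by area, largest first, so only the biggest ones are examined below
cntsSorted = sorted(contours, key=cv2.contourArea, reverse=True)

for cnt in cntsSorted[0:20]:
    peri = cv2.arcLength(cnt, True)
    approx = cv2.approxPolyDP(cnt, 0.04 * peri, True)
    # Keep only large four-sided contours
    if len(approx) == 4 and peri > 10000:
        # Wrap cnt in a list so it is drawn as one connected outline
        cv2.drawContours(sorted_image, [cnt], -1, (0, 255, 0), 10)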
```
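Feature detection and matching: Since the contour approach wasn't robust enough, I tried another approach that I had used for a problem similar to yours. It is quite robust and fast (I tried it on an Android phone 2 years ago, and it did the job in under a second for a 1280 x 760 image). However, after trying it on your cases, I found that your images are quite blurry. The two images in your question have very similar primitives and work fine, but the image you posted in the comments is very different from them, so it does not yield a good number of matches (at least 10 in my case). If you can post a set of good images of the kind you would actually encounter, I will update this answer with the results on the new image set. The perspective of the scene obviously changes between images, but that should not be a problem as long as you have a very good original image (the first one in your question). Changes in lighting conditions, however, can be quite a pain. I would suggest using different color spaces such as HSV, Lab and Luv instead of BGR. Here is a working example of how to implement your own feature matcher. Depending on the version of OpenCV you use, some code changes are needed, but I'm sure you can find the solution (I did ;)).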

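A good example:

Some suggestions:

Try to get as clear an image as possible for the one you will match the other images against (in this case, your first image). Hopefully this will reduce the amount of processing you need.
Try applying an unsharp mask before finding the keypoints.
My results come from using ORB. You can also try other detectors/descriptors such as SURF, SIFT and FAST.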

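Finally, your template-matching approach should only work when the scale changes, not when the perspective changes.

Hope this helps! If you have any further questions and/or have a good image set ready (rubbing hands), please leave a comment. Cheers!

Edit 1: This is the code I used for feature detection and matching, with OpenCV 3.4.3 and Python 3.4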

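import cv2
import numpy as np

def unsharp_mask(im):
    # Sharpen by subtracting a blurred copy: 2 * im - blurred.
    # The result is written back into im, which is also returned.
    gaussian_3 = cv2.GaussianBlur(im, (3, 3), 3.0)
    return cv2.addWeighted(im, 2.0, gaussian_3, -1.0, 0, im)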

def screen_finder2(image, source, num=0):
    def resize(im, new_width):
        r = float(new_width) / im.shape[1]
        dim = (new_width, int(im.shape[0] * r))
        return cv2.resize(im, dim, interpolation=cv2.INTER_AREA)
    width = 300
    source = resize(source, new_width=width)
    image = resize(image, new_width=width)

    # Convert to LUV and keep only the L (lightness) channel for matching
    luv = cv2.cvtColor(image, cv2.COLOR_BGR2LUV)
    image, u, v = cv2.split(luv)

    luv = cv2.cvtColor(source, cv2.COLOR_BGR2LUV)
    source, u, v = cv2.split(luv)

    MIN_MATCH_COUNT = 10
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(image, None)
    kp2, des2 = orb.detectAndCompute(source, None)

    flann = cv2.DescriptorMatcher_create(cv2.DescriptorMatcher_FLANNBASED)
    # Without the below 2 lines, matching doesn't work
    des1 = np.asarray(des1, dtype=np.float32)
    des2 = np.asarray(des2, dtype=np.float32)

    matches = flann.knnMatch(des1, des2, k=2)

    # store all the good matches as per Lowe's ratio test
    good = []
    for m, n in matches:
        if m.distance < 0.7 * n.distance:
            good.append(m)

    if len(good) >= MIN_MATCH_COUNT:
        src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

        M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
        matchesMask = mask.ravel().tolist()

        h, w = image.shape
        pts = np.float32([[0, 0], [0, h-1], [w-1, h-1], [w-1, 0]]).reshape(-1, 1, 2)
        dst = cv2.perspectiveTransform(pts, M)
        source_bgr = cv2.cvtColor(source, cv2.COLOR_GRAY2BGR)
        img2 = cv2.polylines(source_bgr, [np.int32(dst)], True, (0, 0, 255), 3,
                             cv2.LINE_AA)
        cv2.imwrite("out"+str(num)+".jpg", img2)
    else:
        print("Not enough matches." + str(len(good)))
        matchesMask = None

    draw_params = dict(matchColor=(0, 255, 0), # draw matches in green color
                       singlePointColor=None,
                       matchesMask=matchesMask, # draw only inliers
                       flags=2)
    img3 = cv2.drawMatches(image, kp1, source, kp2, good, None, **draw_params)
    cv2.imwrite("ORB"+str(num)+".jpg", img3)


match_image = unsharp_mask(cv2.imread("source.jpg"))
image_1 = unsharp_mask(cv2.imread("Screen_1.jpg"))
screen_finder2(match_image, image_1, num=1)

Extracting semi-structured text from a complex UI (Golf Simulator)
