Question

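I want to use the Microsoft Cognitive Services Computer Vision API to read the bib numbers in photos of runners in a race, showing either a single runner or a fairly small number of individual runners.

Is this a task the OCR feature should be able to handle? I have tried a few samples, using both the "Getting Started" program and the test console, and it returns an empty array of regions. Am I doing something wrong, or is this beyond its capabilities?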

Answer 1

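First, check that your image meets the requirements in the API description.

Supported image formats: JPEG, PNG, GIF, BMP. The image file size must be less than 4 MB. Image dimensions must be between 40 x 40 and 3200 x 3200 pixels, and the image cannot be larger than 10 megapixels.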
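The limits above can be checked locally before calling the API. The following is only a sketch: the function name `check_image` and its parameters are made up for illustration, the caller is assumed to already know the image's format, byte size, and pixel dimensions, and "4 MB" is assumed to mean 4 * 1024 * 1024 bytes.

```python
# Limits as quoted in this answer; verify them against the current API docs.
SUPPORTED_FORMATS = {"JPEG", "PNG", "GIF", "BMP"}
MAX_FILE_BYTES = 4 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MIN_SIDE, MAX_SIDE = 40, 3200
MAX_PIXELS = 10_000_000


def check_image(fmt, file_bytes, width, height):
    """Return a list of problems; an empty list means the image passes."""
    problems = []
    if fmt.upper() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if file_bytes >= MAX_FILE_BYTES:
        problems.append("file is 4 MB or larger")
    if not (MIN_SIDE <= width <= MAX_SIDE and MIN_SIDE <= height <= MAX_SIDE):
        problems.append("dimensions outside 40 x 40 to 3200 x 3200")
    if width * height > MAX_PIXELS:
        problems.append("more than 10 megapixels")
    return problems


print(check_image("jpeg", 1_000_000, 2000, 1500))
print(check_image("png", 100, 3200, 3200))
```

Note that a 3200 x 3200 image is within the dimension limit but is 10.24 megapixels, so it still fails the pixel-count check.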

OCR systems usually make some assumptions:

The image is not rotated beyond a certain angle, which in Microsoft's case is 40 degrees.

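Text detection is still a hot research topic, and detecting text in the wild can be challenging. For example, the image in Maria's comment is quite simple. The text is black on white, and the photo is taken from

Here are two photos: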

One where OCR does poorly: http://www.athletico.com/blog2/wp-content/uploads/2012/04/Runners.jpg

Here is the output of the Microsoft Cognitive Services Vision OCR API for this image:

{
  "language": "zh-Hant",
  "textAngle": 6.0999999999999641,
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "1441,490,51,41",
      "lines": [
        {
          "boundingBox": "1441,490,51,41",
          "words": [
            {
              "boundingBox": "1441,490,51,41",
              "text": "39"
            }
          ]
        }
      ]
    }
  ]
}
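A response shaped like the one above can be reduced to bib-number candidates by keeping only all-digit words. This is a rough sketch, not part of the API: the function name `numeric_words` is hypothetical, and ranking candidates by bounding-box height rests on the assumption that the bib number is the largest printed number in the photo. The `boundingBox` string is read as "left,top,width,height".

```python
def numeric_words(ocr_response):
    """Collect all-digit words with their box height, tallest first."""
    found = []
    for region in ocr_response.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                text = word["text"]
                if text.isascii() and text.isdigit():
                    # boundingBox is a string: "left,top,width,height"
                    _left, _top, _width, height = map(
                        int, word["boundingBox"].split(",")
                    )
                    found.append((text, height))
    # Assumption: the bib number is the tallest numeric text in the image.
    return sorted(found, key=lambda item: item[1], reverse=True)


response = {
    "regions": [
        {"lines": [{"words": [{"boundingBox": "1441,490,51,41", "text": "39"}]}]}
    ]
}
print(numeric_words(response))
```

An empty `regions` array, as in the question, simply yields an empty list.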

One where OCR does well:

http://running.competitor.com/files/2014/04/HappyRunner-Raleigh14.jpg

Now let's look at the output of the same API:

{
  "language": "en",
  "textAngle": -2.900000000000035,
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "1597,1824,585,576",
      "lines": [
        {
          "boundingBox": "1654,1824,528,67",
          "words": [
            {
              "boundingBox": "1654,1829,211,62",
              "text": "7?.cek"
            },
            {
              "boundingBox": "2146,1824,36,52",
              "text": "Y'"
            }
          ]
        },
        {
          "boundingBox": "1603,1889,551,98",
          "words": [
            {
              "boundingBox": "1603,1889,551,98",
              "text": "RALEIGH"
            }
          ]
        },
        {
          "boundingBox": "1695,1990,370,37",
          "words": [
            {
              "boundingBox": "1695,1990,79,35",
              "text": "1/2"
            },
            {
              "boundingBox": "1794,1993,271,34",
              "text": "marathon"
            }
          ]
        },
        {
          "boundingBox": "1742,2052,138,26",
          "words": [
            {
              "boundingBox": "1742,2052,105,23",
              "text": "presented"
            },
            {
              "boundingBox": "1856,2053,24,25",
              "text": "by"
            }
          ]
        },
        {
          "boundingBox": "1798,2099,156,21",
          "words": [
            {
              "boundingBox": "1798,2099,65,17",
              "text": "APRIL"
            },
            {
              "boundingBox": "1872,2101,26,19",
              "text": "13,"
            },
            {
              "boundingBox": "1905,2101,49,15",
              "text": "2014"
            }
          ]
        },
        {
          "boundingBox": "1597,2160,536,159",
          "words": [
            {
              "boundingBox": "1597,2160,536,159",
              "text": "19401"
            }
          ]
        },
        {
          "boundingBox": "1749,2368,101,32",
          "words": [
            {
              "boundingBox": "1749,2368,101,32",
              "text": "benefiting"
            }
          ]
        }
      ]
    }
  ]
}

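Much better! One might think the second image is the harder one to recognize. The difference is that geometric image transformations (https://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/geometry/geo-tran.html), especially affine transformations, are still hard for computers to handle, whereas our brains deal with them with a very high success rate.

So OCR is good at recognizing text that faces the camera, and it fails easily on images where the text has undergone such a transformation.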
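To make "affine transformation" concrete, here is a small sketch of how the corners of a flat text box move under a shear and under a rotation. The box coordinates and the helper names `affine` and `rotation` are purely illustrative; they are not taken from the API or from the photos above.

```python
import math


def affine(points, a, b, c, d, tx=0.0, ty=0.0):
    """Map each (x, y) to (a*x + b*y + tx, c*x + d*y + ty)."""
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def rotation(points, degrees):
    """Rotate points about the origin by the given angle in degrees."""
    t = math.radians(degrees)
    return affine(points, math.cos(t), -math.sin(t), math.sin(t), math.cos(t))


# Corners of a 100 x 40 axis-aligned text box (hypothetical values).
box = [(0, 0), (100, 0), (100, 40), (0, 40)]

sheared = affine(box, 1, 0.5, 0, 1)  # horizontal shear: x' = x + 0.5 * y
rotated = rotation(box, 30)

print(sheared)
print(rotated)
```

After the shear the box is a parallelogram rather than a rectangle, which is roughly what happens to a bib printed on fabric that is folded or seen at an angle.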
