How to recognize runners' bib numbers using the Computer Vision API

Time: 2017-05-19 19:30:49

Tags: computer-vision microsoft-cognitive

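I would like to use the Microsoft Cognitive Services Computer Vision API to recognize bib numbers in photos of runners in a race, showing either a single runner or a fairly small number of individual runners.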

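Is this a task the OCR function should be able to handle? I have tried several samples with both the "getting started" program and the testing console, and it returns an empty regions array. Am I doing something wrong, or is this beyond its capabilities?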

1 answer:

Answer 0: (score: 0)

First, check that your image meets the requirements stated in the API description.

  

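Supported image formats: JPEG, PNG, GIF, BMP. Image file size must be less than 4MB. Image dimensions must be between 40 x 40 and 3200 x 3200 pixels, and the image cannot be larger than 10 megapixels.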
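The limits quoted above can be checked before uploading. The following is a minimal sketch, not part of any Microsoft SDK: the helper name `check_image_limits` and its parameters are hypothetical, "4MB" is assumed to mean 4 MiB, and the format, width, height, and file size are assumed to have been read beforehand (for example with an imaging library).

```python
# Hypothetical pre-flight check based on the limits quoted above.
SUPPORTED_FORMATS = {"JPEG", "PNG", "GIF", "BMP"}
MAX_FILE_BYTES = 4 * 1024 * 1024   # assumption: "4MB" interpreted as 4 MiB
MIN_SIDE, MAX_SIDE = 40, 3200      # allowed range for each side, in pixels
MAX_PIXELS = 10_000_000            # 10 megapixels


def check_image_limits(fmt, width, height, size_bytes):
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    name = fmt.upper()
    if name == "JPG":              # treat the common alias as JPEG
        name = "JPEG"
    if name not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes >= MAX_FILE_BYTES:
        problems.append("file size must be less than 4MB")
    if not (MIN_SIDE <= width <= MAX_SIDE and MIN_SIDE <= height <= MAX_SIDE):
        problems.append("each side must be between 40 and 3200 pixels")
    if width * height > MAX_PIXELS:
        problems.append("image exceeds 10 megapixels")
    return problems
```

Note that a 3200 x 3200 image passes the per-side check but still exceeds 10 megapixels, so both conditions need to be tested.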

OCR systems generally make some assumptions:

The image is not rotated by more than a certain angle; in Microsoft's case the limit is 40 degrees.

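Text detection is still a hot research topic, and detecting text in the wild can be challenging. By comparison, the image in Maria's comment is very simple: the text is black on white, and the photo was taken from …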

Here are two photos for comparison:

One where OCR does poorly: http://www.athletico.com/blog2/wp-content/uploads/2012/04/Runners.jpg

Here is the output of the Microsoft Cognitive Services Vision OCR API for this image:

{
  "language": "zh-Hant",
  "textAngle": 6.0999999999999641,
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "1441,490,51,41",
      "lines": [
        {
          "boundingBox": "1441,490,51,41",
          "words": [
            {
              "boundingBox": "1441,490,51,41",
              "text": "39"
            }
          ]
        }
      ]
    }
  ]
}

One where OCR does well:

http://running.competitor.com/files/2014/04/HappyRunner-Raleigh14.jpg

Now let's look at the output of the same API:

{
  "language": "en",
  "textAngle": -2.900000000000035,
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "1597,1824,585,576",
      "lines": [
        {
          "boundingBox": "1654,1824,528,67",
          "words": [
            {
              "boundingBox": "1654,1829,211,62",
              "text": "7?.cek"
            },
            {
              "boundingBox": "2146,1824,36,52",
              "text": "Y'"
            }
          ]
        },
        {
          "boundingBox": "1603,1889,551,98",
          "words": [
            {
              "boundingBox": "1603,1889,551,98",
              "text": "RALEIGH"
            }
          ]
        },
        {
          "boundingBox": "1695,1990,370,37",
          "words": [
            {
              "boundingBox": "1695,1990,79,35",
              "text": "1/2"
            },
            {
              "boundingBox": "1794,1993,271,34",
              "text": "marathon"
            }
          ]
        },
        {
          "boundingBox": "1742,2052,138,26",
          "words": [
            {
              "boundingBox": "1742,2052,105,23",
              "text": "presented"
            },
            {
              "boundingBox": "1856,2053,24,25",
              "text": "by"
            }
          ]
        },
        {
          "boundingBox": "1798,2099,156,21",
          "words": [
            {
              "boundingBox": "1798,2099,65,17",
              "text": "APRIL"
            },
            {
              "boundingBox": "1872,2101,26,19",
              "text": "13,"
            },
            {
              "boundingBox": "1905,2101,49,15",
              "text": "2014"
            }
          ]
        },
        {
          "boundingBox": "1597,2160,536,159",
          "words": [
            {
              "boundingBox": "1597,2160,536,159",
              "text": "19401"
            }
          ]
        },
        {
          "boundingBox": "1749,2368,101,32",
          "words": [
            {
              "boundingBox": "1749,2368,101,32",
              "text": "benefiting"
            }
          ]
        }
      ]
    }
  ]
}
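A response shaped like the one above can be filtered for likely bib numbers. This is only a sketch under a stated assumption: the bib number is taken to be a digit-only word, and larger bounding boxes are ranked first. That heuristic and the helper names `parse_box` and `bib_candidates` are illustrative, not something the API provides. The sample data is a trimmed copy of the output above.

```python
import re


def parse_box(box):
    """Turn an 'x,y,width,height' string into a tuple of four ints."""
    x, y, w, h = (int(v) for v in box.split(","))
    return x, y, w, h


def bib_candidates(ocr_result):
    """Return digit-only words, ordered by bounding-box area (largest first)."""
    found = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                text = word["text"]
                if re.fullmatch(r"\d+", text):   # assumption: bibs are digits only
                    _, _, w, h = parse_box(word["boundingBox"])
                    found.append((w * h, text))
    return [text for _, text in sorted(found, reverse=True)]


# Trimmed copy of the response shown above.
sample = {
    "regions": [{
        "lines": [
            {"words": [{"boundingBox": "1603,1889,551,98", "text": "RALEIGH"}]},
            {"words": [{"boundingBox": "1695,1990,79,35", "text": "1/2"}]},
            {"words": [{"boundingBox": "1905,2101,49,15", "text": "2014"}]},
            {"words": [{"boundingBox": "1597,2160,536,159", "text": "19401"}]},
        ]
    }]
}

print(bib_candidates(sample))  # ['19401', '2014']
```

An empty `regions` array, as in the question, simply yields an empty list, which makes the "nothing detected" case easy to handle.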

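Much better! One might have expected the second image to be the harder one to recognize. The difference is that geometric image transformations (https://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/geometry/geo-tran.html), especially affine transformations, are still hard for computers to handle, whereas our brains cope with them at a very high success rate.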

So OCR is good at recognizing text that faces the camera, and fails easily on images where the text has undergone such transformations.
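To make the term "affine transformation" concrete, here is a small sketch that applies a 2x3 affine matrix to the corners of an upright rectangle, such as the box around a bib number. The function name `apply_affine` and the shear values are arbitrary illustrations, not measurements taken from either photo.

```python
def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix [[a, b, tx], [c, d, ty]] to an (x, y) point."""
    (a, b, tx), (c, d, ty) = matrix
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


# Corners of an upright 100 x 40 rectangle (illustrative size).
corners = [(0, 0), (100, 0), (100, 40), (0, 40)]

# Horizontal shear combined with a translation (arbitrary example values).
shear = [[1.0, 0.5, 10.0],
         [0.0, 1.0, 5.0]]

sheared = [apply_affine(shear, p) for p in corners]
print(sheared)  # [(10.0, 5.0), (110.0, 5.0), (130.0, 45.0), (30.0, 45.0)]
```

The rectangle becomes a slanted parallelogram: parallel edges stay parallel, but the right angles that upright printed text normally has are lost.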