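I'd like to use the Microsoft Cognitive Services Computer Vision API to recognize the bib numbers in photos of runners in a race, showing either a single runner or a fairly small number of individual runners.
Is this a task the OCR feature should be able to handle? I have tried a few samples with both the "getting started" program and the testing console, and it returns an empty regions array. Am I doing something wrong, or is this beyond its capability?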
Answer 0 (score: 0)
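First, check that your image meets the requirements in the API description.
Supported image formats: JPEG, PNG, GIF, BMP. Image file size must be less than 4MB. Image dimensions must be between 40 x 40 and 3200 x 3200 pixels, and the image cannot be larger than 10 megapixels.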
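The limits quoted above can be checked locally before calling the API. Below is a minimal sketch, not part of any Microsoft SDK: `check_image_constraints` is a hypothetical helper, and reading "4MB" as 4 * 1024 * 1024 bytes and "10 megapixels" as 10,000,000 pixels is an assumption.

```python
# Limits quoted from the API description; the exact byte and pixel
# interpretations below are assumptions.
MAX_FILE_BYTES = 4 * 1024 * 1024   # "less than 4MB", read as 4 MiB
MIN_SIDE, MAX_SIDE = 40, 3200      # allowed range for width and height
MAX_PIXELS = 10_000_000            # "10 megapixels", read as 10 million
SUPPORTED_FORMATS = {"JPEG", "PNG", "GIF", "BMP"}


def check_image_constraints(fmt, size_bytes, width, height):
    """Return a list of problems; an empty list means the image passes."""
    problems = []
    if fmt.upper() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes >= MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    if not (MIN_SIDE <= width <= MAX_SIDE and MIN_SIDE <= height <= MAX_SIDE):
        problems.append(f"dimensions out of range: {width} x {height}")
    if width * height > MAX_PIXELS:
        problems.append(f"too many pixels: {width * height}")
    return problems


if __name__ == "__main__":
    # A 3200 x 3200 image satisfies the side limits but exceeds 10 megapixels.
    print(check_image_constraints("PNG", 1_000_000, 3200, 3200))
```

Note that the side limits and the pixel limit are separate checks: an image can pass one and fail the other.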
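OCR systems usually make some assumptions, for example
that the image is not rotated by more than a certain angle; in Microsoft's case that is 40 degrees.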
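Text detection is still a hot research topic, and detecting text in the wild can be challenging. The image in Maria's comment, for example, is very simple: the text color is black and white, and the photo is taken from …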
Here I share two photos.
One where OCR does badly: http://www.athletico.com/blog2/wp-content/uploads/2012/04/Runners.jpg
Here is the output of the Microsoft Cognitive Services Vision OCR API for this image:
{
"language": "zh-Hant",
"textAngle": 6.0999999999999641,
"orientation": "Up",
"regions": [
{
"boundingBox": "1441,490,51,41",
"lines": [
{
"boundingBox": "1441,490,51,41",
"words": [
{
"boundingBox": "1441,490,51,41",
"text": "39"
}
]
}
]
}
]
}
One where OCR does well:
http://running.competitor.com/files/2014/04/HappyRunner-Raleigh14.jpg
Now let's look at the output of the same API:
{
"language": "en",
"textAngle": -2.900000000000035,
"orientation": "Up",
"regions": [
{
"boundingBox": "1597,1824,585,576",
"lines": [
{
"boundingBox": "1654,1824,528,67",
"words": [
{
"boundingBox": "1654,1829,211,62",
"text": "7?.cek"
},
{
"boundingBox": "2146,1824,36,52",
"text": "Y'"
}
]
},
{
"boundingBox": "1603,1889,551,98",
"words": [
{
"boundingBox": "1603,1889,551,98",
"text": "RALEIGH"
}
]
},
{
"boundingBox": "1695,1990,370,37",
"words": [
{
"boundingBox": "1695,1990,79,35",
"text": "1/2"
},
{
"boundingBox": "1794,1993,271,34",
"text": "marathon"
}
]
},
{
"boundingBox": "1742,2052,138,26",
"words": [
{
"boundingBox": "1742,2052,105,23",
"text": "presented"
},
{
"boundingBox": "1856,2053,24,25",
"text": "by"
}
]
},
{
"boundingBox": "1798,2099,156,21",
"words": [
{
"boundingBox": "1798,2099,65,17",
"text": "APRIL"
},
{
"boundingBox": "1872,2101,26,19",
"text": "13,"
},
{
"boundingBox": "1905,2101,49,15",
"text": "2014"
}
]
},
{
"boundingBox": "1597,2160,536,159",
"words": [
{
"boundingBox": "1597,2160,536,159",
"text": "19401"
}
]
},
{
"boundingBox": "1749,2368,101,32",
"words": [
{
"boundingBox": "1749,2368,101,32",
"text": "benefiting"
}
]
}
]
}
]
}
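To get from a response like the ones above to a bib number, the nested regions, lines, and words have to be walked. The sketch below does that for the JSON shape shown in this answer. `iter_words` and `guess_bib_number` are hypothetical helpers, and picking the digit-only word with the largest bounding box is an assumed heuristic, not something the API provides. The `sample` dictionary is a trimmed copy of the second output.

```python
def iter_words(ocr_result):
    """Yield (text, (x, y, w, h)) for every word in the response."""
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                # boundingBox is a string of the form "x,y,width,height"
                x, y, w, h = (int(v) for v in word["boundingBox"].split(","))
                yield word["text"], (x, y, w, h)


def guess_bib_number(ocr_result):
    """Return the digit-only word with the largest box, or None.

    Assumed heuristic: the bib number is the biggest all-digit text.
    """
    candidates = [
        (w * h, text)
        for text, (_, _, w, h) in iter_words(ocr_result)
        if text.isdigit()
    ]
    return max(candidates)[1] if candidates else None


# Trimmed copy of the second response shown above
sample = {
    "regions": [{
        "boundingBox": "1597,1824,585,576",
        "lines": [
            {"boundingBox": "1603,1889,551,98",
             "words": [{"boundingBox": "1603,1889,551,98", "text": "RALEIGH"}]},
            {"boundingBox": "1798,2099,156,21",
             "words": [{"boundingBox": "1905,2101,49,15", "text": "2014"}]},
            {"boundingBox": "1597,2160,536,159",
             "words": [{"boundingBox": "1597,2160,536,159", "text": "19401"}]},
        ],
    }]
}

if __name__ == "__main__":
    print(guess_bib_number(sample))
```

For the trimmed sample this selects "19401" rather than "2014", because its box is far larger. A response with an empty regions array, as described in the question, yields None.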
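Much better! One might think the second image is the harder one to recognize. But variations in the image, in particular geometric image transformations (https://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/geometry/geo-tran.html) and especially affine transformations, are still hard for computers to handle, whereas our brains process them with a very high success rate.
So OCR is good at recognizing text that faces the camera, and it fails easily on images where the text has undergone such a transformation.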
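To make the affine transformations mentioned above concrete, here is a small sketch that applies one to the corners of a text box. All names (`apply_affine`, `rotation`, `shear_x`) are illustrative and not taken from any OCR library; the 100 x 40 box and the shear factor are made-up values.

```python
import math


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix [[a, b, tx], [c, d, ty]] to an (x, y) point."""
    (a, b, tx), (c, d, ty) = matrix
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def rotation(degrees):
    """Counter-clockwise rotation about the origin as a 2x3 matrix."""
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0]]


def shear_x(k):
    """Horizontal shear: x is shifted in proportion to y."""
    return [[1.0, k, 0.0],
            [0.0, 1.0, 0.0]]


# Corners of an axis-aligned 100 x 40 text box (illustrative values)
corners = [(0, 0), (100, 0), (100, 40), (0, 40)]
sheared = [apply_affine(shear_x(0.5), p) for p in corners]

if __name__ == "__main__":
    print(sheared)
```

After the shear the rectangle becomes a parallelogram: the corners at y = 0 stay in place while the corners at y = 40 shift 20 units along x, so the characters inside are slanted relative to what a front-facing photo would show.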