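I have been exploring AWS Rekognition and Google's Vision to get the count of objects in an image/video, but have not been able to find a way. Google's Vision website does have an "Insight from image" section where the count apparently seems to have been captured.
Can someone suggest whether Google's Vision, or any other API, can be used to get the count of objects in an image? Thanks.
EDIT:
For instance, for the image shown below, the count returned should be 10 cars. As Torry Yang suggested in his answer, the count of label annotations might give the required number, but that does not seem to be the case, because the count of label annotations is 18. The returned object looks something like this.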
"labelAnnotations": [
{
"mid": "/m/0k4j",
"description": "car",
"score": 0.98658943,
"topicality": 0.98658943
},
{
"mid": "/m/012f08",
"description": "motor vehicle",
"score": 0.9631113,
"topicality": 0.9631113
},
{
"mid": "/m/07yv9",
"description": "vehicle",
"score": 0.9223521,
"topicality": 0.9223521
},
{
"mid": "/m/01w71f",
"description": "personal luxury car",
"score": 0.8976857,
"topicality": 0.8976857
},
{
"mid": "/m/068mqj",
"description": "automotive design",
"score": 0.8736646,
"topicality": 0.8736646
},
{
"mid": "/m/012mq4",
"description": "sports car",
"score": 0.8418799,
"topicality": 0.8418799
},
{
"mid": "/m/01lcwm",
"description": "luxury vehicle",
"score": 0.7761523,
"topicality": 0.7761523
},
{
"mid": "/m/06j11d",
"description": "performance car",
"score": 0.76816446,
"topicality": 0.76816446
},
{
"mid": "/m/03vnt4",
"description": "mid size car",
"score": 0.75732976,
"topicality": 0.75732976
},
{
"mid": "/m/03vntj",
"description": "full size car",
"score": 0.6855145,
"topicality": 0.6855145
},
{
"mid": "/m/0h8ls87",
"description": "automotive exterior",
"score": 0.66056395,
"topicality": 0.66056395
},
{
"mid": "/m/014f__",
"description": "supercar",
"score": 0.592226,
"topicality": 0.592226
},
{
"mid": "/m/02swz_",
"description": "compact car",
"score": 0.5807265,
"topicality": 0.5807265
},
{
"mid": "/m/0h6dlrc",
"description": "bmw",
"score": 0.5801241,
"topicality": 0.5801241
},
{
"mid": "/m/01h80k",
"description": "muscle car",
"score": 0.55745816,
"topicality": 0.55745816
},
{
"mid": "/m/021mp2",
"description": "sedan",
"score": 0.5522745,
"topicality": 0.5522745
},
{
"mid": "/m/0369ss",
"description": "city car",
"score": 0.52938646,
"topicality": 0.52938646
},
{
"mid": "/m/01d1dj",
"description": "coupé",
"score": 0.50642073,
"topicality": 0.50642073
}
]
Answer 0: (score: 1)
On Google Cloud Vision, you should be able to get a count. For example, if you want to count faces using Python, you could do the following:
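import io
from google.cloud import vision

def detect_faces(path):
    """Detects faces in an image and prints how many were found."""
    client = vision.ImageAnnotatorClient()
    # Read the image file as raw bytes.
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    # google-cloud-vision 2.0+; older releases use vision.types.Image.
    image = vision.Image(content=content)
    response = client.face_detection(image=image)
    faces = response.face_annotations
    # One annotation is returned per detected face.
    print(len(faces))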
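Note the last line. In each of the supported languages, you should be able to count the results in the same way.
Here is how you could get a count for each label.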
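import io
from google.cloud import vision

def detect_labels(path):
    """Detects labels in the file and tallies them by description."""
    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    # google-cloud-vision 2.0+; older releases use vision.types.Image.
    image = vision.Image(content=content)
    response = client.label_detection(image=image)
    labels = response.label_annotations
    count = {}
    for label in labels:
        # Key on the description string rather than the annotation object,
        # which is not a reliable dictionary key.
        key = label.description
        if key in count:
            count[key] += 1
        else:
            count[key] = 1
    return count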
In the second example, count will be a dictionary of each label and the number of times it shows up in the image.
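As a minimal sketch of what such a tally looks like on a response shaped like the labelAnnotations JSON in the question (trimmed here to three of its entries), the helper below counts labels by their description field. The names tally_labels and min_score are illustrative assumptions and not part of the Vision API; the code works on an already parsed dictionary and makes no network call.

```python
from collections import Counter


def tally_labels(response, min_score=0.0):
    """Count how often each label description occurs in a parsed response dict."""
    labels = response.get("labelAnnotations", [])
    return Counter(
        label["description"]
        for label in labels
        if label.get("score", 0.0) >= min_score
    )


# Trimmed copy of the response from the question (three of the 18 labels).
sample = {
    "labelAnnotations": [
        {"mid": "/m/0k4j", "description": "car", "score": 0.98658943},
        {"mid": "/m/012f08", "description": "motor vehicle", "score": 0.9631113},
        {"mid": "/m/07yv9", "description": "vehicle", "score": 0.9223521},
    ]
}

counts = tally_labels(sample)
print(counts["car"])  # 1
print(len(counts))    # 3
```

On this sample every description occurs exactly once, so the tally reflects how many distinct labels came back rather than how many cars are in the picture, which matches the observation in the question's edit.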
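Answer 1: (score: 0)
Neither Google Vision nor AWS Rekognition supports counting objects in a photo.
https://forums.aws.amazon.com/thread.jspa?threadID=254814
However, you can count the number of faces in an image with both Vision and Rekognition.
In AWS Rekognition, the DetectFaces API returns a response as JSON: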
HTTP/1.1 200 OK
Content-Type: application/x-amz-json-1.1
Date: Wed, 04 Jan 2017 23:37:03 GMT
x-amzn-RequestId: b1827570-d2d6-11e6-a51e-73b99a9bb0b9
Content-Length: 1355
Connection: keep-alive
{
"FaceDetails":[
{
"BoundingBox":{
"Height":0.18000000715255737,
"Left":0.5555555820465088,
"Top":0.33666667342185974,
"Width":0.23999999463558197
},
"Confidence":100.0,
"Landmarks":[
{
"Type":"eyeLeft",
"X":0.6394737362861633,
"Y":0.40819624066352844
},
{
"Type":"eyeRight",
"X":0.7266660928726196,
"Y":0.41039225459098816
},
{
"Type":"nose",
"X":0.6912462115287781,
"Y":0.44240960478782654
},
{
"Type":"mouthLeft",
"X":0.6306198239326477,
"Y":0.46700039505958557
},
{
"Type":"mouthRight",
"X":0.7215608954429626,
"Y":0.47114261984825134
}
],
"Pose":{
"Pitch":4.050806522369385,
"Roll":0.9950747489929199,
"Yaw":13.693790435791016
},
"Quality":{
"Brightness":37.60169982910156,
"Sharpness":80.0
}
},
{
"BoundingBox":{
"Height":0.16555555164813995,
"Left":0.3096296191215515,
"Top":0.7066666483879089,
"Width":0.22074073553085327
},
"Confidence":99.99998474121094,
"Landmarks":[
{
"Type":"eyeLeft",
"X":0.3767718970775604,
"Y":0.7863991856575012
},
{
"Type":"eyeRight",
"X":0.4517287313938141,
"Y":0.7715709209442139
},
{
"Type":"nose",
"X":0.42001065611839294,
"Y":0.8192070126533508
},
{
"Type":"mouthLeft",
"X":0.3915625810623169,
"Y":0.8374140858650208
},
{
"Type":"mouthRight",
"X":0.46825936436653137,
"Y":0.823401689529419
}
],
"Pose":{
"Pitch":-16.320178985595703,
"Roll":-15.097439765930176,
"Yaw":-5.771541118621826
},
"Quality":{
"Brightness":31.440860748291016,
"Sharpness":60.000003814697266
}
}
],
"OrientationCorrection":"ROTATE_0"
}
You can then use this response to count the number of bounding boxes, which corresponds to the number of faces in the image.
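A rough sketch of that counting step in Python, assuming the response has already been parsed into a dictionary shaped like the one above. The helper names count_faces and detect_and_count and the min_confidence parameter are illustrative assumptions; the boto3 call is only reached inside detect_and_count, which needs AWS credentials and is not executed here.

```python
def count_faces(response, min_confidence=0.0):
    """Count FaceDetails entries whose Confidence is at least min_confidence."""
    faces = response.get("FaceDetails", [])
    return sum(1 for face in faces if face.get("Confidence", 0.0) >= min_confidence)


def detect_and_count(image_path):
    """Call DetectFaces on a local image file and return the number of faces."""
    import boto3  # imported lazily so count_faces works without boto3 installed

    client = boto3.client("rekognition")
    with open(image_path, "rb") as image_file:
        response = client.detect_faces(Image={"Bytes": image_file.read()})
    return count_faces(response)


# Trimmed version of the response shown above: two faces, bounding boxes only.
sample = {
    "FaceDetails": [
        {
            "BoundingBox": {
                "Height": 0.18000000715255737,
                "Left": 0.5555555820465088,
                "Top": 0.33666667342185974,
                "Width": 0.23999999463558197,
            },
            "Confidence": 100.0,
        },
        {
            "BoundingBox": {
                "Height": 0.16555555164813995,
                "Left": 0.3096296191215515,
                "Top": 0.7066666483879089,
                "Width": 0.22074073553085327,
            },
            "Confidence": 99.99998474121094,
        },
    ],
    "OrientationCorrection": "ROTATE_0",
}

print(count_faces(sample))  # 2
```

Filtering on Confidence is optional; it is included only because low-confidence detections may otherwise inflate the count.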
Also, if you want to count objects in a photo, you could set up a custom machine learning model on AWS SageMaker. Example: https://github.com/cosmincatalin/object-counting-with-mxnet-and-sagemaker