Question

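I'm trying to detect an item held in a hand using ML Kit image labeling through the camera. For example, if I show it a soda can, it picks up things such as hand, face, background, etc., which I'm not interested in, and it still doesn't find the object in the hand, even with the minimum confidence threshold lowered to 0.25 and using Cloud Vision.

Is there a way to limit what the vision API looks for, or another way to increase accuracy?

PS: I'm also willing to switch APIs if there is a better option for this task.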

// This is mostly from a Google tutorial
private fun runCloudImageLabeling(bitmap: Bitmap) {
    //Create a FirebaseVisionImage
    val image = FirebaseVisionImage.fromBitmap(bitmap)

    val detector = FirebaseVision.getInstance().visionCloudLabelDetector

    //Use the detector to detect the labels inside the image
    detector.detectInImage(image)
            .addOnSuccessListener {
                // Task completed successfully
                progressBar.visibility = View.GONE
                itemAdapter.setList(it)
                sheetBehavior.setState(BottomSheetBehavior.STATE_EXPANDED)
            }
            .addOnFailureListener {
                // Task failed with an exception
                progressBar.visibility = View.GONE
                Toast.makeText(baseContext, "Sorry, something went wrong!", Toast.LENGTH_SHORT).show()
            }
}

Expected result: being able to detect the item held in the hand with high accuracy.

Answer 1

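The built-in object detection model used by Firebase ML Kit has no setting that controls accuracy.

If you want more accurate detection, you have two options:

Call Cloud Vision, a server-side API that can detect many more object categories and generally has higher accuracy. It is a paid API, but it does have a free quota. See the comparison page in the documentation for details.
Train your own model so that it is better suited to the kind of images you care about. You can then use this custom model in your app to get better accuracy.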

Answer 2

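ML Kit provides the Object Detection & Tracking API, which can be used to locate objects.

The API lets you filter for the prominent object (the one close to the center of the viewfinder), which in your example is the soda can. It returns a bounding box around the object, which you can use to crop the image and then feed the crop through the Image Labeling API. That way you filter out all the irrelevant background and/or other objects.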
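The geometry of the crop step could look roughly like the sketch below. It is plain Kotlin so it runs off-device. `Box` is a hypothetical stand-in for `android.graphics.Rect`, and `mostCentralBox` and `cropRegion` are illustrative helper names, not part of ML Kit. It assumes you already have the bounding boxes returned by the object detector.

```kotlin
import kotlin.math.hypot

// Hypothetical stand-in for android.graphics.Rect so the sketch runs off-device.
data class Box(val left: Int, val top: Int, val right: Int, val bottom: Int) {
    val width: Int get() = right - left
    val height: Int get() = bottom - top
    val centerX: Double get() = (left + right) / 2.0
    val centerY: Double get() = (top + bottom) / 2.0
}

// Pick the detected box whose center lies closest to the image center,
// or null when the detector returned nothing.
fun mostCentralBox(boxes: List<Box>, imageWidth: Int, imageHeight: Int): Box? {
    val cx = imageWidth / 2.0
    val cy = imageHeight / 2.0
    return boxes.minByOrNull { hypot(it.centerX - cx, it.centerY - cy) }
}

// Grow the box by `padding` pixels and clamp it to the image bounds, so the
// crop never reaches outside the bitmap. Returns null if nothing is left.
fun cropRegion(box: Box, imageWidth: Int, imageHeight: Int, padding: Int = 0): Box? {
    val left = (box.left - padding).coerceIn(0, imageWidth)
    val top = (box.top - padding).coerceIn(0, imageHeight)
    val right = (box.right + padding).coerceIn(0, imageWidth)
    val bottom = (box.bottom + padding).coerceIn(0, imageHeight)
    return if (right > left && bottom > top) Box(left, top, right, bottom) else null
}

fun main() {
    // Made-up detections on a 640x480 frame: one box near the top-left
    // corner (e.g. a face) and one near the center (e.g. the held item).
    val boxes = listOf(Box(10, 10, 150, 160), Box(250, 180, 400, 330))
    val target = mostCentralBox(boxes, 640, 480)
    val region = target?.let { cropRegion(it, 640, 480, padding = 16) }
    println(region) // prints Box(left=234, top=164, right=416, bottom=346)
}
```

On Android, the resulting region could be passed to `Bitmap.createBitmap(bitmap, region.left, region.top, region.width, region.height)`, and the cropped bitmap handed to `FirebaseVisionImage.fromBitmap` as in the question's code. The small padding is a design choice, not a requirement: it keeps a little context around the object in case the detector's box is tight.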

Improving ML Kit image labeling
