Question

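I'm trying to recognize Munchkin cards from the card game. I've tried a variety of image recognition APIs (Google Vision API, vize.ai, Azure's Computer Vision API, and others), but none of them seems to work well.
They can identify a card when it is the only one in the demo image, but when two cards appear together in the same image, they fail to identify one or the other.
I've trained the APIs with a set of about 40 different images per card, with varying angles, backgrounds, and lighting. I've also tried OCR (via the Google Vision API), but it only works for some cards, probably because of the small lettering and the lack of detail on some cards. Does anyone know a way I can teach one of these APIs (or another one) to read these cards better? Or a different way to recognize the cards?

The end result should be that a user captures an image while playing the game, and the application works out which cards are in front of them and returns the result.
Thank you.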

Answer 1

You could try this: https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts/csharp#OCR. It will detect the text, and you can then use custom logic (based on the detected text) to decide what to do.
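As a rough illustration of the "custom logic" step, here is a minimal sketch that maps noisy OCR output onto the closest known card title using edit distance. It assumes you already have the OCR text as a string and a list of card titles; the function names, the titles used in the usage line, and the 0.34 cutoff are all hypothetical choices for illustration, not part of any OCR API.

```swift
import Foundation

/// Classic Levenshtein edit distance between two strings.
func editDistance(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }
    var previous = Array(0...t.count)
    for i in 1...s.count {
        var current = [i] + Array(repeating: 0, count: t.count)
        for j in 1...t.count {
            let substitution = previous[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1)
            current[j] = min(previous[j] + 1, current[j - 1] + 1, substitution)
        }
        previous = current
    }
    return previous[t.count]
}

/// Lowercases, drops punctuation, and collapses repeated spaces,
/// so that OCR punctuation noise matters less.
func normalize(_ text: String) -> String {
    let kept = text.lowercased().filter { $0.isLetter || $0.isNumber || $0 == " " }
    return kept.split(separator: " ").joined(separator: " ")
}

/// Returns the known title closest to the OCR text, or nil if nothing is close enough.
/// `maxRelativeDistance` is an arbitrary cutoff (assumption) that you would need to tune.
func bestCardMatch(ocrText: String,
                   knownTitles: [String],
                   maxRelativeDistance: Double = 0.34) -> String? {
    let query = normalize(ocrText)
    guard !query.isEmpty else { return nil }
    var best: (title: String, score: Double)? = nil
    for title in knownTitles {
        let candidate = normalize(title)
        let distance = Double(editDistance(query, candidate))
        let score = distance / Double(max(query.count, candidate.count))
        if score <= maxRelativeDistance && (best == nil || score < best!.score) {
            best = (title: title, score: score)
        }
    }
    return best?.title
}
```

For example, `bestCardMatch(ocrText: "P0tion of Halit0sis!", knownTitles: titles)` would pick the title that differs only in the two misread characters, while unrelated text returns `nil` instead of a wrong guess.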

Answer 2

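You're heading in the wrong direction. As I understand it, you have one image, and that image contains several Munchkin cards (two in your example). This is not just "recognition"; it also requires "card detection". So your task should be split into a card detection task and a card text recognition task.

For each task, you could use the following algorithms: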

1. Card detection
Simple color segmentation
(if you have enough time and patience, train an SSD to detect the cards)
2. Card text recognition
Use Tesseract with an English dictionary
(you could add a card rotation step to improve accuracy)
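To make the detection step concrete, here is a minimal sketch of simple color segmentation: mark every pixel that looks card-colored, group touching pixels with a flood fill, and return one bounding box per group. It assumes the image has already been decoded into a rectangular grid of RGB values and that the cards are clearly distinguishable from the background by color; the `RGB` and `Box` types, the function name, and the `minArea` default are hypothetical. Each box could then be cropped (and rotated if needed) before being passed to the OCR step.

```swift
/// One pixel; the grid passed in below is assumed to be rectangular (rows of equal width).
struct RGB { var r: UInt8, g: UInt8, b: UInt8 }

/// Inclusive bounding box of one detected region, in pixel coordinates.
struct Box: Equatable { var minX: Int, minY: Int, maxX: Int, maxY: Int }

/// Finds bounding boxes of 4-connected regions whose pixels satisfy `isCardColor`.
/// Regions smaller than `minArea` pixels are discarded as noise.
func detectCardRegions(pixels: [[RGB]],
                       minArea: Int = 4,
                       isCardColor: (RGB) -> Bool) -> [Box] {
    let height = pixels.count
    let width = pixels.first?.count ?? 0
    var visited = Array(repeating: Array(repeating: false, count: width), count: height)
    var boxes: [Box] = []
    let neighbours = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    for startY in 0..<height {
        for startX in 0..<width {
            if visited[startY][startX] || !isCardColor(pixels[startY][startX]) { continue }

            // Flood fill from this seed pixel, growing the bounding box as we go.
            var stack = [(startX, startY)]
            visited[startY][startX] = true
            var box = Box(minX: startX, minY: startY, maxX: startX, maxY: startY)
            var area = 0

            while let point = stack.popLast() {
                let (x, y) = point
                area += 1
                box.minX = min(box.minX, x)
                box.maxX = max(box.maxX, x)
                box.minY = min(box.minY, y)
                box.maxY = max(box.maxY, y)

                for (dx, dy) in neighbours {
                    let nx = x + dx, ny = y + dy
                    guard nx >= 0, nx < width, ny >= 0, ny < height else { continue }
                    guard !visited[ny][nx], isCardColor(pixels[ny][nx]) else { continue }
                    visited[ny][nx] = true
                    stack.append((nx, ny))
                }
            }

            if area >= minArea { boxes.append(box) }
        }
    }
    return boxes
}
```

A call might look like `detectCardRegions(pixels: image) { $0.r > 180 && $0.g > 180 && $0.b > 180 }` for light cards on a dark table. Overlapping or touching cards would merge into one region under this scheme, which is where a trained detector such as SSD would likely do better.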

Hope this helps.

Answer 3

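What a coincidence! I recently did something very similar (link to video) with great success! Specifically, I was trying to identify and track Chinese-language Munchkin cards so I could replace them with English ones. I used iOS's ARKit 2 (requires an iPhone 6S or later, or a relatively recent iPad; it isn't supported on desktop).

I basically just followed the augmented reality photo frame demo shown 41 minutes into the WWDC 2018 presentation What's New in ARKit 2. My code below is a minor modification of theirs (it only replaces the target with a static image instead of a video). The tedious part was scanning all the cards in both languages, cropping them, and adding them as AR resources...

Here's my source code for ViewController.swift: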

import UIKit
import SceneKit
import ARKit
import Foundation

class ViewController: UIViewController, ARSCNViewDelegate {

    @IBOutlet var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()

        // Set the view's delegate
        sceneView.delegate = self

        // Show statistics such as fps and timing information
        sceneView.showsStatistics = true

        sceneView.scene = SCNScene()
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)

        // Create a configuration
        let configuration = ARImageTrackingConfiguration()

        guard let trackingImages = ARReferenceImage.referenceImages(inGroupNamed: "card_scans", bundle: Bundle.main) else {
            print("Could not load images")
            return
        }

        // Setup configuration
        configuration.trackingImages = trackingImages
        configuration.maximumNumberOfTrackedImages = 16

        // Run the view's session
        sceneView.session.run(configuration)
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)

        // Pause the view's session
        sceneView.session.pause()
    }

    // MARK: - ARSCNViewDelegate

    // Override to create and configure nodes for anchors added to the view's session.
    public func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
        let node = SCNNode()

        if let imageAnchor = anchor as? ARImageAnchor {
            // Create a plane
            let plane = SCNPlane(width: imageAnchor.referenceImage.physicalSize.width,
                                 height: imageAnchor.referenceImage.physicalSize.height)

            // Log which reference image (card scan) was matched
            print("Asset identified as: \(imageAnchor.referenceImage.name ?? "nil")")

            // Set UIImage as the plane's texture
            plane.firstMaterial?.diffuse.contents = UIImage(named:"replacementImage.png")

            let planeNode = SCNNode(geometry: plane)

            // Rotate the plane to match the anchor
            planeNode.eulerAngles.x = -.pi / 2

            node.addChildNode(planeNode)
        }

        return node
    }

    func session(_ session: ARSession, didFailWithError error: Error) {
        // Present an error message to the user

    }

    func sessionWasInterrupted(_ session: ARSession) {
        // Inform the user that the session has been interrupted, for example, by presenting an overlay

    }

    func sessionInterruptionEnded(_ session: ARSession) {
        // Reset tracking and/or remove existing anchors if consistent tracking is required

    }
}

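Unfortunately, I ran into a limitation: the more cards you add as AR targets to tell apart, the more prone card recognition becomes to false positives (to clarify: this is about the size of the library of potential targets, not the number of targets on screen at the same time). A 9-target library had a 100% success rate, but that didn't scale to a 68-target library (which is all of the Munchkin treasure cards). When pointed at a given target, the app tended to flicker between 1 and 3 candidate guesses. Seeing the poor performance, I didn't put in the final effort to add all 168 Munchkin cards.

I used the Chinese cards as targets, and they are all monochrome. I believe it might have performed better if I had used the English cards as targets (they are full-color and therefore have richer histograms), but in my preliminary check of a 9-card set in each language, I got just as many warnings about the AR resources being hard to distinguish for the English cards as for the Chinese ones. So I don't think performance would improve enough to scale reliably to the full 168-card set.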
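One thing that might damp the flickering between guesses described above (it would not fix the underlying false positives) is to vote over the last few frames before showing a result. The following is only a sketch of that idea, not something from my app: `GuessSmoother`, the window size, and the share threshold are all hypothetical, and it assumes you feed it one reference image name per update, for example from `renderer(_:didUpdate:for:)`.

```swift
/// Keeps the most recent per-frame guesses and reports a label only
/// once the window is full and one label clearly dominates it.
struct GuessSmoother {
    let windowSize: Int
    let minimumShare: Double
    private var recent: [String] = []

    init(windowSize: Int = 15, minimumShare: Double = 0.7) {
        self.windowSize = windowSize
        self.minimumShare = minimumShare
    }

    /// Records one frame's guess. Returns the dominant label,
    /// or nil while the window is still filling or the guesses disagree.
    @discardableResult
    mutating func add(_ guess: String) -> String? {
        recent.append(guess)
        if recent.count > windowSize { recent.removeFirst() }
        guard recent.count == windowSize else { return nil }

        // Count how often each label appears in the current window.
        var counts: [String: Int] = [:]
        for label in recent { counts[label, default: 0] += 1 }

        guard let top = counts.max(by: { $0.value < $1.value }) else { return nil }
        let share = Double(top.value) / Double(recent.count)
        return share >= minimumShare ? top.key : nil
    }
}
```

With a window of 5 and a 0.7 threshold, four matching guesses out of five yield a stable label, while a three-to-two split yields `nil`, so the overlay could simply be hidden until the guesses settle.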

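Vuforia for Unity would be another option for this, but it too has a hard limit of 50 to 100 targets. With a (shockingly expensive) commercial license, you can delegate target recognition to cloud servers, which could be a viable route for this approach.

Thanks for looking into the OCR and ML approaches; they would be my next ports of call. If you find any other promising approaches, please leave a note here!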
