Question

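Hello, I am trying to do some image processing. I use a Microsoft Kinect to detect humans in a room. I get the depth data, do some background subtraction, and end up with a video sequence like this when a person enters the scene and walks around: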

http://www.screenr.com/h7f8

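I posted a video so that you can see how the noise behaves. Different colors represent different depth levels. White represents empty. As you can see, it is pretty noisy, especially the red noise.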

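I need to get rid of everything except the human as much as possible. When I apply erosion/dilation (using a very large window size) I can remove a lot of the noise, but I wonder whether there are other methods I could use. The red noise in the video in particular is hard to remove with erosion/dilation.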
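For reference, the erosion/dilation step described above is a morphological opening. A minimal pure-Python sketch follows, assuming the foreground has already been reduced to a binary mask (a list of rows of 0/1). The function names and the `radius` parameter are illustrative only, and a real-time version would use an optimized library rather than nested loops.

```python
def _window(radius):
    """Offsets of a (2*radius+1) x (2*radius+1) square structuring element."""
    span = range(-radius, radius + 1)
    return [(dy, dx) for dy in span for dx in span]


def erode(mask, radius=1):
    """Keep a pixel only if every pixel in its window is set.

    Pixels outside the image count as 0, so blobs touching the border shrink.
    """
    h, w = len(mask), len(mask[0])
    offsets = _window(radius)
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in offsets))
             for x in range(w)] for y in range(h)]


def dilate(mask, radius=1):
    """Set a pixel if any pixel in its window is set."""
    h, w = len(mask), len(mask[0])
    offsets = _window(radius)
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in offsets))
             for x in range(w)] for y in range(h)]


def opening(mask, radius=1):
    """Erosion followed by dilation: removes blobs smaller than the window."""
    return dilate(erode(mask, radius), radius)
```

The window size sets the trade-off: anything narrower than the window disappears, including thin parts of the person such as arms.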

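Some notes:

1) A better background subtraction could be done if we knew when there are no humans in the scene. However, the background subtraction we use is fully automatic: it works even when there are humans in the scene and even when the camera is moved. So this is the best background subtraction we can get right now.

2) The algorithm will run on an embedded system in real time. So the more efficient and simple the algorithm is, the better. It does not have to be perfect. That said, complicated signal processing techniques are also welcome (maybe we could use them in another project that does not need embedded, real-time processing).

3) I don't need actual code. Just ideas.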

Answer 1

Just my two cents:

If you don't mind using the SDK for this, you can very easily keep only the person pixels using the PlayerIndexBitmask, as Outlaw Lemur shows.

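Now, you may not want to rely on the drivers for this and may prefer to do it at the image processing level. An approach that we tried in a project, and that worked pretty well, was contour based. We started with background subtraction, then detected the largest contour in the image, assuming that this was the person (since the remaining noise usually consisted of very small blobs), filled that contour, and kept only that. You could also use some kind of median filtering as a first pass.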
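A rough sketch of the "keep only the largest blob" idea, in case it helps. Note the substitution: instead of contour detection it uses 4-connected component labelling on a binary foreground mask, which selects the same blob but does not fill interior holes. The function name and the mask representation (list of rows of 0/1) are assumptions made for illustration.

```python
from collections import deque


def largest_component(mask):
    """Return a mask that keeps only the largest 4-connected blob of set pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []  # pixel coordinates of the largest blob found so far

    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill collects one blob.
            blob = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) > len(best):
                best = blob

    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out
```

This visits every pixel a constant number of times, so the cost is linear in the image size. It does assume the person is the largest blob in the frame, as described above.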

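Of course, this is neither perfect nor suitable for every case, and there are probably much better methods. I'm just throwing it out there in case it helps you come up with ideas.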
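For the median-filter first pass mentioned above, a minimal sketch could look like the following. It assumes the depth frame is a list of rows of integers and uses a fixed 3x3 window; both the name `median3x3` and the choice to leave border pixels unchanged are illustrative decisions, not something taken from our project.

```python
def median3x3(img):
    """Replace each interior pixel with the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sorted(window)[4]  # middle of the 9 sorted values
    return out
```

A median removes isolated outliers (single speckles) while preserving edges better than a mean filter would, which is why it is a reasonable first pass before looking for blobs.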

Answer 2

Take a look at eyesweb.

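It is a design platform that supports the Kinect device, and you can apply noise filters to the outputs. It is a very useful and simple tool for designing multimodal systems.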

Answer 3

This is pretty simple, assuming you are using the Kinect SDK. I would follow this video for depth basics and do something like this:

    //requires: using Microsoft.Kinect; using System.Windows.Media;
    private byte[] GenerateColoredBytes(DepthImageFrame depthFrame)
    {

        //get the raw data from kinect with the depth for every pixel
        short[] rawDepthData = new short[depthFrame.PixelDataLength];
        depthFrame.CopyPixelDataTo(rawDepthData); 

        //allocate the output image to display on-screen
        //pixels holds the color information for every pixel in the image
        //Height x Width x 4 bytes (Blue, Green, Red, empty byte)
        byte[] pixels = new byte[depthFrame.Height * depthFrame.Width * 4];

        //Bgr32  - Blue, Green, Red, empty byte
        //Bgra32 - Blue, Green, Red, transparency 
        //You must set transparency for Bgra as .NET defaults a byte to 0 = fully transparent

        //hardcoded locations to Blue, Green, Red (BGR) index positions       
        const int BlueIndex = 0;
        const int GreenIndex = 1;
        const int RedIndex = 2;


        //loop through all distances
        //pick a RGB color based on distance
        for (int depthIndex = 0, colorIndex = 0; 
            depthIndex < rawDepthData.Length && colorIndex < pixels.Length; 
            depthIndex++, colorIndex += 4)
        {
            //get the player (requires skeleton tracking enabled for values)
            int player = rawDepthData[depthIndex] & DepthImageFrame.PlayerIndexBitmask;

            //gets the depth value
            int depth = rawDepthData[depthIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;

            //.9M or 2.95'
            if (depth <= 900)
            {
                //we are very close
                pixels[colorIndex + BlueIndex] = Colors.White.B;
                pixels[colorIndex + GreenIndex] = Colors.White.G;
                pixels[colorIndex + RedIndex] = Colors.White.R;
            }
            // .9M - 2M or 2.95' - 6.56'
            else if (depth > 900 && depth < 2000)
            {
                //we are a bit further away
                pixels[colorIndex + BlueIndex] = Colors.White.B;
                pixels[colorIndex + GreenIndex] = Colors.White.G;
                pixels[colorIndex + RedIndex] = Colors.White.R;
            }
            // 2M+ or 6.56'+ (>= so that exactly 2000 mm is not left uncolored)
            else if (depth >= 2000)
            {
                //we are the farthest
                pixels[colorIndex + BlueIndex] = Colors.White.B;
                pixels[colorIndex + GreenIndex] = Colors.White.G;
                pixels[colorIndex + RedIndex] = Colors.White.R;
            }


            ////equal coloring for monochromatic histogram
            //byte intensity = CalculateIntensityFromDepth(depth);
            //pixels[colorIndex + BlueIndex] = intensity;
            //pixels[colorIndex + GreenIndex] = intensity;
            //pixels[colorIndex + RedIndex] = intensity;


            //Color all players "gold"
            if (player > 0)
            {
                pixels[colorIndex + BlueIndex] = Colors.Gold.B;
                pixels[colorIndex + GreenIndex] = Colors.Gold.G;
                pixels[colorIndex + RedIndex] = Colors.Gold.R;
            }

        }


        return pixels;
    }

This turns everything white except for humans, who are gold. Hope this helps!

EDIT

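I know you didn't necessarily want code, just ideas, so I would say: find an algorithm that gets the depth, one that finds the humans, and color everything white except the humans. I have provided all of this, but I didn't know whether you knew what was going on. I also have an image of the final program.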
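To make the idea independent of C#, here is a small sketch of the bit unpacking the code above relies on. It assumes the Kinect SDK 1.x layout, where each 16-bit depth value carries the player index in its low 3 bits and the depth in millimetres in the remaining bits. The constant values and function names below are written out as assumptions for illustration; in real code you would read them from `DepthImageFrame`.

```python
# Assumed to mirror DepthImageFrame.PlayerIndexBitmask / PlayerIndexBitmaskWidth
# in Kinect SDK 1.x; treat these values as an assumption, not a reference.
PLAYER_INDEX_BITMASK = 0b111
PLAYER_INDEX_BITMASK_WIDTH = 3


def split_depth_pixel(raw):
    """Split one raw depth value into (player index, depth in millimetres)."""
    player = raw & PLAYER_INDEX_BITMASK          # 0 means "no player"
    depth_mm = raw >> PLAYER_INDEX_BITMASK_WIDTH
    return player, depth_mm


def player_mask(raw_pixels):
    """1 where the SDK marked the pixel as belonging to a player, else 0."""
    return [1 if raw & PLAYER_INDEX_BITMASK else 0 for raw in raw_pixels]
```

The resulting mask is the "everything except the humans is white" image in binary form. As in the C# code, the player index is only populated when skeleton tracking is enabled.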

Note: I added the second depth frame for perspective.

Answer 4

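I may be wrong (I would need the unprocessed video to be sure), but I tend to think that you are trying to get rid of illumination changes.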

This is what makes people detection really difficult in "real" environments.

You can check out this other SO question for some links.

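I used to detect humans in real time in the same configuration as yours, but with monocular vision. In my case, a really good descriptor was LBPs (local binary patterns), which are mainly used for texture classification. They are quite simple to put into practice (there are implementations all over the web).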

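The LBPs were basically used to define a region of interest where movement is detected, so that I could process only part of the image and get rid of all that noise.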
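In case LBPs are unfamiliar, here is a minimal sketch of the basic 8-neighbour operator on a grayscale image stored as a list of rows. The function names and the neighbour ordering are arbitrary choices for illustration, and this only computes the codes; how you turn them into a region of interest is a separate step not shown here.

```python
# Neighbour offsets, clockwise starting at the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """8-bit LBP code of interior pixel (y, x).

    Bit i is set when the i-th neighbour is at least as bright as the centre.
    """
    center = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(_OFFSETS):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_image(img):
    """LBP codes for all interior pixels (the result is 2 smaller per axis)."""
    h, w = len(img), len(img[0])
    return [[lbp_code(img, y, x) for x in range(1, w - 1)]
            for y in range(1, h - 1)]
```

Because each bit only compares a neighbour with the centre pixel, adding the same offset to every pixel leaves the codes unchanged, which is what makes the descriptor attractive when illumination varies.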

This paper, for example, uses LBPs for grayscale correction of images.

Hope that brings some new ideas.

How to remove noise from this video sequence?
