How to use an RGB image as input for the C# EvalDll wrapper?

Time: 2016-05-18 13:21:52

Tags: cntk

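I trained a network using the provided ImageReader, and now I am trying to use the CNTK EvalDll in a C# project to evaluate RGB images.

I have seen examples related to the EvalDll, but the input is always a float/double array, never an image.

How can I use the exposed interface to run the trained network on an RGB image?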

2 answers:

Answer 0 (score: 2)

I assume that you want the equivalent of reading with the ImageReader, where your reader configuration looks something like

    features=[ width=224 height=224 channels=3 cropType=Center ]

You will need helper functions to create the crop and to resize the image to the size accepted by the network.

I will define two extension methods of System.Drawing.Bitmap, one to crop and one to resize:

    open System.Collections.Generic
    open System.Drawing
    open System.Drawing.Drawing2D
    open System.Drawing.Imaging

    type Bitmap with
        /// Crops the image in the present object, starting at the given (column, row), and retaining
        /// the given number of columns and rows.
        member this.Crop(column, row, numCols, numRows) =
            let rect = Rectangle(column, row, numCols, numRows)
            this.Clone(rect, this.PixelFormat)
        /// Creates a resized version of the present image. The returned image
        /// will have the given width and height. This may distort the aspect ratio
        /// of the image.
        member this.ResizeImage(width, height, useHighQuality) =
            // Rather than using image.GetThumbnailImage, use direct image resizing.
            // GetThumbnailImage throws odd out-of-memory exceptions on some
            // images, see also
            // http://stackoverflow.com/questions/27528057/c-sharp-out-of-memory-exception-in-getthumbnailimage-on-a-server
            // Use the interpolation method suggested on
            // http://stackoverflow.com/questions/1922040/resize-an-image-c-sharp
            let rect = Rectangle(0, 0, width, height)
            let destImage = new Bitmap(width, height)
            destImage.SetResolution(this.HorizontalResolution, this.VerticalResolution)
            use graphics = Graphics.FromImage destImage
            graphics.CompositingMode <- CompositingMode.SourceCopy
            if useHighQuality then
                graphics.InterpolationMode <- InterpolationMode.HighQualityBicubic
                graphics.CompositingQuality <- CompositingQuality.HighQuality
                graphics.SmoothingMode <- SmoothingMode.HighQuality
                graphics.PixelOffsetMode <- PixelOffsetMode.HighQuality
            else
                graphics.InterpolationMode <- InterpolationMode.Low
            use wrapMode = new ImageAttributes()
            wrapMode.SetWrapMode WrapMode.TileFlipXY
            graphics.DrawImage(this, rect, 0, 0, this.Width, this.Height, GraphicsUnit.Pixel, wrapMode)
            destImage

Based on that, define a function that performs the center crop:

    /// Returns a square sub-image from the center of the given image, with
    /// a size that is cropRatio times the smallest image dimension. The
    /// aspect ratio is preserved.
    let CenterCrop cropRatio (image: Bitmap) =
        let cropSize =
            float(min image.Height image.Width) * cropRatio
            |> int
        let startRow = (image.Height - cropSize) / 2
        let startCol = (image.Width - cropSize) / 2
        image.Crop(startCol, startRow, cropSize, cropSize)

Then plug it all together: crop, resize, then traverse the image in the plane order that OpenCV uses:

    /// Creates a list of CNTK feature values from a given bitmap.
    /// The image is first resized to fit into a (targetSize x targetSize) bounding box,
    /// then the image planes are converted to a CNTK tensor.
    /// Returns a list with targetSize*targetSize*3 values.
    let ImageToFeatures (image: Bitmap, targetSize) =
        // Apply the same image pre-processing that is typically done
        // in CNTK when running it in test or write mode: Take a center
        // crop of the image, then re-size it to the network input size.
        let cropped = CenterCrop 1.0 image
        let resized = cropped.ResizeImage(targetSize, targetSize, false)
        // Ensure that the initial capacity of the list is provided
        // with the constructor. Creating the list via the default constructor
        // makes the whole operation 20% slower.
        let features = List (targetSize * targetSize * 3)
        // Traverse the image in the format that is used in OpenCV:
        // First the B plane, then the G plane, then the R plane
        for c in 0 .. 2 do
            for h in 0 .. (resized.Height - 1) do
                for w in 0 .. (resized.Width - 1) do
                    let pixel = resized.GetPixel(w, h)
                    let v =
                        match c with
                        | 0 -> pixel.B
                        | 1 -> pixel.G
                        | 2 -> pixel.R
                        | _ -> failwith "No such channel"
                        |> float32
                    features.Add v
        features

Call ImageToFeatures with the image in question, feed the result into an instance of IEvaluateModelManagedF, and you are good to go. I am assuming that your RGB image lives in myImage, that you are doing binary classification, and that the network input size is 224 x 224.

    let LoadModelOnCpu modelPath =
        let model = new IEvaluateModelManagedF()
        let description = sprintf "deviceId=-1\r\nmodelPath=\"%s\"" modelPath
        model.Init description
        model.CreateNetwork description
        model

    let model = LoadModelOnCpu("myModelFile")
    let featureDict = Dictionary()
    featureDict.["features"] <- ImageToFeatures(myImage, 224)
    model.Evaluate(featureDict, "OutputNodes.z", 2)
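Once Evaluate has returned its two values, they still need to be turned into a class decision. The following C# sketch shows one way to do that. It assumes the output node produces one unnormalized score per class, which is not confirmed by this answer; OutputHelpers, Softmax, and ArgMax are hypothetical names and not part of the EvalDll wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OutputHelpers
{
    /// Converts raw scores to probabilities with a numerically stable softmax.
    /// Assumption: the scores are unnormalized (pre-softmax) network outputs.
    public static float[] Softmax(IReadOnlyList<float> scores)
    {
        if (scores.Count == 0)
            throw new ArgumentException("scores must not be empty");

        // Subtract the maximum before exponentiating to avoid overflow.
        float max = scores.Max();
        double[] exps = scores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => (float)(e / sum)).ToArray();
    }

    /// Returns the index of the largest score, i.e. the predicted class.
    public static int ArgMax(IReadOnlyList<float> scores)
    {
        if (scores.Count == 0)
            throw new ArgumentException("scores must not be empty");

        int best = 0;
        for (int i = 1; i < scores.Count; i++)
            if (scores[i] > scores[best])
                best = i;
        return best;
    }
}
```

If only the predicted label is needed, ArgMax on the raw output is enough, since softmax does not change the ordering of the scores.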


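Answer 1 (score: 2)

I implemented similar code in C#. It loads a model, reads a test image, applies the appropriate cropping, scaling, and so on, and runs the model. As Anton pointed out, the output does not match CNTK's output exactly, but it is very close.

Code for image reading, cropping, and scaling: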

    private static Bitmap ImCrop(Bitmap img, int col, int row, int numCols, int numRows)
    {
        var rect = new Rectangle(col, row, numCols, numRows);
        return img.Clone(rect, System.Drawing.Imaging.PixelFormat.DontCare);
    }

    /// Returns a square sub-image from the center of the given image, with
    /// a size that is cropRatio times the smallest image dimension. The 
    /// aspect ratio is preserved.
    private static Bitmap ImCropToCenter(Bitmap img, double cropRatio)
    {
        var cropSize = (int)Math.Round(Math.Min(img.Height, img.Width) * cropRatio);
        var startCol = (img.Width - cropSize) / 2;
        var startRow = (img.Height - cropSize) / 2;
        return ImCrop(img, startCol, startRow, cropSize, cropSize);
    }

    /// Creates a resized version of the present image. The returned image
    /// will have the given width and height. This may distort the aspect ratio
    /// of the image.
    private static Bitmap ImResize(Bitmap img, int width, int height)
    {
        return new Bitmap(img, new Size(width, height));
    }
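To make the crop arithmetic concrete, here is a small sketch that repeats the computation from ImCropToCenter without touching a Bitmap, so the numbers can be checked in isolation. CropMath and CenterCropRect are hypothetical names introduced only for this illustration.

```csharp
using System;

public static class CropMath
{
    /// Computes the square center-crop rectangle for an image of the given
    /// size, using the same arithmetic as ImCropToCenter above.
    public static (int StartCol, int StartRow, int CropSize) CenterCropRect(
        int width, int height, double cropRatio)
    {
        // The crop edge is cropRatio times the smaller image dimension.
        int cropSize = (int)Math.Round(Math.Min(height, width) * cropRatio);

        // Center the square; integer division drops any half pixel.
        int startCol = (width - cropSize) / 2;
        int startRow = (height - cropSize) / 2;
        return (startCol, startRow, cropSize);
    }
}
```

For a 640 x 480 image and a crop ratio of 1.0, this gives a 480 x 480 square starting at column 80, row 0. With a ratio of 0.875 it gives a 420 x 420 square starting at column 110, row 30.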

Code for loading the model and the XML file that contains the pixel means:

    public static IEvaluateModelManagedF loadModel(string modelPath, string outputLayerName)
    {
        var networkConfiguration = String.Format("modelPath=\"{0}\" outputNodeNames=\"{1}\"", modelPath, outputLayerName);
        // Start the timer right away; a Stopwatch that is only constructed never runs.
        Stopwatch stopWatch = Stopwatch.StartNew();
        var model = new IEvaluateModelManagedF();
        model.CreateNetwork(networkConfiguration, deviceId: -1);
        stopWatch.Stop();
        Console.WriteLine("Time to create network: {0} ms.", stopWatch.ElapsedMilliseconds);
        return model;
    }

    /// Read the xml mean file, i.e. the offsets which are subtracted
    /// from each pixel in an image before using it as input to a CNTK model.
    public static float[] readXmlMeanFile(string XmlPath, int ImgWidth, int ImgHeight)
    {
        // Read and parse pixel value xml file
        XmlTextReader reader = new XmlTextReader(XmlPath);
        reader.ReadToFollowing("data");
        reader.Read();
        var pixelMeansXml =
            reader.Value.Split(new[] { "\r", "\n", " " }, StringSplitOptions.RemoveEmptyEntries)
                .Select(Single.Parse)
                .ToArray();

        // Re-order mean pixel values to be in the same order as the bitmap
        // image (as output by the ImGetRGBChannels() function).
        int inputDim = 3 * ImgWidth * ImgHeight;
        Debug.Assert(pixelMeansXml.Length == inputDim);
        var pixelMeans = new float[inputDim];
        int counter = 0;
        for (int c = 0; c < 3; c++)
            for (int h = 0; h < ImgHeight; h++)
                for (int w = 0; w < ImgWidth; w++)
                {
                    int xmlIndex = h * ImgWidth * 3 + w * 3 + c;
                    pixelMeans[counter++] = pixelMeansXml[xmlIndex];
                }
        return pixelMeans;
    }
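The re-ordering loop in readXmlMeanFile converts an interleaved layout (all channels of one pixel stored together) into a planar one (one full plane per channel). The sketch below applies the same index arithmetic to a tiny array so the effect is easy to follow. LayoutDemo and InterleavedToPlanar are hypothetical names, and the sample values are made up for illustration.

```csharp
using System;

public static class LayoutDemo
{
    /// Re-orders an interleaved (row, column, channel) array into planar
    /// (channel, row, column) order, using the same index formula as the
    /// loop in readXmlMeanFile.
    public static float[] InterleavedToPlanar(
        float[] interleaved, int width, int height, int channels)
    {
        if (interleaved.Length != width * height * channels)
            throw new ArgumentException("array length does not match the dimensions");

        var planar = new float[interleaved.Length];
        int counter = 0;
        for (int c = 0; c < channels; c++)
            for (int h = 0; h < height; h++)
                for (int w = 0; w < width; w++)
                {
                    // Position of channel c of pixel (w, h) in the interleaved array.
                    int srcIndex = h * width * channels + w * channels + c;
                    planar[counter++] = interleaved[srcIndex];
                }
        return planar;
    }
}
```

For an image that is two pixels wide and one pixel high, the interleaved input { 10, 20, 30, 11, 21, 31 } becomes { 10, 11, 20, 21, 30, 31 }: first both values of channel 0, then channel 1, then channel 2.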

Code for loading the image and converting it to model input:

    /// Creates a list of CNTK feature values from a given bitmap.
    /// The image is first resized to fit into a (targetSize x targetSize) bounding box,
    /// then the image planes are converted to a CNTK tensor, and the mean
    /// pixel value subtracted. Returns a list with targetSize * targetSize * 3 floats.
    private static List<float> ImageToFeatures(Bitmap img, int targetSize, float[] pixelMeans)
    {
        // Apply the same image pre-processing that is done typically in CNTK:
        // Take a center crop of the image, then re-size it to the network input size.
        var imgCropped = ImCropToCenter(img, 1.0);
        var imgResized = ImResize(imgCropped, targetSize, targetSize);

        // Convert pixels to CNTK model input.
        // Fast pixel extraction is ~5x faster while giving identical output
        var features = new float[3 * imgResized.Height * imgResized.Width];
        var boFastPixelExtraction = true; 
        if (boFastPixelExtraction) 
        {
            var pixelsRGB = ImGetRGBChannels(imgResized);
            for (int c = 0; c < 3; c++)
            {
                byte[] pixels = pixelsRGB[2 - c];
                Debug.Assert(pixels.Length == imgResized.Height * imgResized.Width);
                for (int i = 0; i < pixels.Length; i++)
                {
                    int featIndex = i + c * pixels.Length;
                    features[featIndex] = pixels[i] - pixelMeans[featIndex];
                }
            }
        }
        else
        {
            // Traverse the image in the format that is used in OpenCV:
            // First the B plane, then the G plane, R plane
            // Note: calling GetPixel(w, h) repeatedly is slow!
            int featIndex = 0;
            for (int c = 0; c < 3; c++)
                for (int h = 0; h < imgResized.Height; h++)
                    for (int w = 0; w < imgResized.Width; w++)
                    {
                        var pixel = imgResized.GetPixel(w, h);
                        float v;
                        if (c == 0)
                            v = pixel.B;
                        else if (c == 1)
                            v = pixel.G;
                        else if (c == 2)
                            v = pixel.R;
                        else
                            throw new Exception("No such channel");

                        // Subtract pixel mean
                        features[featIndex] = v - pixelMeans[featIndex];
                        featIndex++;
                    }
        }  
        return features.ToList();
    }

    /// Convert bitmap image to R,G,B channel byte arrays.
    /// See: http://stackoverflow.com/questions/6020406/travel-through-pixels-in-bmp
    private static List<byte[]> ImGetRGBChannels(Bitmap bmp)
    {
        // Lock the bitmap's bits. The data is only read, so a read-only lock suffices.
        Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
        BitmapData bmpData = bmp.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);

        // Declare an array to hold the raw bytes of the bitmap, including row padding.
        int stride = bmpData.Stride;
        int bytes = stride * bmp.Height;
        byte[] rgbValues = new byte[bytes];

        // One byte per pixel and channel. Sizing by pixel count instead of bytes / 3
        // keeps the arrays correct when rows are padded to a 4-byte boundary.
        int numPixels = bmp.Width * bmp.Height;
        byte[] r = new byte[numPixels];
        byte[] g = new byte[numPixels];
        byte[] b = new byte[numPixels];

        // Copy the raw values into the array, starting from ptr to the first line
        IntPtr ptr = bmpData.Scan0;
        Marshal.Copy(ptr, rgbValues, 0, bytes);

        // Populate byte arrays. Format24bppRgb stores each pixel as B, G, R.
        int count = 0;
        for (int row = 0; row < bmp.Height; row++)
        {
            for (int col = 0; col < bmp.Width; col++)
            {
                int offset = (row * stride) + (col * 3);
                b[count] = rgbValues[offset];
                g[count] = rgbValues[offset + 1];
                r[count++] = rgbValues[offset + 2];
            }
        }
        bmp.UnlockBits(bmpData);
        return new List<byte[]> { r, g, b };
    }
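One detail of the LockBits approach is worth checking for other input sizes: each row of a 24bpp bitmap is padded to a multiple of four bytes, so Stride can be larger than Width * 3. The sketch below reproduces that arithmetic. StrideMath, Stride24bpp, and PixelOffset are hypothetical names, and the sketch assumes a positive stride (a top-down bitmap).

```csharp
using System;

public static class StrideMath
{
    /// Bytes per row of a 24bpp bitmap: width * 3, rounded up to a multiple of 4.
    /// Assumption: top-down bitmap, so the stride is positive.
    public static int Stride24bpp(int width)
    {
        if (width <= 0)
            throw new ArgumentOutOfRangeException(nameof(width));
        return ((width * 3 + 3) / 4) * 4;
    }

    /// Byte offset of the first (blue) component of pixel (x, y) in the locked buffer.
    public static int PixelOffset(int x, int y, int stride)
    {
        return y * stride + x * 3;
    }
}
```

For a width of 224 the stride is 672 bytes, which is exactly 224 * 3, so there is no padding for the network size used here. For a width of 225 the stride is 676 bytes, one more than 225 * 3, so a 225 x 225 buffer holds 152100 bytes, and 152100 / 3 = 50700 is larger than the 50625 pixels actually in the image.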