How to improve the retrained-model accuracy of the TensorFlow camera demo on iOS

Asked: 2017-09-18 17:15:12

Tags: android ios objective-c machine-learning tensorflow

I have an Android app that was modeled after the TensorFlow Android demo for classifying images,

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android

The original app uses a TensorFlow graph (.pb) file to classify a generic set of images with Inception v3 (I think).

I then trained my own graph for my own images, following the instructions in the TensorFlow for Poets blog post,

https://petewarden.com/2016/02/28/tensorflow-for-poets/

and this worked very well in the Android app after changing the settings in

ClassifierActivity

private static final int INPUT_SIZE = 299;
private static final int IMAGE_MEAN = 128;
private static final float IMAGE_STD = 128.0f;
private static final String INPUT_NAME = "Mul";
private static final String OUTPUT_NAME = "final_result";
private static final String MODEL_FILE = "file:///android_asset/optimized_graph.pb";
private static final String LABEL_FILE =  "file:///android_asset/retrained_labels.txt";

To port the app to iOS, I then used the iOS camera demo, https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/ios/camera

and used the same graph file, changing the settings in

CameraExampleViewController.mm

// If you have your own model, modify this to the file name, and make sure
// you've added the file to your app resources too.
static NSString* model_file_name = @"tensorflow_inception_graph";
static NSString* model_file_type = @"pb";
// This controls whether we'll be loading a plain GraphDef proto, or a
// file created by the convert_graphdef_memmapped_format utility that wraps a
// GraphDef and parameter file that can be mapped into memory from file to
// reduce overall memory usage.
const bool model_uses_memory_mapping = false;
// If you have your own model, point this to the labels file.
static NSString* labels_file_name = @"imagenet_comp_graph_label_strings";
static NSString* labels_file_type = @"txt";
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 299;
const int wanted_input_height = 299;
const int wanted_input_channels = 3;
const float input_mean = 128.0f;
const float input_std = 128.0f;
const std::string input_layer_name = "Mul";
const std::string output_layer_name = "final_result";

After this, the app is running on iOS, but...

The app on Android detects the images being classified much better than the iOS one does. If I fill the camera's viewport with the image, both perform similarly. But usually the image to be detected is only part of the camera viewport; on Android this doesn't seem to matter much, while on iOS it matters a lot, so iOS fails to classify the image.

My guess is that Android is cropping the camera viewport to the central 299x299 area, whereas iOS is scaling its camera viewport down to the central 299x299 area.

Can anyone confirm this? Does anyone know how to fix the iOS demo so it detects the focused image better? (make it crop)
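To illustrate what I mean by cropping, here is a rough, untested sketch for the iOS side (the helper name and signature are my own, not from the demo): take the centered square of the pixel buffer and feed only that region to the per-pixel sampling loop.

#import <CoreVideo/CoreVideo.h>

// Untested sketch: find the centered square of a 4-channel BGRA pixel buffer.
// The caller must already have called CVPixelBufferLockBaseAddress.
static void CenterSquareOfPixelBuffer(CVPixelBufferRef pixelBuffer,
                                      unsigned char **outStartAddr,
                                      int *outSide, int *outRowBytes) {
  const int rowBytes = (int)CVPixelBufferGetBytesPerRow(pixelBuffer);
  const int width = (int)CVPixelBufferGetWidth(pixelBuffer);
  const int height = (int)CVPixelBufferGetHeight(pixelBuffer);
  const int side = (width < height) ? width : height;  // edge of the square
  const int marginX = (width - side) / 2;   // columns to skip on the left
  const int marginY = (height - side) / 2;  // rows to skip at the top
  unsigned char *base =
      (unsigned char *)CVPixelBufferGetBaseAddress(pixelBuffer);
  *outStartAddr = base + (marginY * rowBytes) + (marginX * 4);  // 4 bytes/pixel
  *outSide = side;
  *outRowBytes = rowBytes;  // stride stays the full source row width
}

The sampling loop in runCNNOnFrame would then read side x side source pixels starting at that address (using rowBytes as the row stride) instead of the full frame width.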

In the Android demo class,

ClassifierActivity.onPreviewSizeChosen()

rgbFrameBitmap = Bitmap.createBitmap(previewWidth, previewHeight, Config.ARGB_8888);
    croppedBitmap = Bitmap.createBitmap(INPUT_SIZE, INPUT_SIZE, Config.ARGB_8888);

frameToCropTransform =
        ImageUtils.getTransformationMatrix(
            previewWidth, previewHeight,
            INPUT_SIZE, INPUT_SIZE,
            sensorOrientation, MAINTAIN_ASPECT);

cropToFrameTransform = new Matrix();
frameToCropTransform.invert(cropToFrameTransform);

On iOS there is,

CameraExampleViewController.runCNNOnFrame()

const int sourceRowBytes = (int)CVPixelBufferGetBytesPerRow(pixelBuffer);
  const int image_width = (int)CVPixelBufferGetWidth(pixelBuffer);
  const int fullHeight = (int)CVPixelBufferGetHeight(pixelBuffer);

  CVPixelBufferLockFlags unlockFlags = kNilOptions;
  CVPixelBufferLockBaseAddress(pixelBuffer, unlockFlags);

  unsigned char *sourceBaseAddr =
      (unsigned char *)(CVPixelBufferGetBaseAddress(pixelBuffer));
  int image_height;
  unsigned char *sourceStartAddr;
  if (fullHeight <= image_width) {
    image_height = fullHeight;
    sourceStartAddr = sourceBaseAddr;
  } else {
    image_height = image_width;
    const int marginY = ((fullHeight - image_width) / 2);
    sourceStartAddr = (sourceBaseAddr + (marginY * sourceRowBytes));
  }
  const int image_channels = 4;

  assert(image_channels >= wanted_input_channels);
  tensorflow::Tensor image_tensor(
      tensorflow::DT_FLOAT,
      tensorflow::TensorShape(
          {1, wanted_input_height, wanted_input_width, wanted_input_channels}));
  auto image_tensor_mapped = image_tensor.tensor<float, 4>();
  tensorflow::uint8 *in = sourceStartAddr;
  float *out = image_tensor_mapped.data();
  for (int y = 0; y < wanted_input_height; ++y) {
    float *out_row = out + (y * wanted_input_width * wanted_input_channels);
    for (int x = 0; x < wanted_input_width; ++x) {
      const int in_x = (y * image_width) / wanted_input_width;
      const int in_y = (x * image_height) / wanted_input_height;
      tensorflow::uint8 *in_pixel =
          in + (in_y * image_width * image_channels) + (in_x * image_channels);
      float *out_pixel = out_row + (x * wanted_input_channels);
      for (int c = 0; c < wanted_input_channels; ++c) {
        out_pixel[c] = (in_pixel[c] - input_mean) / input_std;
      }
    }
  }

  CVPixelBufferUnlockBaseAddress(pixelBuffer, unlockFlags);

I think the problem is here,

tensorflow::uint8 *in_pixel =
          in + (in_y * image_width * image_channels) + (in_x * image_channels);
      float *out_pixel = out_row + (x * wanted_input_channels);

My understanding is that this scales down to 299 only by picking every x-th source pixel, rather than resampling the original image down to 299. So this results in poor scaling and poor image recognition.
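For what it's worth, a bilinear version of that sampling would look roughly like the sketch below (untested; the function name and parameters are mine, it ignores the x/y swap the demo loop uses for rotation, and it assumes tightly packed rows like the original loop does).

// Untested sketch: bilinear sampling instead of nearest-neighbour when
// shrinking the camera frame to the model input. `in` points at the start
// of the (locked) source pixels; `out` is wanted_height * wanted_width * 3 floats.
static void BilinearResizeToTensor(const unsigned char* in,
                                   int image_width, int image_height,
                                   int image_channels,
                                   float* out,
                                   int wanted_width, int wanted_height,
                                   float input_mean, float input_std) {
  for (int y = 0; y < wanted_height; ++y) {
    const float src_y = (y * (image_height - 1)) / (float)(wanted_height - 1);
    const int y0 = (int)src_y;
    const int y1 = (y0 + 1 < image_height) ? y0 + 1 : y0;
    const float fy = src_y - y0;
    for (int x = 0; x < wanted_width; ++x) {
      const float src_x = (x * (image_width - 1)) / (float)(wanted_width - 1);
      const int x0 = (int)src_x;
      const int x1 = (x0 + 1 < image_width) ? x0 + 1 : x0;
      const float fx = src_x - x0;
      float* out_pixel = out + (y * wanted_width + x) * 3;
      for (int c = 0; c < 3; ++c) {
        // Fetch the four neighbouring source samples for this channel.
        const float p00 = in[(y0 * image_width + x0) * image_channels + c];
        const float p01 = in[(y0 * image_width + x1) * image_channels + c];
        const float p10 = in[(y1 * image_width + x0) * image_channels + c];
        const float p11 = in[(y1 * image_width + x1) * image_channels + c];
        const float top = p00 + fx * (p01 - p00);
        const float bottom = p10 + fx * (p11 - p10);
        out_pixel[c] = ((top + fy * (bottom - top)) - input_mean) / input_std;
      }
    }
  }
}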

The solution would be to scale the pixelBuffer down to 299 first. I tried this,

UIImage *uiImage = [self uiImageFromPixelBuffer: pixelBuffer];
float scaleFactor = (float)wanted_input_height / (float)fullHeight;
float newWidth = image_width * scaleFactor;
NSLog(@"width: %d, height: %d, scale: %f, height: %f", image_width, fullHeight, scaleFactor, newWidth);
CGSize size = CGSizeMake(wanted_input_width, wanted_input_height);
UIGraphicsBeginImageContext(size);
[uiImage drawInRect:CGRectMake(0, 0, newWidth, size.height)];
UIImage *destImage = UIGraphicsGetImageFromCurrentImageContext();
UIGraphicsEndImageContext();
pixelBuffer = [self pixelBufferFromCGImage: destImage.CGImage];

and converted the image back to a pixel buffer,

- (CVPixelBufferRef) pixelBufferFromCGImage: (CGImageRef) image
{
    NSDictionary *options = @{
                              (NSString*)kCVPixelBufferCGImageCompatibilityKey : @YES,
                              (NSString*)kCVPixelBufferCGBitmapContextCompatibilityKey : @YES,
                              };

    CVPixelBufferRef pxbuffer = NULL;
    CVReturn status = CVPixelBufferCreate(kCFAllocatorDefault, CGImageGetWidth(image),
                                          CGImageGetHeight(image), kCVPixelFormatType_32ARGB, (__bridge CFDictionaryRef) options,
                                          &pxbuffer);
    if (status!=kCVReturnSuccess) {
        NSLog(@"Operation failed");
    }
    NSParameterAssert(status == kCVReturnSuccess && pxbuffer != NULL);

    CVPixelBufferLockBaseAddress(pxbuffer, 0);
    void *pxdata = CVPixelBufferGetBaseAddress(pxbuffer);

    CGColorSpaceRef rgbColorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef context = CGBitmapContextCreate(pxdata, CGImageGetWidth(image),
                                                 CGImageGetHeight(image), 8, 4*CGImageGetWidth(image), rgbColorSpace,
                                                 kCGImageAlphaNoneSkipFirst);
    NSParameterAssert(context);

    CGContextConcatCTM(context, CGAffineTransformMakeRotation(0));
    CGAffineTransform flipVertical = CGAffineTransformMake( 1, 0, 0, -1, 0, CGImageGetHeight(image) );
    CGContextConcatCTM(context, flipVertical);
    CGAffineTransform flipHorizontal = CGAffineTransformMake( -1.0, 0.0, 0.0, 1.0, CGImageGetWidth(image), 0.0 );
    CGContextConcatCTM(context, flipHorizontal);

    CGContextDrawImage(context, CGRectMake(0, 0, CGImageGetWidth(image),
                                           CGImageGetHeight(image)), image);
    CGColorSpaceRelease(rgbColorSpace);
    CGContextRelease(context);

    CVPixelBufferUnlockBaseAddress(pxbuffer, 0);
    return pxbuffer;
}

- (UIImage*) uiImageFromPixelBuffer: (CVPixelBufferRef) pixelBuffer {
    CIImage *ciImage = [CIImage imageWithCVPixelBuffer: pixelBuffer];

    CIContext *temporaryContext = [CIContext contextWithOptions:nil];
    CGImageRef videoImage = [temporaryContext
                             createCGImage:ciImage
                             fromRect:CGRectMake(0, 0,
                                                 CVPixelBufferGetWidth(pixelBuffer),
                                                 CVPixelBufferGetHeight(pixelBuffer))];

    UIImage *uiImage = [UIImage imageWithCGImage:videoImage];
    CGImageRelease(videoImage);
    return uiImage;
}

I'm not sure whether this is the best way to resize, but it works. However, it seems to make the image classification worse, not better...
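One alternative I have been considering, as a sketch only (not verified on a device; it assumes the buffer is 4-channel BGRA and already locked), is to stay in the pixel-buffer domain and resample with Accelerate's vImage instead of round-tripping through UIImage:

#import <Accelerate/Accelerate.h>
#import <CoreVideo/CoreVideo.h>
#include <vector>

// Sketch: resample the (locked) camera frame down to the model input size
// with vImage. Returns an empty vector on failure so the caller can fall
// back to the existing path. Note this does not preserve aspect ratio, so a
// centre crop would still be needed first.
static std::vector<unsigned char> ScaleFrameForModel(CVPixelBufferRef pixelBuffer,
                                                     int wantedWidth,
                                                     int wantedHeight) {
  vImage_Buffer src;
  src.data = CVPixelBufferGetBaseAddress(pixelBuffer);
  src.width = CVPixelBufferGetWidth(pixelBuffer);
  src.height = CVPixelBufferGetHeight(pixelBuffer);
  src.rowBytes = CVPixelBufferGetBytesPerRow(pixelBuffer);

  std::vector<unsigned char> scaled(wantedWidth * wantedHeight * 4);
  vImage_Buffer dest;
  dest.data = scaled.data();
  dest.width = wantedWidth;
  dest.height = wantedHeight;
  dest.rowBytes = wantedWidth * 4;

  // Lanczos resampling; kvImageNoError means success.
  vImage_Error err =
      vImageScale_ARGB8888(&src, &dest, NULL, kvImageHighQualityResampling);
  if (err != kvImageNoError) {
    scaled.clear();
  }
  return scaled;
}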

Any ideas or issues with the image conversion/resizing?

3 Answers:

Answer 0 (score: 6)

Since you are not using the YOLO detector, the MAINTAIN_ASPECT flag is set to false. Hence, the image in the Android app is not being cropped but scaled. However, in the code snippets provided I don't see the actual initialization of that flag. Confirm that its value is actually false in your app.

I know this is not a complete solution, but I hope it helps you debug the problem.
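For context, if I read the demo's ImageUtils.getTransformationMatrix correctly, the difference between the two modes is roughly the following (a C++ sketch of the idea, not the actual Android code):

#include <algorithm>

// Rough sketch of the scale factors the transform chooses. With
// maintainAspect, the larger factor fills the destination and the overflow
// is cropped; without it, each axis is stretched independently (no crop).
struct ScaleFactors { float sx; float sy; };

static ScaleFactors ChooseScale(int srcW, int srcH, int dstW, int dstH,
                                bool maintainAspect) {
  const float sx = dstW / (float)srcW;
  const float sy = dstH / (float)srcH;
  if (!maintainAspect) {
    return {sx, sy};
  }
  const float s = std::max(sx, sy);
  return {s, s};
}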

Answer 1 (score: 2)

TensorFlow object detection has default, standard configurations; the settings to look at are listed below.

The important things you need to check against your input ML model:

-> model_file_name — set this to match your .pb file name,

-> model_uses_memory_mapping — up to you, if you want to reduce overall memory usage,

-> labels_file_name — set this to match your labels file name,

-> input_layer_name / output_layer_name — make sure you use the input/output layer names you used when creating the graph (.pb) file.

Code snippet:

// If you have your own model, modify this to the file name, and make sure
// you've added the file to your app resources too.
static NSString* model_file_name = @"graph";//@"tensorflow_inception_graph";
static NSString* model_file_type = @"pb";
// This controls whether we'll be loading a plain GraphDef proto, or a
// file created by the convert_graphdef_memmapped_format utility that wraps a
// GraphDef and parameter file that can be mapped into memory from file to
// reduce overall memory usage.
const bool model_uses_memory_mapping = true;
// If you have your own model, point this to the labels file.
static NSString* labels_file_name = @"labels";//@"imagenet_comp_graph_label_strings";
static NSString* labels_file_type = @"txt";
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 224;
const int wanted_input_height = 224;
const int wanted_input_channels = 3;
const float input_mean = 117.0f;
const float input_std = 1.0f;
const std::string input_layer_name = "input";
const std::string output_layer_name = "final_result";

For TensorFlow detection on custom images, you can use the following working snippet:

-> For this process you only need to pass in a UIImage.CGImage object,

NSString* RunInferenceOnImageResult(CGImageRef image) {
    tensorflow::SessionOptions options;

    tensorflow::Session* session_pointer = nullptr;
    tensorflow::Status session_status = tensorflow::NewSession(options, &session_pointer);
    if (!session_status.ok()) {
        std::string status_string = session_status.ToString();
        return [NSString stringWithFormat: @"Session create failed - %s",
                status_string.c_str()];
    }
    std::unique_ptr<tensorflow::Session> session(session_pointer);
    LOG(INFO) << "Session created.";

    tensorflow::GraphDef tensorflow_graph;
    LOG(INFO) << "Graph created.";

    NSString* network_path = FilePathForResourceNames(@"tensorflow_inception_graph", @"pb");
    PortableReadFileToProtol([network_path UTF8String], &tensorflow_graph);

    LOG(INFO) << "Creating session.";
    tensorflow::Status s = session->Create(tensorflow_graph);
    if (!s.ok()) {
        LOG(ERROR) << "Could not create TensorFlow Graph: " << s;
        return @"";
    }

    // Read the label list
    NSString* labels_path = FilePathForResourceNames(@"imagenet_comp_graph_label_strings", @"txt");
    std::vector<std::string> label_strings;
    std::ifstream t;
    t.open([labels_path UTF8String]);
    std::string line;
    while(t){
        std::getline(t, line);
        label_strings.push_back(line);
    }
    t.close();

    // Read the Grace Hopper image.
    //NSString* image_path = FilePathForResourceNames(@"grace_hopper", @"jpg");
    int image_width;
    int image_height;
    int image_channels;
//    std::vector<tensorflow::uint8> image_data = LoadImageFromFile(
//                                                                  [image_path UTF8String], &image_width, &image_height, &image_channels);
    std::vector<tensorflow::uint8> image_data = LoadImageFromImage(image,&image_width, &image_height, &image_channels);
    const int wanted_width = 224;
    const int wanted_height = 224;
    const int wanted_channels = 3;
    const float input_mean = 117.0f;
    const float input_std = 1.0f;
    assert(image_channels >= wanted_channels);
    tensorflow::Tensor image_tensor(
                                    tensorflow::DT_FLOAT,
                                    tensorflow::TensorShape({
        1, wanted_height, wanted_width, wanted_channels}));
    auto image_tensor_mapped = image_tensor.tensor<float, 4>();
    tensorflow::uint8* in = image_data.data();
    // tensorflow::uint8* in_end = (in + (image_height * image_width * image_channels));
    float* out = image_tensor_mapped.data();
    for (int y = 0; y < wanted_height; ++y) {
        const int in_y = (y * image_height) / wanted_height;
        tensorflow::uint8* in_row = in + (in_y * image_width * image_channels);
        float* out_row = out + (y * wanted_width * wanted_channels);
        for (int x = 0; x < wanted_width; ++x) {
            const int in_x = (x * image_width) / wanted_width;
            tensorflow::uint8* in_pixel = in_row + (in_x * image_channels);
            float* out_pixel = out_row + (x * wanted_channels);
            for (int c = 0; c < wanted_channels; ++c) {
                out_pixel[c] = (in_pixel[c] - input_mean) / input_std;
            }
        }
    }

    NSString* result;
//    result = [NSString stringWithFormat: @"%@ - %lu, %s - %dx%d", result,
//              label_strings.size(), label_strings[0].c_str(), image_width, image_height];

    std::string input_layer = "input";
    std::string output_layer = "output";
    std::vector<tensorflow::Tensor> outputs;
    tensorflow::Status run_status = session->Run({{input_layer, image_tensor}},
                                                 {output_layer}, {}, &outputs);
    if (!run_status.ok()) {
        LOG(ERROR) << "Running model failed: " << run_status;
        tensorflow::LogAllRegisteredKernels();
        result = @"Error running model";
        return result;
    }
    tensorflow::string status_string = run_status.ToString();
    result = [NSString stringWithFormat: @"Status :%s\n",
              status_string.c_str()];

    tensorflow::Tensor* output = &outputs[0];
    const int kNumResults = 5;
    const float kThreshold = 0.1f;
    std::vector<std::pair<float, int> > top_results;
    GetTopN(output->flat<float>(), kNumResults, kThreshold, &top_results);

    std::stringstream ss;
    ss.precision(3);
    for (const auto& result : top_results) {
        const float confidence = result.first;
        const int index = result.second;

        ss << index << " " << confidence << "  ";

        // Write out the result as a string
        if (index < label_strings.size()) {
            // just for safety: theoretically, the output is under 1000 unless there
            // is some numerical issues leading to a wrong prediction.
            ss << label_strings[index];
        } else {
            ss << "Prediction: " << index;
        }

        ss << "\n";
    }

    LOG(INFO) << "Predictions: " << ss.str();

    tensorflow::string predictions = ss.str();
    result = [NSString stringWithFormat: @"%@ - %s", result,
              predictions.c_str()];

    return result;
}

Scaling the image to a custom width and height — C++ code snippet,

std::vector<uint8> LoadImageFromImage(CGImageRef image,
                                     int* out_width, int* out_height,
                                     int* out_channels) {

    const int width = (int)CGImageGetWidth(image);
    const int height = (int)CGImageGetHeight(image);
    const int channels = 4;
    CGColorSpaceRef color_space = CGColorSpaceCreateDeviceRGB();
    const int bytes_per_row = (width * channels);
    const int bytes_in_image = (bytes_per_row * height);
    std::vector<uint8> result(bytes_in_image);
    const int bits_per_component = 8;
    CGContextRef context = CGBitmapContextCreate(result.data(), width, height,
                                                 bits_per_component, bytes_per_row, color_space,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(color_space);
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), image);
    CGContextRelease(context);
    CFRelease(image);

    *out_width = width;
    *out_height = height;
    *out_channels = channels;
    return result;
}

The function above helps you load the image data at your own custom ratio. For TensorFlow classification, the image width and height that give accurate results are 224 x 224.

You need to call the LoadImage function above from RunInferenceOnImageResult, passing the actual custom width and height arguments along with the image reference.
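Since LoadImageFromImage above returns the pixels at the image's original size, here is a hypothetical variant (the name and signature are illustrative, not from the demo) that draws the CGImage into a bitmap context that is already wanted_width x wanted_height, so the returned RGBA data needs no further resampling before normalization:

#import <CoreGraphics/CoreGraphics.h>
#include <vector>

// Hypothetical variant of LoadImageFromImage: CoreGraphics resamples the
// image while drawing into a context of the wanted size, so the result is
// already e.g. 224 x 224 x 4 bytes of RGBA data.
static std::vector<unsigned char> LoadScaledImageFromImage(CGImageRef image,
                                                           int wanted_width,
                                                           int wanted_height,
                                                           int* out_channels) {
  const int channels = 4;
  const int bytes_per_row = wanted_width * channels;
  std::vector<unsigned char> result(bytes_per_row * wanted_height);
  CGColorSpaceRef color_space = CGColorSpaceCreateDeviceRGB();
  CGContextRef context = CGBitmapContextCreate(
      result.data(), wanted_width, wanted_height, 8, bytes_per_row, color_space,
      kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
  CGColorSpaceRelease(color_space);
  // Drawing into the smaller rect performs the scaling.
  CGContextDrawImage(context,
                     CGRectMake(0, 0, wanted_width, wanted_height), image);
  CGContextRelease(context);
  *out_channels = channels;
  return result;
}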

Answer 2 (score: 1)

Please change this code:

// If you have your own model, modify this to the file name, and make sure
// you've added the file to your app resources too.
static NSString* model_file_name = @"tensorflow_inception_graph";
static NSString* model_file_type = @"pb";
// This controls whether we'll be loading a plain GraphDef proto, or a
// file created by the convert_graphdef_memmapped_format utility that wraps a
// GraphDef and parameter file that can be mapped into memory from file to
// reduce overall memory usage.
const bool model_uses_memory_mapping = false;
// If you have your own model, point this to the labels file.
static NSString* labels_file_name = @"imagenet_comp_graph_label_strings";
static NSString* labels_file_type = @"txt";
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 299;
const int wanted_input_height = 299;
const int wanted_input_channels = 3;
const float input_mean = 128.0f;
const float input_std = 1.0f;
const std::string input_layer_name = "Mul";
const std::string output_layer_name = "final_result";

The change here: const float input_std = 1.0f;
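For context, each channel is normalized as (pixel - input_mean) / input_std before it reaches the graph, so these two constants decide the numeric range the first layer sees; which range is right depends on how the graph was trained. A quick standalone check, assuming 8-bit pixel values:

#include <cstdio>

// With mean = 128 and std = 128, pixel values 0..255 map to roughly -1..1;
// with std = 1 they map to -128..127.
int main() {
  const float input_mean = 128.0f;
  const float stds[] = {128.0f, 1.0f};
  for (float input_std : stds) {
    std::printf("std=%5.1f  min=%7.2f  max=%7.2f\n", input_std,
                (0.0f - input_mean) / input_std,
                (255.0f - input_mean) / input_std);
  }
  return 0;
}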