I have an Android app, modelled on the TensorFlow Android demo, for classifying images,
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android
The original app uses a TensorFlow graph (.pb) file to classify a generic set of images using Inception v3 (I think).
I then trained my own graph for my own images, following the instructions in the TensorFlow for Poets blog post,
https://petewarden.com/2016/02/28/tensorflow-for-poets/
and after changing the following settings in ClassifierActivity, this worked very well in the Android app:
private static final int INPUT_SIZE = 299;
private static final int IMAGE_MEAN = 128;
private static final float IMAGE_STD = 128.0f;
private static final String INPUT_NAME = "Mul";
private static final String OUTPUT_NAME = "final_result";
private static final String MODEL_FILE = "file:///android_asset/optimized_graph.pb";
private static final String LABEL_FILE = "file:///android_asset/retrained_labels.txt";
To port the app to iOS, I then used the iOS camera demo, https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/ios/camera
and used the same graph file, changing the settings in CameraExampleViewController.mm:
// If you have your own model, modify this to the file name, and make sure
// you've added the file to your app resources too.
static NSString* model_file_name = @"tensorflow_inception_graph";
static NSString* model_file_type = @"pb";
// This controls whether we'll be loading a plain GraphDef proto, or a
// file created by the convert_graphdef_memmapped_format utility that wraps a
// GraphDef and parameter file that can be mapped into memory from file to
// reduce overall memory usage.
const bool model_uses_memory_mapping = false;
// If you have your own model, point this to the labels file.
static NSString* labels_file_name = @"imagenet_comp_graph_label_strings";
static NSString* labels_file_type = @"txt";
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 299;
const int wanted_input_height = 299;
const int wanted_input_channels = 3;
const float input_mean = 128.0f;
const float input_std = 128.0f;
const std::string input_layer_name = "Mul";
const std::string output_layer_name = "final_result";
After this, the app runs on iOS, but...
The app performs much better on Android than on iOS at detecting the images to be classified. If I fill the camera's viewport with the image, both perform similarly. But normally the image to detect is only part of the camera viewport; on Android this doesn't seem to matter much, but on iOS it matters a lot, so iOS fails to classify the image.
My guess is that Android crops the camera viewport to a central 299x299 area, whereas iOS scales its whole camera viewport down to 299x299.
Can anyone confirm this? And does anyone know how to fix the iOS demo so it detects the framed image better (i.e. make it crop)?
In the demo's Android class, ClassifierActivity.onPreviewSizeChosen(), there is:
rgbFrameBitmap = Bitmap.createBitmap(previewWidth, previewHeight, Config.ARGB_8888);
croppedBitmap = Bitmap.createBitmap(INPUT_SIZE, INPUT_SIZE, Config.ARGB_8888);
frameToCropTransform =
ImageUtils.getTransformationMatrix(
previewWidth, previewHeight,
INPUT_SIZE, INPUT_SIZE,
sensorOrientation, MAINTAIN_ASPECT);
cropToFrameTransform = new Matrix();
frameToCropTransform.invert(cropToFrameTransform);
and on iOS, in CameraExampleViewController.runCNNOnFrame(), there is:
const int sourceRowBytes = (int)CVPixelBufferGetBytesPerRow(pixelBuffer);
const int image_width = (int)CVPixelBufferGetWidth(pixelBuffer);
const int fullHeight = (int)CVPixelBufferGetHeight(pixelBuffer);
CVPixelBufferLockFlags unlockFlags = kNilOptions;
CVPixelBufferLockBaseAddress(pixelBuffer, unlockFlags);
unsigned char *sourceBaseAddr =
(unsigned char *)(CVPixelBufferGetBaseAddress(pixelBuffer));
int image_height;
unsigned char *sourceStartAddr;
if (fullHeight <= image_width) {
image_height = fullHeight;
sourceStartAddr = sourceBaseAddr;
} else {
image_height = image_width;
const int marginY = ((fullHeight - image_width) / 2);
sourceStartAddr = (sourceBaseAddr + (marginY * sourceRowBytes));
}
const int image_channels = 4;
assert(image_channels >= wanted_input_channels);
tensorflow::Tensor image_tensor(
tensorflow::DT_FLOAT,
tensorflow::TensorShape(
{1, wanted_input_height, wanted_input_width, wanted_input_channels}));
auto image_tensor_mapped = image_tensor.tensor<float, 4>();
tensorflow::uint8 *in = sourceStartAddr;
float *out = image_tensor_mapped.data();
for (int y = 0; y < wanted_input_height; ++y) {
float *out_row = out + (y * wanted_input_width * wanted_input_channels);
for (int x = 0; x < wanted_input_width; ++x) {
const int in_x = (y * image_width) / wanted_input_width;
const int in_y = (x * image_height) / wanted_input_height;
tensorflow::uint8 *in_pixel =
in + (in_y * image_width * image_channels) + (in_x * image_channels);
float *out_pixel = out_row + (x * wanted_input_channels);
for (int c = 0; c < wanted_input_channels; ++c) {
out_pixel[c] = (in_pixel[c] - input_mean) / input_std;
}
}
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, unlockFlags);
I think the problem is here:
tensorflow::uint8 *in_pixel =
in + (in_y * image_width * image_channels) + (in_x * image_channels);
float *out_pixel = out_row + (x * wanted_input_channels);
My understanding is that this just picks every x-th source pixel rather than properly rescaling the original image down to 299, so it scales badly and the image recognition suffers.
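If that per-x-th-pixel sampling really is the weak point, one idea would be to average over the whole source region that maps onto each output pixel (a simple box filter) instead of picking a single pixel. The sketch below is only an illustration of that idea, not code from the demo; it assumes the same buffer layout as the loop above (in pointing at the cropped frame, image_width / image_height / image_channels describing it, and the usual input_mean / input_std normalisation).

#include <algorithm>

// Sketch only: box-filter downscale into the model's input tensor instead of
// nearest-neighbour sampling. Like the loop above, it assumes rows are tightly
// packed (image_width * image_channels bytes per row).
static void FillTensorWithBoxFilter(const unsigned char *in,
                                    int image_width, int image_height,
                                    int image_channels,   // e.g. 4 (BGRA)
                                    float *out,           // wanted_h * wanted_w * wanted_c floats
                                    int wanted_width, int wanted_height,
                                    int wanted_channels,  // e.g. 3
                                    float mean, float stddev) {
  for (int y = 0; y < wanted_height; ++y) {
    const int in_y0 = (y * image_height) / wanted_height;
    const int in_y1 = std::max(in_y0 + 1, ((y + 1) * image_height) / wanted_height);
    float *out_row = out + (y * wanted_width * wanted_channels);
    for (int x = 0; x < wanted_width; ++x) {
      const int in_x0 = (x * image_width) / wanted_width;
      const int in_x1 = std::max(in_x0 + 1, ((x + 1) * image_width) / wanted_width);
      float sums[4] = {0.0f, 0.0f, 0.0f, 0.0f};
      int count = 0;
      // Average every source pixel that falls into this output pixel's cell.
      for (int sy = in_y0; sy < in_y1; ++sy) {
        for (int sx = in_x0; sx < in_x1; ++sx) {
          const unsigned char *p = in + (sy * image_width + sx) * image_channels;
          for (int c = 0; c < wanted_channels; ++c) sums[c] += p[c];
          ++count;
        }
      }
      float *out_pixel = out_row + (x * wanted_channels);
      for (int c = 0; c < wanted_channels; ++c) {
        out_pixel[c] = ((sums[c] / count) - mean) / stddev;
      }
    }
  }
}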
What I actually tried was to first scale the pixelBuffer down to 299:
UIImage *uiImage = [self uiImageFromPixelBuffer: pixelBuffer];
float scaleFactor = (float)wanted_input_height / (float)fullHeight;
float newWidth = image_width * scaleFactor;
NSLog(@"width: %d, height: %d, scale: %f, height: %f", image_width, fullHeight, scaleFactor, newWidth);
CGSize size = CGSizeMake(wanted_input_width, wanted_input_height);
UIGraphicsBeginImageContext(size);
[uiImage drawInRect:CGRectMake(0, 0, newWidth, size.height)];
UIImage *destImage = UIGraphicsGetImageFromCurrentImageContext();
UIGraphicsEndImageContext();
pixelBuffer = [self pixelBufferFromCGImage: destImage.CGImage];
and to convert the image back to a pixel buffer:
- (CVPixelBufferRef) pixelBufferFromCGImage: (CGImageRef) image
{
NSDictionary *options = @{
(NSString*)kCVPixelBufferCGImageCompatibilityKey : @YES,
(NSString*)kCVPixelBufferCGBitmapContextCompatibilityKey : @YES,
};
CVPixelBufferRef pxbuffer = NULL;
CVReturn status = CVPixelBufferCreate(kCFAllocatorDefault, CGImageGetWidth(image),
CGImageGetHeight(image), kCVPixelFormatType_32ARGB, (__bridge CFDictionaryRef) options,
&pxbuffer);
if (status!=kCVReturnSuccess) {
NSLog(@"Operation failed");
}
NSParameterAssert(status == kCVReturnSuccess && pxbuffer != NULL);
CVPixelBufferLockBaseAddress(pxbuffer, 0);
void *pxdata = CVPixelBufferGetBaseAddress(pxbuffer);
CGColorSpaceRef rgbColorSpace = CGColorSpaceCreateDeviceRGB();
CGContextRef context = CGBitmapContextCreate(pxdata, CGImageGetWidth(image),
CGImageGetHeight(image), 8, 4*CGImageGetWidth(image), rgbColorSpace,
kCGImageAlphaNoneSkipFirst);
NSParameterAssert(context);
CGContextConcatCTM(context, CGAffineTransformMakeRotation(0));
CGAffineTransform flipVertical = CGAffineTransformMake( 1, 0, 0, -1, 0, CGImageGetHeight(image) );
CGContextConcatCTM(context, flipVertical);
CGAffineTransform flipHorizontal = CGAffineTransformMake( -1.0, 0.0, 0.0, 1.0, CGImageGetWidth(image), 0.0 );
CGContextConcatCTM(context, flipHorizontal);
CGContextDrawImage(context, CGRectMake(0, 0, CGImageGetWidth(image),
CGImageGetHeight(image)), image);
CGColorSpaceRelease(rgbColorSpace);
CGContextRelease(context);
CVPixelBufferUnlockBaseAddress(pxbuffer, 0);
return pxbuffer;
}
- (UIImage*) uiImageFromPixelBuffer: (CVPixelBufferRef) pixelBuffer {
CIImage *ciImage = [CIImage imageWithCVPixelBuffer: pixelBuffer];
CIContext *temporaryContext = [CIContext contextWithOptions:nil];
CGImageRef videoImage = [temporaryContext
createCGImage:ciImage
fromRect:CGRectMake(0, 0,
CVPixelBufferGetWidth(pixelBuffer),
CVPixelBufferGetHeight(pixelBuffer))];
UIImage *uiImage = [UIImage imageWithCGImage:videoImage];
CGImageRelease(videoImage);
return uiImage;
}
I'm not sure this is the best way to resize, but it works. However, it seems to make the image classification worse, not better...
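One thing I have not ruled out (an assumption on my part, not verified): the camera capture session in the demo delivers kCVPixelFormatType_32BGRA frames, while pixelBufferFromCGImage above creates a kCVPixelFormatType_32ARGB buffer with kCGImageAlphaNoneSkipFirst, so the byte order the tensor-filling loop sees is different on the resized path. If that is part of the problem, a variation of the helper that keeps BGRA byte order (reusing image and options from above) might look roughly like this:

// Sketch only: build the intermediate buffer as BGRA so the channel order
// matches what the capture session normally delivers (assumption, not verified).
CVPixelBufferRef pxbuffer = NULL;
CVPixelBufferCreate(kCFAllocatorDefault, CGImageGetWidth(image),
                    CGImageGetHeight(image), kCVPixelFormatType_32BGRA,
                    (__bridge CFDictionaryRef)options, &pxbuffer);
CVPixelBufferLockBaseAddress(pxbuffer, 0);
CGColorSpaceRef rgb = CGColorSpaceCreateDeviceRGB();
// Premultiplied-first + 32-bit little-endian gives B,G,R,A byte order in memory.
// Using the buffer's own bytes-per-row also respects any row padding.
CGContextRef ctx = CGBitmapContextCreate(
    CVPixelBufferGetBaseAddress(pxbuffer), CGImageGetWidth(image),
    CGImageGetHeight(image), 8, CVPixelBufferGetBytesPerRow(pxbuffer), rgb,
    kCGImageAlphaPremultipliedFirst | kCGBitmapByteOrder32Little);
CGContextDrawImage(ctx, CGRectMake(0, 0, CGImageGetWidth(image),
                                   CGImageGetHeight(image)), image);
CGColorSpaceRelease(rgb);
CGContextRelease(ctx);
CVPixelBufferUnlockBaseAddress(pxbuffer, 0);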
Any ideas on, or problems with, the image conversion/resizing?
Answer 0 (score: 6)
Since you are not using the YOLO detector, the MAINTAIN_ASPECT flag is set to false, so the image in the Android app is not cropped but scaled. However, in the code snippets you provided I don't see the actual initialization of the flag. Confirm that the value of the flag in your app really is false.
I know this is not a complete solution, but I hope it helps you debug the problem.
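Roughly, the difference the flag makes is whether a single scale factor is used for both axes. This is only a sketch of the idea, not the demo's actual ImageUtils.getTransformationMatrix code:

#include <algorithm>

// Sketch: scale factors for mapping a srcW x srcH camera frame onto a
// dstW x dstH model input.
static void ComputeScale(int srcW, int srcH, int dstW, int dstH,
                         bool maintainAspect, float *scaleX, float *scaleY) {
  *scaleX = (float)dstW / (float)srcW;
  *scaleY = (float)dstH / (float)srcH;
  if (maintainAspect) {
    // One factor for both axes: the frame keeps its shape, and whatever
    // overflows the destination along the longer axis is cropped away.
    const float s = std::max(*scaleX, *scaleY);
    *scaleX = s;
    *scaleY = s;
  }
  // With maintainAspect == false the factors stay independent and the frame
  // is simply stretched to dstW x dstH; nothing is cropped.
}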
Answer 1 (score: 2)
TensorFlow object detection comes with default, standard configurations; below is the list of settings.
The important things you need to check against your input ML model:
-> model_file_name - should match your .pb file name,
-> model_uses_memory_mapping - up to you, to reduce overall memory usage,
-> labels_file_name - should match your labels file name,
-> input_layer_name / output_layer_name - make sure you use your own input/output layer names, the ones you used when creating the graph (.pb) file.
Code snippet:
// If you have your own model, modify this to the file name, and make sure
// you've added the file to your app resources too.
static NSString* model_file_name = @"graph";//@"tensorflow_inception_graph";
static NSString* model_file_type = @"pb";
// This controls whether we'll be loading a plain GraphDef proto, or a
// file created by the convert_graphdef_memmapped_format utility that wraps a
// GraphDef and parameter file that can be mapped into memory from file to
// reduce overall memory usage.
const bool model_uses_memory_mapping = true;
// If you have your own model, point this to the labels file.
static NSString* labels_file_name = @"labels";//@"imagenet_comp_graph_label_strings";
static NSString* labels_file_type = @"txt";
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 224;
const int wanted_input_height = 224;
const int wanted_input_channels = 3;
const float input_mean = 117.0f;
const float input_std = 1.0f;
const std::string input_layer_name = "input";
const std::string output_layer_name = "final_result";
For TensorFlow classification of a custom image you can use the following working snippet:
-> For this process you just need to pass in a UIImage.CGImage object,
NSString* RunInferenceOnImageResult(CGImageRef image) {
tensorflow::SessionOptions options;
tensorflow::Session* session_pointer = nullptr;
tensorflow::Status session_status = tensorflow::NewSession(options, &session_pointer);
if (!session_status.ok()) {
std::string status_string = session_status.ToString();
return [NSString stringWithFormat: @"Session create failed - %s",
status_string.c_str()];
}
std::unique_ptr<tensorflow::Session> session(session_pointer);
LOG(INFO) << "Session created.";
tensorflow::GraphDef tensorflow_graph;
LOG(INFO) << "Graph created.";
NSString* network_path = FilePathForResourceNames(@"tensorflow_inception_graph", @"pb");
PortableReadFileToProto([network_path UTF8String], &tensorflow_graph);
LOG(INFO) << "Creating session.";
tensorflow::Status s = session->Create(tensorflow_graph);
if (!s.ok()) {
LOG(ERROR) << "Could not create TensorFlow Graph: " << s;
return @"";
}
// Read the label list
NSString* labels_path = FilePathForResourceNames(@"imagenet_comp_graph_label_strings", @"txt");
std::vector<std::string> label_strings;
std::ifstream t;
t.open([labels_path UTF8String]);
std::string line;
while(t){
std::getline(t, line);
label_strings.push_back(line);
}
t.close();
// Read the Grace Hopper image.
//NSString* image_path = FilePathForResourceNames(@"grace_hopper", @"jpg");
int image_width;
int image_height;
int image_channels;
// std::vector<tensorflow::uint8> image_data = LoadImageFromFile(
// [image_path UTF8String], &image_width, &image_height, &image_channels);
std::vector<tensorflow::uint8> image_data = LoadImageFromImage(image,&image_width, &image_height, &image_channels);
const int wanted_width = 224;
const int wanted_height = 224;
const int wanted_channels = 3;
const float input_mean = 117.0f;
const float input_std = 1.0f;
assert(image_channels >= wanted_channels);
tensorflow::Tensor image_tensor(
tensorflow::DT_FLOAT,
tensorflow::TensorShape({
1, wanted_height, wanted_width, wanted_channels}));
auto image_tensor_mapped = image_tensor.tensor<float, 4>();
tensorflow::uint8* in = image_data.data();
// tensorflow::uint8* in_end = (in + (image_height * image_width * image_channels));
float* out = image_tensor_mapped.data();
for (int y = 0; y < wanted_height; ++y) {
const int in_y = (y * image_height) / wanted_height;
tensorflow::uint8* in_row = in + (in_y * image_width * image_channels);
float* out_row = out + (y * wanted_width * wanted_channels);
for (int x = 0; x < wanted_width; ++x) {
const int in_x = (x * image_width) / wanted_width;
tensorflow::uint8* in_pixel = in_row + (in_x * image_channels);
float* out_pixel = out_row + (x * wanted_channels);
for (int c = 0; c < wanted_channels; ++c) {
out_pixel[c] = (in_pixel[c] - input_mean) / input_std;
}
}
}
NSString* result;
// result = [NSString stringWithFormat: @"%@ - %lu, %s - %dx%d", result,
// label_strings.size(), label_strings[0].c_str(), image_width, image_height];
std::string input_layer = "input";
std::string output_layer = "output";
std::vector<tensorflow::Tensor> outputs;
tensorflow::Status run_status = session->Run({{input_layer, image_tensor}},
{output_layer}, {}, &outputs);
if (!run_status.ok()) {
LOG(ERROR) << "Running model failed: " << run_status;
tensorflow::LogAllRegisteredKernels();
result = @"Error running model";
return result;
}
tensorflow::string status_string = run_status.ToString();
result = [NSString stringWithFormat: @"Status :%s\n",
status_string.c_str()];
tensorflow::Tensor* output = &outputs[0];
const int kNumResults = 5;
const float kThreshold = 0.1f;
std::vector<std::pair<float, int> > top_results;
GetTopN(output->flat<float>(), kNumResults, kThreshold, &top_results);
std::stringstream ss;
ss.precision(3);
for (const auto& result : top_results) {
const float confidence = result.first;
const int index = result.second;
ss << index << " " << confidence << " ";
// Write out the result as a string
if (index < label_strings.size()) {
// just for safety: theoretically, the output is under 1000 unless there
// is some numerical issues leading to a wrong prediction.
ss << label_strings[index];
} else {
ss << "Prediction: " << index;
}
ss << "\n";
}
LOG(INFO) << "Predictions: " << ss.str();
tensorflow::string predictions = ss.str();
result = [NSString stringWithFormat: @"%@ - %s", result,
predictions.c_str()];
return result;
}
Scaling the image to a custom width and height - C++ code snippet:
std::vector<uint8> LoadImageFromImage(CGImageRef image,
int* out_width, int* out_height,
int* out_channels) {
const int width = (int)CGImageGetWidth(image);
const int height = (int)CGImageGetHeight(image);
const int channels = 4;
CGColorSpaceRef color_space = CGColorSpaceCreateDeviceRGB();
const int bytes_per_row = (width * channels);
const int bytes_in_image = (bytes_per_row * height);
std::vector<uint8> result(bytes_in_image);
const int bits_per_component = 8;
CGContextRef context = CGBitmapContextCreate(result.data(), width, height,
bits_per_component, bytes_per_row, color_space,
kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
CGColorSpaceRelease(color_space);
CGContextDrawImage(context, CGRectMake(0, 0, width, height), image);
CGContextRelease(context);
CFRelease(image);
*out_width = width;
*out_height = height;
*out_channels = channels;
return result;
}
The above function helps you load the image data at your custom ratio. During TensorFlow classification, the width and height that give high accuracy are 224 x 224 pixels.
You need to call the LoadImage function above from RunInferenceOnImageResult, passing the actual custom width and height parameters along with the image reference.
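A hypothetical call site (the asset name "sample" is just a placeholder, not from the demo). Note that LoadImageFromImage above calls CFRelease on the CGImageRef it is handed, so the caller retains the image first to balance that release:

// Hypothetical usage; assumes an image named "sample" in the app bundle.
UIImage *photo = [UIImage imageNamed:@"sample"];
if (photo != nil) {
  // Retain to balance the CFRelease inside LoadImageFromImage; the UIImage
  // keeps its own reference.
  CGImageRef cgImage = CGImageRetain(photo.CGImage);
  NSString *predictions = RunInferenceOnImageResult(cgImage);
  NSLog(@"Predictions: %@", predictions);
}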
Answer 2 (score: 1)
Please change this code:
// If you have your own model, modify this to the file name, and make sure
// you've added the file to your app resources too.
static NSString* model_file_name = @"tensorflow_inception_graph";
static NSString* model_file_type = @"pb";
// This controls whether we'll be loading a plain GraphDef proto, or a
// file created by the convert_graphdef_memmapped_format utility that wraps a
// GraphDef and parameter file that can be mapped into memory from file to
// reduce overall memory usage.
const bool model_uses_memory_mapping = false;
// If you have your own model, point this to the labels file.
static NSString* labels_file_name = @"imagenet_comp_graph_label_strings";
static NSString* labels_file_type = @"txt";
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 299;
const int wanted_input_height = 299;
const int wanted_input_channels = 3;
const float input_mean = 128.0f;
const float input_std = 1.0f;
const std::string input_layer_name = "Mul";
const std::string output_layer_name = "final_result";
The change is here: const float input_std = 1.0f;
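The reason this value matters: each input byte is normalised as (pixel - input_mean) / input_std before it is fed to the graph, so mean and std have to match whatever the graph was trained with. A quick stand-alone check of the resulting value ranges:

#include <cstdio>

// With mean 128: std 128 maps the byte range 0..255 to roughly -1..1,
// while std 1 leaves it at -128..127.
int main() {
  const float mean = 128.0f;
  for (float stddev : {128.0f, 1.0f}) {
    printf("std %.0f: 0 -> %.3f, 255 -> %.3f\n",
           stddev, (0.0f - mean) / stddev, (255.0f - mean) / stddev);
  }
  return 0;
}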