Question

I trained a model using Caffe and NVIDIA's DIGITS. Testing the following images in DIGITS produces the following results:

When I download the model from DIGITS, I get snapshot_iter_24240.caffemodel along with deploy.prototxt, mean.binaryproto and labels.txt (plus solver.prototxt and train_val.prototxt, which I don't think are relevant).

I used coremltools to convert the caffemodel to an mlmodel by running the following:

import coremltools

# Convert a caffe model to a classifier in Core ML
coreml_model = coremltools.converters.caffe.convert(
    ('snapshot_iter_24240.caffemodel', 'deploy.prototxt', 'mean.binaryproto'),
    image_input_names='data',
    class_labels='labels.txt')

# Now save the model

coreml_model.save('food.mlmodel')

The script prints the following:

(/anaconda/envs/coreml) bash-3.2$ python run.py 

================= Starting Conversion from Caffe to CoreML ======================
Layer 0: Type: 'Input', Name: 'input'. Output(s): 'data'.
Ignoring batch size and retaining only the trailing 3 dimensions for conversion. 
Layer 1: Type: 'Convolution', Name: 'conv1'. Input(s): 'data'. Output(s): 'conv1'.
Layer 2: Type: 'ReLU', Name: 'relu1'. Input(s): 'conv1'. Output(s): 'conv1'.
Layer 3: Type: 'LRN', Name: 'norm1'. Input(s): 'conv1'. Output(s): 'norm1'.
Layer 4: Type: 'Pooling', Name: 'pool1'. Input(s): 'norm1'. Output(s): 'pool1'.
Layer 5: Type: 'Convolution', Name: 'conv2'. Input(s): 'pool1'. Output(s): 'conv2'.
Layer 6: Type: 'ReLU', Name: 'relu2'. Input(s): 'conv2'. Output(s): 'conv2'.
Layer 7: Type: 'LRN', Name: 'norm2'. Input(s): 'conv2'. Output(s): 'norm2'.
Layer 8: Type: 'Pooling', Name: 'pool2'. Input(s): 'norm2'. Output(s): 'pool2'.
Layer 9: Type: 'Convolution', Name: 'conv3'. Input(s): 'pool2'. Output(s): 'conv3'.
Layer 10: Type: 'ReLU', Name: 'relu3'. Input(s): 'conv3'. Output(s): 'conv3'.
Layer 11: Type: 'Convolution', Name: 'conv4'. Input(s): 'conv3'. Output(s): 'conv4'.
Layer 12: Type: 'ReLU', Name: 'relu4'. Input(s): 'conv4'. Output(s): 'conv4'.
Layer 13: Type: 'Convolution', Name: 'conv5'. Input(s): 'conv4'. Output(s): 'conv5'.
Layer 14: Type: 'ReLU', Name: 'relu5'. Input(s): 'conv5'. Output(s): 'conv5'.
Layer 15: Type: 'Pooling', Name: 'pool5'. Input(s): 'conv5'. Output(s): 'pool5'.
Layer 16: Type: 'InnerProduct', Name: 'fc6'. Input(s): 'pool5'. Output(s): 'fc6'.
Layer 17: Type: 'ReLU', Name: 'relu6'. Input(s): 'fc6'. Output(s): 'fc6'.
Layer 18: Type: 'Dropout', Name: 'drop6'. Input(s): 'fc6'. Output(s): 'fc6'.
WARNING: Skipping training related layer 'drop6' of type 'Dropout'.
Layer 19: Type: 'InnerProduct', Name: 'fc7'. Input(s): 'fc6'. Output(s): 'fc7'.
Layer 20: Type: 'ReLU', Name: 'relu7'. Input(s): 'fc7'. Output(s): 'fc7'.
Layer 21: Type: 'Dropout', Name: 'drop7'. Input(s): 'fc7'. Output(s): 'fc7'.
WARNING: Skipping training related layer 'drop7' of type 'Dropout'.
Layer 22: Type: 'InnerProduct', Name: 'fc8_food'. Input(s): 'fc7'. Output(s): 'fc8_food'.
Layer 23: Type: 'Softmax', Name: 'prob'. Input(s): 'fc8_food'. Output(s): 'prob'.

================= Summary of the conversion: ===================================
Detected input(s) and shape(s) (ignoring batch size):
'data' : 3, 227, 227
Size of mean image: (H,W) = (256, 256) is greater than input image size: (H,W) = (227, 227). Mean image will be center cropped to match the input image dimensions. 

Network Input name(s): 'data'.
Network Output name(s): 'prob'.
(/anaconda/envs/coreml) bash-3.2$
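
The summary above says the 256x256 mean image is center cropped to the 227x227 input size. As a rough illustration of what a center crop means here, the following sketch computes the crop window in plain Python. The helper names are hypothetical, and the rounding used for the odd 29-pixel difference is an assumption, not something taken from the converter's source.

```python
def center_crop_offsets(mean_hw, input_hw):
    """Return the (top, left) corner of a centered window of size input_hw."""
    mean_h, mean_w = mean_hw
    in_h, in_w = input_hw
    if in_h > mean_h or in_w > mean_w:
        raise ValueError("input is larger than the mean image")
    # Assumption: any odd remainder is rounded down (floor division).
    return (mean_h - in_h) // 2, (mean_w - in_w) // 2


def center_crop(rows, input_hw):
    """Crop one channel, given as a list of rows, around its center."""
    top, left = center_crop_offsets((len(rows), len(rows[0])), input_hw)
    in_h, in_w = input_hw
    return [row[left:left + in_w] for row in rows[top:top + in_h]]


# Sizes reported in the conversion summary: mean 256x256, input 227x227.
top, left = center_crop_offsets((256, 256), (227, 227))
print(top, left)
```

Under that rounding assumption, the cropped mean covers rows and columns 14 through 240 of the original mean image.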

import UIKit
import CoreML
import Vision

class ViewController: UIViewController {

  override func viewDidLoad() {
    super.viewDidLoad()

    var images = [CIImage]()
//    guard let ciImage = CIImage(image: #imageLiteral(resourceName: "pizza")) else {
//      fatalError("couldn't convert UIImage to CIImage")
//    }
    images.append(CIImage(image: #imageLiteral(resourceName: "pizza"))!)
    images.append(CIImage(image: #imageLiteral(resourceName: "spaghetti"))!)
    images.append(CIImage(image: #imageLiteral(resourceName: "burger"))!)
    images.append(CIImage(image: #imageLiteral(resourceName: "sushi"))!)
    images.forEach{detectScene(image: $0)}

    // Do any additional setup after loading the view, typically from a nib.
  }

  override func didReceiveMemoryWarning() {
    super.didReceiveMemoryWarning()
    // Dispose of any resources that can be recreated.
  }

  func detectScene(image: CIImage) {
    guard let model = try? VNCoreMLModel(for: food().model) else {
      fatalError()
    }
    // Create a Vision request with completion handler
    let request = VNCoreMLRequest(model: model) { [weak self] request, error in
      guard let results = request.results as? [VNClassificationObservation],
        let topResult = results.first else {
          fatalError("unexpected result type from VNCoreMLRequest")
      }

      // Update UI on main queue
      //let article = (self?.vowels.contains(topResult.identifier.first!))! ? "an" : "a"
      DispatchQueue.main.async { [weak self] in
        results.forEach({ (result) in
          if Int(result.confidence * 100) > 1 {
            print("\(Int(result.confidence * 100))% it's \(result.identifier)")
          }
        })
        print("********************************")

      }
    }
    let handler = VNImageRequestHandler(ciImage: image)
    DispatchQueue.global(qos: .userInteractive).async {
      do {
        try handler.perform([request])
      } catch {
        print(error)
      }
    }
  }
}

This outputs the following:

22% it's cup cakes
8% it's ice cream
5% it's falafel
5% it's macarons
3% it's churros
3% it's gyoza
3% it's donuts
2% it's tacos
2% it's cannoli
********************************
35% it's cup cakes
22% it's frozen yogurt
8% it's chocolate cake
7% it's chocolate mousse
6% it's ice cream
2% it's donuts
********************************
38% it's gyoza
7% it's falafel
6% it's tacos
4% it's hamburger
3% it's oysters
2% it's peking duck
2% it's hot dog
2% it's baby back ribs
2% it's cannoli
********************************
7% it's hamburger
6% it's pork chop
6% it's steak
6% it's peking duck
5% it's pho
5% it's prime rib
5% it's baby back ribs
4% it's mussels
4% it's grilled salmon
2% it's filet mignon
2% it's foie gras
2% it's pulled pork sandwich
********************************

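This is completely off and inconsistent with how the model performed in DIGITS. I'm not sure what I'm doing wrong or whether I missed a step. I tried creating the model without mean.binaryproto, but that made no difference.

In case it helps, here is deploy.prototxt: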

input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 227
  dim: 227
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8_food"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_food"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 101
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8_food"
  top: "prob"
}

Answer 1

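The discrepancy between the predictions in DIGITS and those of the converted model in CoreML was due to CoreML interpreting the input data differently from DIGITS. Changing the call to convert to use the following parameters solved the problem: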

coreml_model = coremltools.converters.caffe.convert(
    ('snapshot_iter_24240.caffemodel', 'deploy.prototxt', 'mean.binaryproto'),
    image_input_names='data',
    class_labels='labels.txt',
    is_bgr=True, image_scale=255.)

http://pythonhosted.org/coremltools/generated/coremltools.converters.caffe.convert.html#coremltools.converters.caffe.convert

99% it's spaghetti bolognese
********************************
73% it's pizza
10% it's lasagna
7% it's spaghetti bolognese
2% it's spaghetti carbonara
********************************
97% it's sushi
********************************
97% it's hamburger
********************************
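
To make the effect of is_bgr and image_scale more concrete, here is a simplified per-pixel model of the documented behaviour (the input is scaled, then a bias is added, and is_bgr selects the channel order). It is only a sketch: the function names and the per-channel bias tuple are hypothetical, and it does not model the mean image or the pixel range Core ML supplies.

```python
def to_bgr(pixel_rgb):
    """Reorder one (R, G, B) pixel into (B, G, R)."""
    r, g, b = pixel_rgb
    return (b, g, r)


def preprocess(pixel_rgb, is_bgr=False, image_scale=1.0,
               bias=(0.0, 0.0, 0.0)):
    """Simplified model: pick the channel order, scale, then add a bias.

    Assumption: `bias` is given in the same channel order as the output.
    """
    channels = to_bgr(pixel_rgb) if is_bgr else tuple(pixel_rgb)
    return tuple(image_scale * c + b for c, b in zip(channels, bias))


# A pure red pixel lands in a different channel depending on is_bgr.
print(preprocess((255, 0, 0), is_bgr=False))
print(preprocess((255, 0, 0), is_bgr=True))
```

A network whose first-layer filters were trained on BGR input will see swapped red and blue channels if it is fed RGB, which is consistent with the confident but wrong predictions in the question.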

Answer 2

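In its current form, coremltools tends to change input/output types and value ranges to suit its own internal optimizations. I strongly recommend re-importing the newly created .mlmodel file into Python code and verifying what data types it expects.

For example, it converts Int values to Float (the Double type in Swift) and Bool values to Int (True: 1, False: 0).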
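
One way to do that check, as a minimal sketch: load the model's protobuf spec with coremltools.utils.load_spec and list each input and output with its declared feature type. The helper names describe_features and inspect_mlmodel are hypothetical, and the file name food.mlmodel is the one from the question.

```python
def describe_features(features):
    """Summarize protobuf feature descriptions as plain dicts."""
    summary = []
    for feature in features:
        # 'Type' is the oneof that holds the feature's declared type.
        kind = feature.type.WhichOneof('Type')
        info = {'name': feature.name, 'kind': kind}
        if kind == 'imageType':
            image = feature.type.imageType
            info['width'] = image.width
            info['height'] = image.height
            # Raw enum value of the expected color space (e.g. RGB vs BGR).
            info['color_space'] = image.colorSpace
        summary.append(info)
    return summary


def inspect_mlmodel(path):
    """Print the inputs and outputs a saved .mlmodel declares."""
    import coremltools  # imported lazily so the helper above stays standalone
    spec = coremltools.utils.load_spec(path)
    for label, features in (('input', spec.description.input),
                            ('output', spec.description.output)):
        for info in describe_features(features):
            print(label, info)


# Usage (requires coremltools and the converted file):
# inspect_mlmodel('food.mlmodel')
```

For an image classifier like the one in the question, the input's color space and size reported here are the first things to compare against what the Caffe model was trained on.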
