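I am trying to use a TensorFlow image-classifier model in a Java application to classify fashion-related images such as T-shirts, coats, and sneakers.
TensorFlow provides a tutorial that shows how to build a first neural network and train it with Keras on the Fashion MNIST dataset (28 x 28 pixel images).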
https://www.tensorflow.org/tutorials/keras/basic_classification
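I followed all the steps in the tutorial linked above and converted the trained Keras model to a .pb (protocol buffer) file.
TensorFlow also provides the LabelImage.java example, which shows how to use a trained model in a Java application to classify an image. That example uses the inception5h model.
I don't want to use the inception5h model because it doesn't meet our requirements; I want to apply machine learning to fashion images.
That is why I decided to export my Keras model to .pb format and use it in a Java application, with LabelImage.java as a reference.
**Problem:**
I run into a problem when scaling the input image to the expected size. Whenever I run my Java program, I get the following error: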
2018-11-18 10:16:29.934982: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
Exception in thread "main" java.lang.IllegalArgumentException: Matrix size-incompatible: In[0]: [1,150528], In[1]: [784,128]
[[Node: dense_8/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](flatten_4/Reshape, dense_8/kernel)]]
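As a quick sanity check on the shapes in this error, the sketch below multiplies out the dimensions. It assumes that `In[0]` is the flattened input image and that `In[1]` is the `dense_8/kernel` weight matrix; `ShapeCheck` and `flattenedSize` are made-up names for illustration. Note that 150528 equals 224 x 224 x 3, so this particular log line may come from a run that still used the 224 x 224 size from LabelImage.java rather than `H = 28` and `W = 28`.

```java
public class ShapeCheck {
    /** Number of floats in one image after a Flatten layer: height * width * channels. */
    static long flattenedSize(int height, int width, int channels) {
        return (long) height * width * channels;
    }

    public static void main(String[] args) {
        // 224 x 224 RGB image: matches In[0] = [1, 150528] in the error message.
        System.out.println(flattenedSize(224, 224, 3));
        // 28 x 28 RGB image: still three times too large for the first dense layer.
        System.out.println(flattenedSize(28, 28, 3));
        // 28 x 28 single-channel image: matches the 784 rows of In[1] = [784, 128].
        System.out.println(flattenedSize(28, 28, 1));
    }
}
```

Only the single-channel 28 x 28 case yields the 784 values that the first dense layer expects.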
Here is the complete Java code.
package main.java.com.emaraic.ObjectRecognition;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import org.tensorflow.DataType;
import org.tensorflow.Graph;
import org.tensorflow.Output;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
import org.tensorflow.TensorFlow;
import org.tensorflow.types.UInt8;
import main.java.com.emaraic.ObjectRecognition.LabelImage.GraphBuilder;
/** Sample use of the TensorFlow Java API to label images using a pre-trained model. */
public class FashionImage {
private static void printUsage(PrintStream s) {
final String url =
"https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip";
s.println(
"Java program that uses a pre-trained Inception model (http://arxiv.org/abs/1512.00567)");
s.println("to label JPEG images.");
s.println("TensorFlow version: " + TensorFlow.version());
s.println();
s.println("Usage: label_image <model dir> <image file>");
s.println();
s.println("Where:");
s.println("<model dir> is a directory containing the unzipped contents of the inception model");
s.println(" (from " + url + ")");
s.println("<image file> is the path to a JPEG image file");
}
public static void main(String[] args) {
String modelDir = "/TensorFlow/FashionModel";
String imageFile = "/TensorFlow/shirt.jpg";
byte[] graphDef = readAllBytesOrExit(Paths.get(modelDir, "my_model_new.pb"));
List<String> labels =
readAllLinesOrExit(Paths.get(modelDir, "label_strings.txt"));
byte[] imageBytes = readAllBytesOrExit(Paths.get(imageFile));
try (Tensor<Float> image = constructAndExecuteGraphToNormalizeImage(imageBytes)) {
float[] labelProbabilities = executeInceptionGraph(graphDef, image);
int bestLabelIdx = maxIndex(labelProbabilities);
System.out.println(
String.format("BEST MATCH: %s (%.2f%% likely)",
labels.get(bestLabelIdx),
labelProbabilities[bestLabelIdx] * 100f));
}
}
private static Tensor<Float> constructAndExecuteGraphToNormalizeImage(byte[] imageBytes) {
try (Graph g = new Graph()) {
GraphBuilder b = new GraphBuilder(g);
// Some constants specific to the pre-trained model at:
// https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip
//
// - The model was trained with images scaled to 224x224 pixels.
// - The colors, represented as R, G, B in 1-byte each were converted to
// float using (value - Mean)/Scale.
final int H = 28;
final int W = 28;
//final float mean = 0f;
final float scale = 1f;
// Since the graph is being constructed once per execution here, we can use a constant for the
// input image. If the graph were to be re-used for multiple input images, a placeholder would
// have been more appropriate.
final Output<String> input = b.constant("input", imageBytes);
final Output<Float> output =
b.div(
// b.sub(
b.resizeBilinear(
b.expandDims(
b.cast(b.decodeJpeg(input, 3), Float.class),
b.constant("make_batch", 0)),
b.constant("size", new int[] {H, W})),
// b.constant("mean", mean)),
b.constant("scale", scale));
try (Session s = new Session(g)) {
// Generally, there may be multiple output tensors, all of them must be closed to prevent resource leaks.
return s.runner().fetch(output.op().name()).run().get(0).expect(Float.class);
}
}
}
private static float[] executeInceptionGraph(byte[] graphDef, Tensor<Float> image) {
try (Graph g = new Graph()) {
g.importGraphDef(graphDef);
try (Session s = new Session(g);
// Generally, there may be multiple output tensors, all of them must be closed to prevent resource leaks.
Tensor<Float> result =
s.runner().feed("flatten_4_input", image).fetch("dense_9/Softmax").run().get(0).expect(Float.class)) {
final long[] rshape = result.shape();
if (result.numDimensions() != 2 || rshape[0] != 1) {
throw new RuntimeException(
String.format(
"Expected model to produce a [1 N] shaped tensor where N is the number of labels, instead it produced one with shape %s",
Arrays.toString(rshape)));
}
int nlabels = (int) rshape[1];
return result.copyTo(new float[1][nlabels])[0];
}
}
}
private static int maxIndex(float[] probabilities) {
int best = 0;
for (int i = 1; i < probabilities.length; ++i) {
if (probabilities[i] > probabilities[best]) {
best = i;
}
}
return best;
}
private static byte[] readAllBytesOrExit(Path path) {
try {
return Files.readAllBytes(path);
} catch (IOException e) {
System.err.println("Failed to read [" + path + "]: " + e.getMessage());
System.exit(1);
}
return null;
}
private static List<String> readAllLinesOrExit(Path path) {
try {
return Files.readAllLines(path, Charset.forName("UTF-8"));
} catch (IOException e) {
System.err.println("Failed to read [" + path + "]: " + e.getMessage());
System.exit(1);
}
return null;
}
// In the fullness of time, equivalents of the methods of this class should be auto-generated from
// the OpDefs linked into libtensorflow_jni.so. That would match what is done in other languages
// like Python, C++ and Go.
static class GraphBuilder {
GraphBuilder(Graph g) {
this.g = g;
}
Output<Float> div(Output<Float> x, Output<Float> y) {
return binaryOp("Div", x, y);
}
<T> Output<T> sub(Output<T> x, Output<T> y) {
return binaryOp("Sub", x, y);
}
<T> Output<Float> resizeBilinear(Output<T> images, Output<Integer> size) {
return binaryOp3("ResizeBilinear", images, size);
}
<T> Output<T> expandDims(Output<T> input, Output<Integer> dim) {
return binaryOp3("ExpandDims", input, dim);
}
<T, U> Output<U> cast(Output<T> value, Class<U> type) {
DataType dtype = DataType.fromClass(type);
return g.opBuilder("Cast", "Cast")
.addInput(value)
.setAttr("DstT", dtype)
.build()
.<U>output(0);
}
Output<UInt8> decodeJpeg(Output<String> contents, long channels) {
return g.opBuilder("DecodeJpeg", "DecodeJpeg")
.addInput(contents)
.setAttr("channels", channels)
.build()
.<UInt8>output(0);
}
<T> Output<T> constant(String name, Object value, Class<T> type) {
try (Tensor<T> t = Tensor.<T>create(value, type)) {
return g.opBuilder("Const", name)
.setAttr("dtype", DataType.fromClass(type))
.setAttr("value", t)
.build()
.<T>output(0);
}
}
Output<String> constant(String name, byte[] value) {
return this.constant(name, value, String.class);
}
Output<Integer> constant(String name, int value) {
return this.constant(name, value, Integer.class);
}
Output<Integer> constant(String name, int[] value) {
return this.constant(name, value, Integer.class);
}
Output<Float> constant(String name, float value) {
return this.constant(name, value, Float.class);
}
private <T> Output<T> binaryOp(String type, Output<T> in1, Output<T> in2) {
return g.opBuilder(type, type).addInput(in1).addInput(in2).build().<T>output(0);
}
private <T, U, V> Output<T> binaryOp3(String type, Output<U> in1, Output<V> in2) {
return g.opBuilder(type, type).addInput(in1).addInput(in2).build().<T>output(0);
}
private Graph g;
}
}
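Exported model: my_model_new.pb
Labels: label_strings.txt
The problem is in the constructAndExecuteGraphToNormalizeImage method, because I don't understand how to scale the image. I think the expected size should be (1, 28, 28).
When I change the number of channels from 3 to 1, the code runs, but now the model always predicts the same value with 100% confidence.
From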
b.cast(b.decodeJpeg(input, 3)
to
b.cast(b.decodeJpeg(input, 1)
Any ideas?
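One possible way to build a (1, 28, 28) input without the graph-based resize is sketched below in plain Java. It is not taken from LabelImage.java; `FashionPreprocess`, `toModelInput`, and the `invert` flag are hypothetical names. It assumes the model was trained as in the tutorial, where pixel values are divided by 255.0, and that Fashion MNIST items are light on a dark background, so a photo on a white background may need inverting.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.awt.image.Raster;

public class FashionPreprocess {
    static final int SIZE = 28; // Fashion MNIST images are 28 x 28 pixels

    /** Converts any image to a [1][28][28] grayscale batch with values in [0, 1]. */
    static float[][][] toModelInput(BufferedImage src, boolean invert) {
        // Draw the source into a 28 x 28 single-channel image (resize + grayscale).
        BufferedImage gray = new BufferedImage(SIZE, SIZE, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g2 = gray.createGraphics();
        g2.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g2.drawImage(src, 0, 0, SIZE, SIZE, null);
        g2.dispose();

        Raster raster = gray.getRaster();
        float[][][] batch = new float[1][SIZE][SIZE];
        for (int y = 0; y < SIZE; y++) {
            for (int x = 0; x < SIZE; x++) {
                // Samples are 0..255; scale to 0..1 as assumed for training.
                float scaled = raster.getSample(x, y, 0) / 255f;
                // Assumption: optionally flip light/dark to mimic the dataset.
                batch[0][y][x] = invert ? 1f - scaled : scaled;
            }
        }
        return batch;
    }
}
```

The resulting array could then be wrapped with `Tensor.create(batch, Float.class)`, the same factory the `constant` helper in the listing uses, and fed to `flatten_4_input`. Whether this changes the constant prediction is untested.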
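I even checked in my Jupyter notebook whether the trained model gives correct results on new images. The result is the same as in my Java application: the model always predicts Bag for every image.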
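from tensorflow.keras.preprocessing import image  # assumed import; omitted in the original snippet

# `model` is the trained Keras model from the notebook session.
img = image.load_img(path="t-shirt.jpg", grayscale=True, target_size=(28, 28))
img = image.img_to_array(img)
test_img = img.reshape((1, 28, 28))
predictions_single = model.predict(test_img)
print(predictions_single)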
Result:
[[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]] //Bag with 100% probability
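An output of exactly 1.0 for one class is what softmax produces when one logit is far larger than the rest, which can happen when the inputs are on a much larger scale (0 to 255) than the 0 to 1 range used during training. Whether that is the cause here is an assumption; the sketch below only demonstrates the saturation effect with made-up logits, and `SoftmaxSaturation` is a hypothetical name.

```java
import java.util.Arrays;

public class SoftmaxSaturation {
    /** Numerically stable softmax: subtract the largest logit before exponentiating. */
    static double[] softmax(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logits) {
            max = Math.max(max, v);
        }
        double[] out = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        // Small, similar logits: probability is spread across the classes.
        System.out.println(Arrays.toString(softmax(new double[] {0.1, 0.2, 0.3})));
        // One very large logit: that class takes essentially all the probability.
        System.out.println(Arrays.toString(softmax(new double[] {10.0, 20.0, 1000.0})));
    }
}
```

If the Python snippet above shows the same behavior, comparing predictions with and without dividing the input by 255.0 would be one way to test this guess.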