TensorFlow: Problem scaling an image to predict its label

Time: 2018-11-18 05:26:26

Tags: java python tensorflow keras tensorflow-datasets

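I am trying to use a TensorFlow image classifier model in a Java application to predict fashion-related images such as T-shirts, coats, and sneakers.

TensorFlow provides an example showing how to build a first neural network and train it with Keras on the Fashion MNIST dataset (28 x 28 pixel images).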

https://www.tensorflow.org/tutorials/keras/basic_classification

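I followed all the steps in the link above and converted the trained Keras model to a .pb (protocol buffer) file.

TensorFlow also provides the LabelImage.java example, which shows how to use a trained model in a Java application to classify an image. That example uses the inception5h model.

I do not want to use the inception5h model because it does not meet our requirements. I want to apply machine learning to fashion images.

That is why I decided to export the Keras model to .pb format and use it in a Java application, with LabelImage.java as a reference.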

**Problem:**

I run into a problem when scaling the input image to the expected size. Every time I run my Java program, I get the following error:

2018-11-18 10:16:29.934982: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
Exception in thread "main" java.lang.IllegalArgumentException: Matrix size-incompatible: In[0]: [1,150528], In[1]: [784,128]
[[Node: dense_8/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](flatten_4/Reshape, dense_8/kernel)]]
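
The two shapes in the error message can be checked by hand. The sketch below is plain arithmetic, not TensorFlow code, and the class and method names are hypothetical. It suggests that 150528 is the flattened size of a 224x224 image with 3 channels (the inception5h input size mentioned in the LabelImage.java comments), while the first dense layer's kernel expects 784 = 28 x 28 x 1 values.

```java
// Hypothetical helper: plain arithmetic for the shapes in the error message.
final class ShapeCheck {
    /** Number of values in one image after the Flatten layer. */
    static long flattenedSize(int height, int width, int channels) {
        return (long) height * width * channels;
    }

    public static void main(String[] args) {
        // In[0] is [1,150528]: one image, flattened to 150528 values.
        System.out.println(flattenedSize(224, 224, 3)); // 150528
        // In[1] is [784,128]: the dense kernel expects 784 inputs per image.
        System.out.println(flattenedSize(28, 28, 1));   // 784
        // A 28x28 image decoded with 3 channels would still not match 784.
        System.out.println(flattenedSize(28, 28, 3));   // 2352
    }
}
```

Under this reading, both the resize target (28 x 28) and the channel count (1) have to match before the MatMul can succeed.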

Here is the complete Java code.

  package main.java.com.emaraic.ObjectRecognition;

  import java.io.IOException;
  import java.io.PrintStream;
  import java.nio.charset.Charset;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.Paths;
  import java.util.Arrays;
  import java.util.List;

  import org.tensorflow.DataType;
  import org.tensorflow.Graph;
  import org.tensorflow.Output;
  import org.tensorflow.Session;
  import org.tensorflow.Tensor;
  import org.tensorflow.TensorFlow;
  import org.tensorflow.types.UInt8;

  import main.java.com.emaraic.ObjectRecognition.LabelImage.GraphBuilder;

  /** Sample use of the TensorFlow Java API to label images using a pre-trained model. */
  public class FashionImage {
    private static void printUsage(PrintStream s) {
      final String url =
          "https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip";
      s.println(
          "Java program that uses a pre-trained Inception model (http://arxiv.org/abs/1512.00567)");
      s.println("to label JPEG images.");
      s.println("TensorFlow version: " + TensorFlow.version());
      s.println();
      s.println("Usage: label_image <model dir> <image file>");
      s.println();
      s.println("Where:");
      s.println("<model dir> is a directory containing the unzipped contents of the inception model");
      s.println("            (from " + url + ")");
      s.println("<image file> is the path to a JPEG image file");
    }

    public static void main(String[] args) {
      String modelDir = "/TensorFlow/FashionModel";
      String imageFile = "/TensorFlow/shirt.jpg";

      byte[] graphDef = readAllBytesOrExit(Paths.get(modelDir, "my_model_new.pb"));
      List<String> labels =
          readAllLinesOrExit(Paths.get(modelDir, "label_strings.txt"));
      byte[] imageBytes = readAllBytesOrExit(Paths.get(imageFile));

      try (Tensor<Float> image = constructAndExecuteGraphToNormalizeImage(imageBytes)) {
        float[] labelProbabilities = executeInceptionGraph(graphDef, image);
        int bestLabelIdx = maxIndex(labelProbabilities);
        System.out.println(
            String.format("BEST MATCH: %s (%.2f%% likely)",
                labels.get(bestLabelIdx),
                labelProbabilities[bestLabelIdx] * 100f));
      }
    }

    private static Tensor<Float> constructAndExecuteGraphToNormalizeImage(byte[] imageBytes) {
      try (Graph g = new Graph()) {
          GraphBuilder b = new GraphBuilder(g);
          // Constants for the Fashion MNIST model exported from Keras
          // (adapted from the inception5h values in LabelImage.java):
          //
          // - The model was trained on images of 28x28 pixels.
          // - LabelImage.java converts each 1-byte color value to float using
          //   (value - mean) / scale; the mean subtraction is commented out here.
          final int H = 28;
          final int W = 28;
          //final float mean = 0f;
          final float scale = 1f;

          // Since the graph is being constructed once per execution here, we can use a constant for the
          // input image. If the graph were to be re-used for multiple input images, a placeholder would
          // have been more appropriate.
          final Output<String> input = b.constant("input", imageBytes);
          final Output<Float> output =
              b.div(
              //    b.sub(
                      b.resizeBilinear(
                          b.expandDims(
                              b.cast(b.decodeJpeg(input, 3), Float.class),
                              b.constant("make_batch", 0)),
                          b.constant("size", new int[] {H, W})),
                  //    b.constant("mean", mean)),
                  b.constant("scale", scale));
          try (Session s = new Session(g)) {
            // Generally, there may be multiple output tensors, all of them must be closed to prevent resource leaks.
            return s.runner().fetch(output.op().name()).run().get(0).expect(Float.class);
          }
        }
    }

    private static float[] executeInceptionGraph(byte[] graphDef, Tensor<Float> image) {
      try (Graph g = new Graph()) {
        g.importGraphDef(graphDef);
        try (Session s = new Session(g);
            // Generally, there may be multiple output tensors, all of them must be closed to prevent resource leaks.
            Tensor<Float> result =
                s.runner().feed("flatten_4_input", image).fetch("dense_9/Softmax").run().get(0).expect(Float.class)) {
          final long[] rshape = result.shape();
          if (result.numDimensions() != 2 || rshape[0] != 1) {
            throw new RuntimeException(
                String.format(
                    "Expected model to produce a [1 N] shaped tensor where N is the number of labels, instead it produced one with shape %s",
                    Arrays.toString(rshape)));
          }
          int nlabels = (int) rshape[1];
          return result.copyTo(new float[1][nlabels])[0];
        }
      }
    }

    private static int maxIndex(float[] probabilities) {
      int best = 0;
      for (int i = 1; i < probabilities.length; ++i) {
        if (probabilities[i] > probabilities[best]) {
          best = i;
        }
      }
      return best;
    }

    private static byte[] readAllBytesOrExit(Path path) {
      try {
        return Files.readAllBytes(path);
      } catch (IOException e) {
        System.err.println("Failed to read [" + path + "]: " + e.getMessage());
        System.exit(1);
      }
      return null;
    }

    private static List<String> readAllLinesOrExit(Path path) {
      try {
        return Files.readAllLines(path, Charset.forName("UTF-8"));
      } catch (IOException e) {
        System.err.println("Failed to read [" + path + "]: " + e.getMessage());
        System.exit(1);
      }
      return null;
    }

    // In the fullness of time, equivalents of the methods of this class should be auto-generated from
    // the OpDefs linked into libtensorflow_jni.so. That would match what is done in other languages
    // like Python, C++ and Go.
    static class GraphBuilder {
      GraphBuilder(Graph g) {
        this.g = g;
      }

      Output<Float> div(Output<Float> x, Output<Float> y) {
        return binaryOp("Div", x, y);
      }

      <T> Output<T> sub(Output<T> x, Output<T> y) {
        return binaryOp("Sub", x, y);
      }

      <T> Output<Float> resizeBilinear(Output<T> images, Output<Integer> size) {
        return binaryOp3("ResizeBilinear", images, size);
      }

      <T> Output<T> expandDims(Output<T> input, Output<Integer> dim) {
        return binaryOp3("ExpandDims", input, dim);
      }

      <T, U> Output<U> cast(Output<T> value, Class<U> type) {
        DataType dtype = DataType.fromClass(type);
        return g.opBuilder("Cast", "Cast")
            .addInput(value)
            .setAttr("DstT", dtype)
            .build()
            .<U>output(0);
      }

      Output<UInt8> decodeJpeg(Output<String> contents, long channels) {
        return g.opBuilder("DecodeJpeg", "DecodeJpeg")
            .addInput(contents)
            .setAttr("channels", channels)
            .build()
            .<UInt8>output(0);
      }

      <T> Output<T> constant(String name, Object value, Class<T> type) {
        try (Tensor<T> t = Tensor.<T>create(value, type)) {
          return g.opBuilder("Const", name)
              .setAttr("dtype", DataType.fromClass(type))
              .setAttr("value", t)
              .build()
              .<T>output(0);
        }
      }
      Output<String> constant(String name, byte[] value) {
        return this.constant(name, value, String.class);
      }

      Output<Integer> constant(String name, int value) {
        return this.constant(name, value, Integer.class);
      }

      Output<Integer> constant(String name, int[] value) {
        return this.constant(name, value, Integer.class);
      }

      Output<Float> constant(String name, float value) {
        return this.constant(name, value, Float.class);
      }

      private <T> Output<T> binaryOp(String type, Output<T> in1, Output<T> in2) {
        return g.opBuilder(type, type).addInput(in1).addInput(in2).build().<T>output(0);
      }

      private <T, U, V> Output<T> binaryOp3(String type, Output<U> in1, Output<V> in2) {
        return g.opBuilder(type, type).addInput(in1).addInput(in2).build().<T>output(0);
      }
      private Graph g;
    }
  }

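Exported model: my_model_new.pb

Labels: label_strings.txt

The problem is in the constructAndExecuteGraphToNormalizeImage method, because I do not understand how to scale the image. I think the expected size should be (1, 28, 28).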
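
To make the target layout concrete, here is a minimal sketch in plain Java that turns a grayscale pixel grid into a (1, 28, 28) float batch. It is not the approach used in LabelImage.java. The class and method names are hypothetical, the floor-based nearest-neighbor sampling stands in for ResizeBilinear, and the division by 255 assumes the model was trained on inputs scaled to [0, 1] as in the linked tutorial.

```java
// Hypothetical helper, not part of the TensorFlow Java API.
final class FashionInput {
    static final int SIZE = 28;

    /**
     * Resamples a grayscale image (values 0..255, indexed [row][column]) to
     * 28x28 by picking the source pixel at the floored scaled coordinate,
     * then scales the values to [0, 1] (assumed training range).
     */
    static float[][][] toBatch(int[][] gray) {
        int srcH = gray.length;
        int srcW = gray[0].length;
        float[][][] batch = new float[1][SIZE][SIZE]; // [batch][height][width]
        for (int y = 0; y < SIZE; y++) {
            int srcY = y * srcH / SIZE; // source row for output row y
            for (int x = 0; x < SIZE; x++) {
                int srcX = x * srcW / SIZE; // source column for output column x
                batch[0][y][x] = gray[srcY][srcX] / 255f;
            }
        }
        return batch;
    }
}
```

An array of this shape could then be wrapped with Tensor.create(batch, Float.class), the same call that the GraphBuilder.constant helper above relies on.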


UPDATE:

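I changed the channels from 3 to 1 and the code now runs, but the model now always predicts the same value with 100% confidence.

From:

b.cast(b.decodeJpeg(input, 3), Float.class)

To:

b.cast(b.decodeJpeg(input, 1), Float.class)

Any ideas?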


UPDATE:

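I even checked in my Jupyter notebook whether the trained model gives correct results for a new image. The result is the same as the one I get in the Java application: the model always predicts Bag for every image.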

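# Assumed import; the original snippet does not show which Keras package was used.
from keras.preprocessing import image

# `model` is the trained Keras model from earlier in the notebook.
img = image.load_img(path="t-shirt.jpg",grayscale=True,target_size=(28,28))
img = image.img_to_array(img)
test_img = img.reshape((1,28,28))

predictions_single = model.predict(test_img)
print(predictions_single)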

Result:

[[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]]  //Bag with 100% probability
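
An output of exactly 1.0 for one class is what a softmax produces when one logit is far larger than the others. One possible cause, which nothing in this post confirms, is that both the notebook snippet and the Java code (scale = 1f) feed raw pixel values in 0..255 to a model that may have been trained on inputs scaled to [0, 1], so the logits grow by roughly the same factor. The sketch below only illustrates that numerical effect. The logits are made up and the class name is hypothetical.

```java
import java.util.Arrays;

// Illustration only: the logits below are made up, not taken from the model.
final class SoftmaxSaturation {
    /** Numerically stable softmax computed in float precision. */
    static float[] softmax(float[] logits) {
        float max = Float.NEGATIVE_INFINITY;
        for (float v : logits) {
            max = Math.max(max, v);
        }
        float[] out = new float[logits.length];
        float sum = 0f;
        for (int i = 0; i < logits.length; i++) {
            out[i] = (float) Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    /** Returns a copy of v with every element multiplied by factor. */
    static float[] scaled(float[] v, float factor) {
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] * factor;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] logits = {0.5f, 1.0f, 2.0f};
        // Moderate logits give a spread-out distribution.
        System.out.println(Arrays.toString(softmax(logits)));
        // The same logits multiplied by 255 saturate in float precision.
        System.out.println(Arrays.toString(softmax(scaled(logits, 255f)))); // [0.0, 0.0, 1.0]
    }
}
```

If this is what is happening, dividing the input by 255 in both places (for example scale = 255f in the Java code) would be the first thing to try.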

0 answers:

No answers