TensorFlow Lite Micro fully connected layer crashes on a new hardware target

Asked: 2019-06-27 12:41:50

Tags: c++ tensorflow tensorflow-lite sparc

I am currently adding support for a new hardware target (a LEON 3 processor based on SPARC V8) to the TensorFlow Lite Micro framework. When I build and run the built-in tests on this target, they all pass. However, I cannot run any of the included examples on the new target: they all crash during inference.

I made a very simple toy model, a single 20x10 fully connected layer. Built in TensorFlow Lite Micro for the native target it runs fine, but when I build and run it for the LEON 3, it crashes during the inference step with a "data access exception". I managed to trace the crash to the call that evaluates the fully connected layer, the only operator in the TensorFlow Lite model. I pinned down where it crashes by adding debug prints to the MicroInterpreter::Invoke() method, roughly as sketched below.
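The sketch below is hypothetical, not the actual TFLM source: names such as num_operators, op_name, registration, node and context_ stand in for whatever the 2019-era micro_interpreter.cc really uses. The point is that each print is numbered, so the last number emitted before the trap identifies the statement that faulted:

TfLiteStatus MicroInterpreter::Invoke() {
  printf("Entered Invoke()\n");
  // ... verify the interpreter initialized correctly ...
  printf("init was okay.\n");
  // ... look up the operator codes in the model FlatBuffer ...
  printf("get opcodes.\n");
  for (int i = 0; i < num_operators; ++i) {
    printf("Starting operator [%d]\n", i);
    // ... one numbered print after each statement that prepares the node
    // (tensor lookups, TfLiteNode setup, and so on) ...
    printf("Starting operator [%d] 12\n", i);
    printf("Node %s (number %d)\n", op_name, i);
    // The LEON 3 build dies inside this call; the native build carries on.
    TfLiteStatus status = registration->invoke(&context_, &node);
    printf("Starting operator [%d] 13\n", i);
    // ... per-operator cleanup ...
    printf("Starting operator [%d] 14\n", i);
  }
  return kTfLiteOk;
}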

Here is the main.cc source for my toy example. For the native linux_x86_64 target this code builds and runs perfectly.

#include <stdio.h>
#include "tensorflow/lite/experimental/micro/examples/toy_model/model/tiny_model_data.h"
#include "tensorflow/lite/experimental/micro/kernels/all_ops_resolver.h"
#include "tensorflow/lite/experimental/micro/micro_error_reporter.h"
#include "tensorflow/lite/experimental/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/version.h"


int main(int argc, char* argv[]) {
  // Set up logging.
  tflite::MicroErrorReporter micro_error_reporter;
  tflite::ErrorReporter* error_reporter = &micro_error_reporter;

  printf("Parsing model FlatBuffer.\n");

  // Map the model into a usable data structure. This doesn't involve any
  // copying or parsing, it's a very lightweight operation.
  const tflite::Model* model =
      ::tflite::GetModel(tiny_tflite);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    error_reporter->Report(
        "Model provided is schema version %d not equal "
        "to supported version %d.\n",
        model->version(), TFLITE_SCHEMA_VERSION);
    return 1;
  }

  printf("Model parsed.\n");

  // This pulls in all the operation implementations we need.
  printf("Pull in operation implementations.");
  tflite::ops::micro::AllOpsResolver resolver;
  printf("Done.\n");

  // Create an area of memory to use for input, output, and intermediate arrays.
  // The size of this will depend on the model you're using, and may need to be
  // determined by experimentation.
  printf("Allocate memory buffer.\n");
  const int tensor_arena_size = 200 * 1024;
  uint8_t tensor_arena[tensor_arena_size];
  tflite::SimpleTensorAllocator tensor_allocator(tensor_arena,
                                                 tensor_arena_size);
  printf("Done.\n");

  // Build an interpreter to run the model with.
  printf("Build interpreter.\n");
  tflite::MicroInterpreter interpreter(model, resolver, &tensor_allocator,
                                       error_reporter);
  printf("Done.\n");

  printf("Setting input data.\n");
  TfLiteTensor* model_input = interpreter.input(0);
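  // Fill the 1x20 float input with a ramp: 0.00, 0.05, ..., 0.95.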
  for (int d=0; d<20; ++d)
    model_input->data.f[d] = d / 20.0;
  printf("Done.\n");

  // perform inference
  printf("Perform inference.\n");
  TfLiteStatus invoke_status = interpreter.Invoke();
  if (invoke_status != kTfLiteOk) {
    printf("Invoke failed.\n");
    return 1;
  }
  printf("Done.\n");

  TfLiteTensor* model_output = interpreter.output(0);

  printf("Output tensor values:\n");
  for (int d=0; d<10; ++d)
    printf("[%d] %f\n", d, model_output->data.f[d]);

  return 0;
}

Here is the output from a successful run of the native build:

Parsing model FlatBuffer.
Model parsed.
Pull in operation implementations.Done.
Allocate memory buffer.
Done.
Build interpreter.
Done.
Details of input tensors 0 :
Rank 2, type [Float32], shape [1, 20]
Setting input data.
Done.
Perform inference.
Entered Invoke()
init was okay.
get opcodes.
Starting operator [0]
Starting operator [0] 1
Starting operator [0] 2
Starting operator [0] 3
Starting operator [0] 4
Starting operator [0] 5
Starting operator [0] 6
Starting operator [0] 7
Starting operator [0] 8
Starting operator [0] 9
Starting operator [0] 10
Starting operator [0] 11
Starting operator [0] 12
Node FULLY_CONNECTED (number 0)
Starting operator [0] 13
Starting operator [0] 14
Done.
Details of output tensors 0 :
Rank 2, type [Float32], shape [1, 10]
Output tensor values:
[0] -0.085346
[1] -0.071581
[2] 0.195880
[3] -0.198830
[4] -0.255614
[5] -0.350692
[6] 0.053310
[7] -0.011272
[8] -0.107219
[9] 0.037424

Here is the output from the failing LEON build:

tsim> run
starting at 0x40000000
Parsing model FlatBuffer.
Model parsed.
Pull in operation implementations.Done.
Allocate memory buffer.
Done.
Build interpreter.
Done.
Details of input tensors 0 :
Rank 2, type [Float32], shape [1, 20]
Setting input data.
Done.
Perform inference.
Entered Invoke()
init was okay.
get opcodes.
Starting operator [0]
Starting operator [0] 1
Starting operator [0] 2
Starting operator [0] 3
Starting operator [0] 4
Starting operator [0] 5
Starting operator [0] 6
Starting operator [0] 7
Starting operator [0] 8
Starting operator [0] 9
Starting operator [0] 10
Starting operator [0] 11
Starting operator [0] 12
Node FULLY_CONNECTED (number 0)

IU in error mode (tt=0x80, trap instruction)
(In trap table for tt=0x09, data access exception)
   162855  40000090  91d02000   ta  0x0

Interestingly, when I run the native build under valgrind with verbose (-v) output, I get the two REDIR notices below at the exact point where the LEON 3 build crashes. (As I understand it, REDIR lines are just valgrind's notices that it intercepted a libc function, so they at least indicate that a memcpy is executed right at this point.)

Starting operator [0]
Starting operator [0] 1
Starting operator [0] 2
Starting operator [0] 3
Starting operator [0] 4
Starting operator [0] 5
Starting operator [0] 6
Starting operator [0] 7
Starting operator [0] 8
Starting operator [0] 9
Starting operator [0] 10
Starting operator [0] 11
Starting operator [0] 12
Node FULLY_CONNECTED (number 0)
--4540-- REDIR: 0x55593f0 (libc.so.6:memcpy@@GLIBC_2.14) redirected to 0x4a286f0 (_vgnU_ifunc_wrapper)
--4540-- REDIR: 0x5612ea0 (libc.so.6:__memcpy_avx_unaligned) redirected to 0x4c324a0 (memcpy@@GLIBC_2.14)
Starting operator [0] 13
Starting operator [0] 14
Done.
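Since a data access exception on SPARC V8 typically means a load or store hit an invalid address, one extra check I can think of (a minimal sketch that only assumes the names from the main.cc listing above, plus <stdint.h> for uintptr_t) is to print the key buffer addresses just before Invoke() and compare them against the RAM region TSIM maps, given that execution starts at 0x40000000:

  // Requires <stdint.h>. Print the buffers the FULLY_CONNECTED kernel will
  // touch, so their addresses can be checked against TSIM's mapped RAM and
  // their alignment inspected.
  printf("model flatbuffer: %p (addr mod 8 = %u)\n",
         (const void*)tiny_tflite, (unsigned)((uintptr_t)tiny_tflite % 8));
  printf("tensor arena:     %p\n", (void*)tensor_arena);
  printf("input data.f:     %p (addr mod 4 = %u)\n",
         (void*)model_input->data.f,
         (unsigned)((uintptr_t)model_input->data.f % 4));

On the failing target, a pointer outside the simulator's RAM map, or an oddly aligned float pointer, would be a likely suspect.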

If anyone on the TensorFlow Lite Micro team, or any of its users, has an idea of what might be causing this, or whether there could be a defect in the target's libc implementation, I would really appreciate any thoughts.

Thanks.

0 Answers