Question

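I'm still quite new to TensorFlow and am currently using and modifying the tutorial for recognizing handwritten digits with CNNs. It works, but model.predict takes much longer than I expected. I may have some fundamental misunderstanding here.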

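The relevant part is the code I added. It predicts all 65000 samples, compares the results with the labels, and displays the images that were misclassified instead of just counting them: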

import * as tf from '@tensorflow/tfjs';
import * as tfvis from '@tensorflow/tfjs-vis';

export async function showMistakes([imageData, labelData]: [Float32Array, Uint8Array], model: tf.Sequential) {
  let start = performance.now();
  let [predictionsTensor, labelsTensor] = tf.tidy(() => {
    console.log(`(A) Time taken: ${performance.now() - start}ms`);
    // imageData = imageData.slice(0, 28 * 28);
    // labelData = labelData.slice(0, 10);
    let input = tf.tensor4d(imageData, [imageData.length / (28 * 28), 28, 28, 1]);
    console.log(`(B) Time taken: ${performance.now() - start}ms`);
    // let dummy = input.arraySync();
    // console.log(`(B2) Time taken: ${performance.now() - start}ms`);
    // let dummy2 = input.arraySync();
    // console.log(`(B3) Time taken: ${performance.now() - start}ms`);
    let predictionsFullTensor = (model.predict(input) as tf.Tensor2D);
    console.log(`(C) Time taken: ${performance.now() - start}ms`);
    let predictionsTensor = predictionsFullTensor.argMax<tf.Tensor1D>(-1);
    console.log(`(D) Time taken: ${performance.now() - start}ms`);
    let labelsTensor = tf.tensor2d(labelData, [labelData.length / 10, 10]).argMax<tf.Tensor1D>(-1);
    console.log(`(E) Time taken: ${performance.now() - start}ms`);
    console.log(tf.memory());
    return [predictionsTensor, labelsTensor];
  });

  console.log(`(F) Time taken: ${performance.now() - start}ms`);
  console.log(tf.memory());
  let [predictions, labels] = [await predictionsTensor.array(), await labelsTensor.array()];
  console.log(`(G) Time taken: ${performance.now() - start}ms`);
  predictionsTensor.dispose();
  labelsTensor.dispose();

  console.log(tf.memory());

  let tempCanvas = document.createElement("canvas");
  tempCanvas.width = 28;
  tempCanvas.height = 28;

  const MAX_FAILS = 384;
  let fails = predictions
    .map((prediction, i) => [prediction, labels[i], i] as const)
    .filter(([prediction, label]) => prediction !== label)
    .slice(0, MAX_FAILS)
    .map(([prediction, label, i]) => {
      let canvas = document.createElement('canvas');

      const SCALE = 2;
      const IMAGE_SIZE = 28;
      canvas.width = IMAGE_SIZE * SCALE;
      canvas.height = IMAGE_SIZE * SCALE;

      tempCanvas.getContext("2d")?.putImageData(
        new ImageData(
          Uint8ClampedArray.from(
            { length: 28 * 28 * 4 },
            (_, j) => j % 4 === 3 ? 255 : Math.round(imageData[i * 28 * 28 + Math.floor(j / 4)] * 255)
          ),
          28
        ),
        0,
        0
      );

      canvas.getContext("2d")?.drawImage(tempCanvas, 0, 0, IMAGE_SIZE * SCALE, IMAGE_SIZE * SCALE);
      return [prediction, label, i, canvas] as const;
    });

  const surface = tfvis.visor().surface({ name: 'False predictions', tab: 'Mistakes'});

  let previousContainer = surface.drawArea.querySelector("#falsePredictionsContainer");
  if (previousContainer !== null) surface.drawArea.removeChild(previousContainer);

  let container = document.createElement("div");
  container.id = "falsePredictionsContainer";
  for (let [prediction, label, i, canvas] of fails) {
    let node = document.createElement("div");
    node.className = "falsePrediction";
    node.textContent = `#${i}: predicted ${prediction}, is labeled as ${label}`;
    node.appendChild(canvas);
    container.appendChild(node);
  }
  surface.drawArea.appendChild(container);
}
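The mismatch search (the map/filter/slice chain) and the pixel conversion inside the putImageData call can be pulled out into plain functions, so they can be tested without a GPU or a DOM. This is a minimal sketch: the names `findMistakes` and `grayscaleToRgba` are hypothetical, and it assumes, as in the tutorial data, that each image is 28 × 28 float pixels in [0, 1], stored row-major one after another.

```typescript
/** One misclassified sample: predicted class, labelled class, sample index. */
type Mistake = readonly [prediction: number, label: number, index: number];

const IMAGE_SIZE = 28;

// Hypothetical helper: collect up to maxFails samples whose predicted class
// differs from the label. Both arrays hold class indices (the argMax outputs).
// Stops scanning as soon as maxFails mistakes have been found.
function findMistakes(predictions: number[], labels: number[], maxFails: number): Mistake[] {
  const fails: Mistake[] = [];
  for (let i = 0; i < predictions.length && fails.length < maxFails; i++) {
    if (predictions[i] !== labels[i]) fails.push([predictions[i], labels[i], i]);
  }
  return fails;
}

// Hypothetical helper: expand image number `index` (floats in [0, 1]) into the
// RGBA byte layout that ImageData expects, with alpha fixed at 255.
function grayscaleToRgba(imageData: Float32Array, index: number): Uint8ClampedArray {
  const pixels = IMAGE_SIZE * IMAGE_SIZE;
  const offset = index * pixels;
  const rgba = new Uint8ClampedArray(pixels * 4);
  for (let p = 0; p < pixels; p++) {
    const v = Math.round(imageData[offset + p] * 255);
    rgba[4 * p] = v;       // red
    rgba[4 * p + 1] = v;   // green
    rgba[4 * p + 2] = v;   // blue
    rgba[4 * p + 3] = 255; // fully opaque
  }
  return rgba;
}
```

In the browser, `new ImageData(grayscaleToRgba(imageData, i), 28)` could then replace the inline `Uint8ClampedArray.from(...)` call.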

The function is called as showMistakes([data.datasetImages, data.datasetLabels], model);, where data is the MnistData instance from the tutorial.

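It works, but model.predict takes an unreasonable amount of time: about 16 seconds for all 65000 samples even on the simplest model, just two dense layers of 16 neurons each and no convolutions. Although I'm probably not in a position to make an educated guess, it seems to me that most of the time goes into some data conversion and transfer between the CPU and the GPU.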
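One thing worth checking: as far as I know, `LayersModel.predict` does not push all 65000 samples through in a single pass but walks over them in batches, and its optional `batchSize` argument defaults to 32. The sketch below counts how many batches (and therefore rounds of GPU kernel dispatches) different batch sizes imply; `batchRanges` is a hypothetical helper, not a tfjs function.

```typescript
// Hypothetical helper: split `total` samples into [start, end) ranges of at
// most `batchSize` samples, the way a batched prediction loop walks the data.
function batchRanges(total: number, batchSize: number): Array<[number, number]> {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < total; start += batchSize) {
    ranges.push([start, Math.min(start + batchSize, total)]);
  }
  return ranges;
}

const SAMPLES = 65000;
for (const size of [32, 1024, 8192]) {
  // prints 2032, 64 and 8 batches respectively
  console.log(`batchSize ${size}: ${batchRanges(SAMPLES, size).length} batches`);
}
```

If per-batch overhead dominates, passing a larger batch size, e.g. `model.predict(input, { batchSize: 1024 })`, is one thing to try; whether it actually helps on a given GPU has to be measured.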

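For the very simple model mentioned above, which should take practically no time at all, I can see that model.predict takes almost exactly as long as an .arraySync() call on the input tensor. That may be pure coincidence, since I couldn't run many tests, but I'm mentioning it anyway. I added the otherwise useless .arraySync() calls (see the commented-out lines in the code) only for testing, because I suspected that something was wrong with my data conversion and that I was really measuring the time until the input tensor was ready rather than the time the actual prediction takes.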
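A possible reading of these timings: with the WebGL backend, tfjs queues operations on the GPU and generally returns before they have finished, so a timestamp taken right after `model.predict` may mostly measure dispatch (plus one-off shader compilation), while the remaining cost surfaces at the first readback such as `.arraySync()` or `await .array()`. A minimal sketch of a timing helper that only stops the clock after an awaited step has settled; `timeAsync` and `Timed` are hypothetical names.

```typescript
interface Timed<T> {
  result: T;
  ms: number;
}

// Hypothetical helper: run an async step and report wall-clock time measured
// only after its promise settles (e.g. after tensor data has been read back).
async function timeAsync<T>(label: string, step: () => Promise<T>): Promise<Timed<T>> {
  const start = performance.now();
  const result = await step();
  const ms = performance.now() - start;
  console.log(`${label}: ${ms.toFixed(1)}ms`);
  return { result, ms };
}
```

For example, a step like `() => { const out = model.predict(input) as tf.Tensor; return out.data().finally(() => out.dispose()); }` would time the prediction together with the GPU-to-CPU copy and still free the output tensor.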

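Note that I predict all the data at once, so the input tensor has shape [65000, 28, 28, 1]. As for memory usage, tf.memory() reports numBytesInGPU: 209703140, which looks entirely plausible: that is 65000 * 28 * 28 * 4 bytes plus a small overhead. Afterwards it drops back to numBytesInGPU: 114896, which also looks fine.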
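The size estimate can be checked directly: a float32 tensor uses 4 bytes per element, so the byte count is the product of the shape times 4. A quick check against the reported value, where `tensorBytes` is a hypothetical helper:

```typescript
const BYTES_PER_FLOAT32 = 4;

// Hypothetical helper: bytes needed for a dense tensor of the given shape.
function tensorBytes(shape: number[], bytesPerElement: number = BYTES_PER_FLOAT32): number {
  return shape.reduce((acc, dim) => acc * dim, 1) * bytesPerElement;
}

const inputBytes = tensorBytes([65000, 28, 28, 1]);
const reportedBytes = 209703140; // numBytesInGPU reported by tf.memory() in the question
console.log(inputBytes);                 // 203840000
console.log(reportedBytes - inputBytes); // 5863140
console.log(tensorBytes([65000, 10]));   // 2600000, the prediction tensor alone
```

So about 5.9 MB sit on top of the input tensor, which is plausibly accounted for by the much smaller prediction, label and intermediate tensors.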

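The code is bundled with webpack, hence the import of the tfjs npm module. I'm running this on a PC that these days would rather be called a toaster, but tf.backend() still reports "webgl", so it is not falling back to the CPU or anything like that.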

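PS: For a CNN with accuracy well above 99%, the "mistakes" are often really interesting; tbh, many of them are mislabeled or simply noise in the MNIST data. The following are clearly a three and a five!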

Prediction is much slower than expected
