Colab TPU error - InvalidArgumentError: Unsupported data type for TPU: double, caused by output cond_8/Merge:0

Time: 2019-07-26 16:44:28

Tags: machine-learning google-colaboratory google-cloud-tpu tpu

I am trying to do some basic character classification on Google Colab using a TPU. I get the following error:

InvalidArgumentError: Unsupported data type for TPU: double, caused by output cond_8/Merge:0

I don't know where the problem is, since I use float32 when creating the numpy arrays. I also don't know what cond_8/Merge:0 refers to. The input file I load is a JSON array representing many 28x28 grayscale images:

[{"label":25,"data":[[[1],[.56720000]...],...]}]

I tried commenting out every layer except the first input layer and the problem persists! My code is:

import os, re, math, json, shutil, pprint
import PIL.Image, PIL.ImageFont, PIL.ImageDraw
import numpy as np
import json
import tensorflow as tf
from matplotlib import pyplot as plt
from tensorflow.python.platform import tf_logging
from google.colab import drive
print("Tensorflow version " + tf.__version__)

with open('/tmp/encoded.json') as json_file:
    data = json.load(json_file)

print("Got data")

images_data = list(map(lambda row: row["data"],data))
label_data = list(map(lambda row: row["label"],data))

print("mapped data")

images_data_tensor = np.asarray(images_data, dtype=np.float32)
label_data_tensor = np.asarray(label_data, dtype=np.float32)

print("converted to tensors")

BATCH_SIZE = 128


N = 24


# This model trains to 99.4% sometimes 99.5% accuracy in 10 epochs (with a batch size of 32)
def create_model():
  l = tf.keras.layers
  model = tf.keras.Sequential(
    [
      #l.Reshape(input_shape=(28*28,), target_shape=(28, 28, 1)),

      l.Conv2D(input_shape=(28,28,1,), filters=6, kernel_size=3, padding='same', use_bias=False), # no bias necessary before batch norm
      l.BatchNormalization(scale=False, center=True), # no batch norm scaling necessary before "relu"
      l.Activation('relu'), # activation after batch norm

      l.Conv2D(filters=12, kernel_size=6, padding='same', use_bias=False, strides=2),
      l.BatchNormalization(scale=False, center=True),
      l.Activation('relu'),

      l.Conv2D(filters=24, kernel_size=6, padding='same', use_bias=False, strides=2),
      l.BatchNormalization(scale=False, center=True),
      l.Activation('relu'),

      l.Flatten(),
      l.Dense(200, use_bias=False),
      l.BatchNormalization(scale=False, center=True),
      l.Activation('relu'),
      l.Dropout(0.5), # Dropout on dense layer only

      l.Dense(10, activation='softmax')
    ])
  return model

# set up learning rate decay
lr_decay = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 0.0001 + 0.02 * math.pow(0.5, 1+epoch), verbose=True)


EPOCHS = 10
tpu = None

# Default strategy for GPU/CPU. Note that tensorflow-gpu will need to be installed for GPU to work
strategy = tf.distribute.MirroredStrategy()

try: # TPU detection
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver() # Picks up a connected TPU on Google's Colab, ML Engine, Kubernetes and Deep Learning VMs accessed through the 'ctpu up' utility
  #tpu = tf.distribute.cluster_resolver.TPUClusterResolver('MY_TPU_NAME') # If auto-detection does not work, you can pass the name of the TPU explicitly (tip: on a VM created with "ctpu up" the TPU has the same name as the VM)
  tf.tpu.experimental.initialize_tpu_system(tpu)
  strategy = tf.distribute.experimental.TPUStrategy(tpu)
except ValueError:
  print('Training on CPU')

with strategy.scope():
  trained_model = create_model()
  trained_model.compile(optimizer='adam', # learning rate will be set by LearningRateScheduler
                loss='categorical_crossentropy',
                metrics=['accuracy'])

  # print model layers
  trained_model.summary()

  history = trained_model.fit(x=images_data_tensor,y=label_data_tensor, epochs=EPOCHS, callbacks=[lr_decay])  


print(history.history.keys())

3 Answers:

Answer 0 (score: 0):

I also hit this error on the Google Colab TPU when running classification with keras-bert. I lowered the batch size and the maximum sequence length and the error went away; I don't know why. So you could try reducing the batch size in your model.
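
If the error is indeed size-related, the simplest thing to try with the question's code is an explicit, smaller batch size in the fit() call. A minimal sketch, reusing trained_model and the data arrays from the question (SMALL_BATCH_SIZE is an assumed value to tune, not taken from the original post):

# Hypothetical tweak: pass a smaller, explicit batch size to fit().
# On TPU the global batch is split across the cores, so multiples of 8 are typical.
SMALL_BATCH_SIZE = 64  # assumed value

history = trained_model.fit(x=images_data_tensor,
                            y=label_data_tensor,
                            batch_size=SMALL_BATCH_SIZE,
                            epochs=EPOCHS,
                            callbacks=[lr_decay])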

Answer 1 (score: 0):

You are missing a call to tf.config.experimental_connect_to_cluster(tpu).

Try running it on TensorFlow 2.1+, with the TPU detection/initialization at the start:

# detect and init the TPU
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)

# instantiate a distribution strategy
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)
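
For context, a sketch of how this would fit around the model construction from the question (create_model and the compile settings are taken from there; this is illustrative, not a verified fix):

# Detect, connect to, and initialize the TPU before building the model.
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)

# Build and compile inside the strategy scope, as the question already does.
with tpu_strategy.scope():
    trained_model = create_model()
    trained_model.compile(optimizer='adam',
                          loss='categorical_crossentropy',
                          metrics=['accuracy'])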

Answer 2 (score: -1):

I ran into this error on Kaggle. In my case the problem was that the target's data type was a string; converting it to a numeric data type solved it. Hope this helps.
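
Applied to the question's setup, that advice would amount to making sure the labels are a numeric array before they reach fit(). A minimal sketch, assuming the label_data list from the question (the explicit int cast is illustrative; the labels in the posted JSON already look numeric):

import numpy as np

# Ensure labels are numeric, not strings, before building the tensor.
label_data_numeric = [int(label) for label in label_data]
label_data_tensor = np.asarray(label_data_numeric, dtype=np.int32)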