Question

Using TensorFlow 2.0 alpha, I get the error ValueError: Can't convert Python sequence with mixed types to Tensor when I try to create a tf.data.Dataset from the following data:

Inspect the complete dataset on Kaggle

Obviously, there are multiple data types: Sex is a string, Age is a float/double, SibSp and Parch are integers, and so on.
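One way to confirm the per-column types is DataFrame.dtypes. The sketch below uses a hypothetical two-row frame with invented values, shaped like the Titanic columns named above; it only illustrates that each column carries its own single dtype, and that different dtypes across columns are normal.

```python
import pandas as pd

# Hypothetical rows shaped like the Titanic columns mentioned above (values invented).
df = pd.DataFrame({
    "Sex": ["male", "female"],
    "Age": [22.0, 38.0],
    "SibSp": [1, 1],
    "Parch": [0, 0],
})

# One dtype per column: text for Sex, float for Age, integers for SibSp and Parch.
print(df.dtypes)
```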

My (Python 3) code for converting this Pandas DataFrame into a tf.data.Dataset is based on TensorFlow's tutorial How to classify structured data, and looks like this:

import tensorflow as tf

def df_to_dataset(dataframe, shuffle=True, batch_size=32):
  dataframe = dataframe.copy()

  # the 'Survived' column is the label (not shown in the image of the Dataframe but exists in the Dataframe)
  label = dataframe.pop('Survived')

  # create the dataset from the dataframe
  ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), label))

  # if shuffle is True, randomize the order of the entries
  if shuffle:
    ds = ds.shuffle(buffer_size=len(dataframe))
  ds = ds.batch(batch_size)

  return ds

As mentioned above, this function throws ValueError: Can't convert Python sequence with mixed types to Tensor when executed, for example:

train_ds = df_to_dataset(df_train, batch_size=32)

(where df_train is the Pandas DataFrame you can see in the image)

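Now I'm wondering whether I'm missing something, because TensorFlow's tutorial (mentioned above) also uses a DataFrame with mixed types, yet when I run the tutorial's example with exactly the same df_to_dataset function, I don't get any error.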

Answer 1

This error is caused by NaN values in certain columns. Detect them with dataframe['Name'].isnull().sum() and replace them.
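A minimal sketch of that check-and-replace step, assuming a small hypothetical frame with invented values (not the real Kaggle data). A text column that contains NaN holds both str and float objects, which is the kind of mixed-type sequence the error message refers to. The fill values used here (an empty string for text, the column median for numbers) are an assumption; choose whatever suits each feature before passing the frame to df_to_dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Titanic data; values are invented for illustration.
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", np.nan, "Heikkinen, Miss. Laina"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, np.nan, 26.0],
    "Survived": [0, 1, 1],
})

# The NaN in "Name" is a float, so this column mixes str and float objects.
print(df["Name"].map(type).unique())

# Count missing values in every column (the answer checks df['Name'] alone).
print(df.isnull().sum())

# Assumed fill strategy: median for numeric columns, empty string for everything else.
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].median())
    else:
        df[col] = df[col].fillna("")

print(df.isnull().sum().sum())
```

After this cleanup every column holds a single Python type, so the frame can be handed to the df_to_dataset function from the question.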
