How to merge two tensors so they are in one dataset?

Question

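I'm using the Titanic dataset from the TensorFlow Datasets (tfds) API.

I can't figure out how to make the feature tensors model-friendly.

This is the best I've been able to get, but it only uses one tensor at a time. How can I make it work with all the tensors in the features?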

import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.optimizers import Adam

data = tfds.load("titanic",split='train', as_supervised=True).map(lambda x,y: (x,y)).prefetch(1)

# Pull the whole split as a single batch and pick out individual feature columns
for i in data.batch(1309):
    xx1 = i[0]['age']
    xx2 = i[0]['fare']
    yyy = tf.convert_to_tensor(tf.one_hot(i[1],2))

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1),
    tf.keras.layers.Dense(13, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])

model.compile(
  optimizer=Adam(learning_rate=0.01), 
  loss='categorical_crossentropy', 
  metrics=['accuracy']
)

# Only the age tensor (xx1) is used as input here; fare (xx2) is left out
model.fit(xx1,yyy,epochs=30)

How can I merge the age and fare tensors so that they are in one dataset?

I tried concat and stack, to no avail.

Answer 1

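This can be done with tf.stack. Since the input already uses the Dataset API, I refactored some of the code to use dataset functionality to map the input format to the target format you described. For convenience, here is a Colab notebook with the example: https://colab.research.google.com/drive/1dHNe9rYaJSgqbj_QtQ1aJL_7WgKnLKsU?usp=sharing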
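To illustrate why tf.concat on its own did not help with the batched tensors from the question (xx1 and xx2, each of shape (N,)), here is a minimal sketch using small made-up stand-in tensors. Concatenating two 1-D tensors only produces a longer vector, whereas tf.stack with axis=1 produces one [age, fare] row per passenger; concat gives the same result only after each column gets an explicit feature axis.

```python
import tensorflow as tf

# Made-up stand-ins for a batch of the 'age' and 'fare' columns, shape (3,)
age = tf.constant([30.0, 22.0, 54.0])
fare = tf.constant([13.0, 7.25, 51.86])

# tf.concat joins along an existing axis: for 1-D inputs this is just a longer vector
concatenated = tf.concat([age, fare], axis=0)
print(concatenated.shape)  # (6,)

# tf.stack adds a new axis; axis=1 yields one [age, fare] row per passenger
stacked = tf.stack([age, fare], axis=1)
print(stacked.shape)  # (3, 2)

# Equivalent with concat: give each column a feature axis first, then join on it
via_concat = tf.concat([tf.expand_dims(age, 1), tf.expand_dims(fare, 1)], axis=1)
print(bool(tf.reduce_all(tf.equal(via_concat, stacked))))  # True
```

With the question's variables, the same idea would be tf.stack([xx1, xx2], axis=1), giving a (1309, 2) input for model.fit.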

# Nothing novel here
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.optimizers import Adam

data = tfds.load("titanic",split='train', as_supervised=True).map(lambda x,y: (x,y)).prefetch(1)

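Basic demonstration of the intended data restructuring

Take one item from the dataset, then use tf.stack to turn it into a single tensor containing the two target data points.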

for item in data.take(1):
  age = item[0]['age']
  fare = item[0]['fare']
  output = tf.stack([age, fare], axis=0)
  print(output)

Output: tf.Tensor([30. 13.], shape=(2,), dtype=float32)

The output is a single tensor that contains both expected values.
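The same per-example stacking extends to more than two columns. The sketch below assumes a hypothetical FEATURE_KEYS list and uses a tiny in-memory dataset with the same (features_dict, label) structure that as_supervised=True produces, so it runs without downloading Titanic; the values are made up. Each selected feature is cast to float32 before stacking, since tf.stack requires all inputs to share a dtype.

```python
import tensorflow as tf

# Hypothetical list of numeric feature keys; extend it with other numeric columns as needed
FEATURE_KEYS = ["age", "fare"]

def stack_features(features, label):
    # Collect the selected scalar features of one example into a 1-D float32 tensor
    values = [tf.cast(features[key], tf.float32) for key in FEATURE_KEYS]
    return tf.stack(values, axis=0), label

# Made-up in-memory dataset shaped like (features_dict, label); 'name' is ignored by the mapping
toy = tf.data.Dataset.from_tensor_slices(
    ({"age": [30.0, 22.0], "fare": [13.0, 7.25], "name": ["a", "b"]}, [0, 1])
)

for x, y in toy.map(stack_features).take(1):
    print(x.numpy(), y.numpy())  # [30. 13.] 0
```

Non-numeric columns (such as 'name' here) are simply not listed in FEATURE_KEYS, so they never reach the model.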

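Using it as a TensorFlow dataset

A TensorFlow dataset can be passed directly to training, so we can easily write a function that maps from the input data format to the target format described in the question. The function below does this using the example code above.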

# Input data and associated label
def transform_data(item, label):

  # Extract values
  age = item['age']
  fare = item['fare']

  # Create output tensor
  output = tf.stack([age, fare], axis=0)
  return output, label

# Create a training dataset from the base dataset: map each (features, label) pair to the goal format with the mapping function, then batch
train_dataset = data.map(transform_data).batch(1200)

# Model - I made some minor changes to get it to run cleaner
model = tf.keras.models.Sequential([
  tf.keras.layers.Dense(2),
  tf.keras.layers.Dense(13, activation='relu'),
  # As we have only two labels, this is really a binary problem, so I've created a single output neuron activated by sigmoid
  tf.keras.layers.Dense(1,activation='sigmoid')
])


# Compiled with binary_crossentropy to complement the binary classification
model.compile(optimizer=Adam(learning_rate=0.01),loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_dataset,epochs=30)

Output:

Epoch 1/30
2/2 [==============================] - 0s 16ms/step - loss: 11.7881 - accuracy: 0.4385
Epoch 2/30
2/2 [==============================] - 0s 7ms/step - loss: 10.2350 - accuracy: 0.4270
...
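Before calling fit, it can help to confirm that mapping first and batching second gives the (batch, 2) input the first Dense layer expects. This sketch reuses the answer's transform_data on a made-up three-row stand-in for the Titanic split (same structure, invented values) and inspects element_spec, which describes the batch shapes without reading any data.

```python
import tensorflow as tf

def transform_data(item, label):
    # Same mapping as in the answer: one [age, fare] vector per example
    return tf.stack([item["age"], item["fare"]], axis=0), label

# Made-up stand-in for the Titanic split with the same (features_dict, label) structure
toy = tf.data.Dataset.from_tensor_slices(
    ({"age": [30.0, 22.0, 54.0], "fare": [13.0, 7.25, 51.86]}, [0, 1, 0])
)

train_dataset = toy.map(transform_data).batch(2)

# Batch dimension is None because the last batch may be smaller (no drop_remainder)
print(train_dataset.element_spec)

for x, y in train_dataset.take(1):
    print(x.shape, y.shape)  # (2, 2) (2,)
```

The trailing 2 in the feature shape is what Keras uses to build the first Dense layer's weights.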
