Question

I tried to keep the example as small as possible. The following script creates a SparseTensor similar to my input:

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

np_orders = np.unique(np.random.randint(1, high=30, size=[15,2], dtype='int'), axis=0) #Generates 15 pairs of (customer id, item id)

max_customer = max(np_orders[:,0]) #Largest customer id
max_item = max(np_orders[:,1]) #Largest item id


#Customer item matrix, Sparse Tensor
# Indices are 0-based positions, so each dimension must be one larger than the largest id
cim = tf.sparse.SparseTensor(indices = np_orders, values = np.ones(np_orders.shape[0], dtype=int), dense_shape = [max_customer + 1, max_item + 1])
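SparseTensor indices are positions, not labels: along each axis the valid positions run from 0 to size - 1. So when `dense_shape` is derived from the largest id present, each dimension needs to be that id plus one, or the entries holding the largest ids fall outside the tensor (and `tf.sparse.to_dense`, which validates indices by default, may reject them). A minimal NumPy check of that bound; `in_bounds` is a hypothetical helper and `default_rng` stands in for the original `np.random.randint` call:

```python
import numpy as np

# Same kind of (customer id, item id) pairs as above; ids run from 1 to 29.
rng = np.random.default_rng(0)
np_orders = np.unique(rng.integers(1, 30, size=(15, 2)), axis=0)

max_customer = int(np_orders[:, 0].max())
max_item = int(np_orders[:, 1].max())

def in_bounds(indices, shape):
    """True if every index row is a valid position in an array of this shape."""
    return bool(np.all((indices >= 0) & (indices < np.asarray(shape))))

too_small = (max_customer, max_item)    # the largest id equals the size: out of range
fits = (max_customer + 1, max_item + 1)  # every id from 0 to max is a valid position

print(in_bounds(np_orders, too_small))  # False
print(in_bounds(np_orders, fits))       # True
```

The first check is always False, because some row contains `max_customer` itself, and a position equal to the axis size is out of range.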

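Note: my data must be stored as a sparse tensor. I have tried other options using sparse matrices, and the RAM overhead is too high for my original dataset. I also tried mapping the ids down and then using the down-mapped data in a sparse matrix, but it is still several orders of magnitude too large (it would need hundreds of TB of RAM).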
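The gap between dense and sparse storage is easy to estimate: a dense matrix costs rows × columns × bytes per value, while a SparseTensor keeps one int64 index per dimension plus one value for each stored entry. The sizes below are hypothetical round numbers chosen only to illustrate the arithmetic, not the figures from my dataset:

```python
def dense_bytes(n_rows, n_cols, itemsize=4):
    """Bytes for a dense n_rows x n_cols matrix (float32 values by default)."""
    return n_rows * n_cols * itemsize

def coo_bytes(nnz, itemsize=4, index_size=8, ndim=2):
    """Bytes for COO storage: ndim int64 indices plus one value per stored entry."""
    return nnz * (ndim * index_size + itemsize)

# Hypothetical sizes, for illustration only
customers, items, orders = 10_000_000, 1_000_000, 50_000_000
print(dense_bytes(customers, items) / 1e12, "TB dense")  # 40.0 TB dense
print(coo_bytes(orders) / 1e9, "GB as COO")              # 1.0 GB as COO
```

With int64 values, as in the `np.ones(..., dtype=int)` call above, pass `itemsize=8` instead.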

Here is a simple loss function that takes a sparse tensor as input:

class sparse_loss(tf.losses.Loss):
    def call(self, y_true, y_pred):
        # y_true arrives as a SparseTensor; densify it to match y_pred
        y_true = tf.sparse.to_dense(y_true)
        y_true = tf.cast(y_true, dtype=tf.float32)
        # Element-wise squared error, then the mean over the item axis
        total_loss = tf.math.squared_difference(y_true, y_pred)
        return tf.reduce_mean(total_loss, axis=-1)
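For reference, the same row-wise mean squared difference can be computed from the COO parts of the target without densifying it: start from the sum of `y_pred ** 2` per row (as if every target were zero), then correct only the stored entries. This NumPy sketch assumes 2-D data with unique indices; `sparse_mse_per_row` is a hypothetical name, and it only saves memory on the target, since `y_pred` is dense anyway. In TensorFlow, the same correction could presumably be written with `tf.gather_nd` and `tf.math.unsorted_segment_sum`, but that version is not shown here.

```python
import numpy as np

def sparse_mse_per_row(indices, values, dense_shape, y_pred):
    """Row-wise mean of (to_dense(y_true) - y_pred) ** 2 without densifying y_true.

    Assumes y_true is 2-D COO data (indices, values) with unique (row, col) pairs.
    """
    n_rows, n_cols = dense_shape
    y_pred = np.asarray(y_pred, dtype=np.float64)
    if y_pred.shape != (n_rows, n_cols):
        raise ValueError("y_pred must have shape dense_shape")
    # Start as if every target entry were zero: (0 - p) ** 2 summed per row.
    row_sums = np.sum(y_pred ** 2, axis=1)
    rows, cols = indices[:, 0], indices[:, 1]
    p = y_pred[rows, cols]
    # For stored entries, swap the p ** 2 term for (v - p) ** 2.
    np.add.at(row_sums, rows, (values - p) ** 2 - p ** 2)
    return row_sums / n_cols

# Compare with the dense computation on a tiny example
indices = np.array([[0, 1], [1, 0], [1, 2]])
values = np.array([1.0, 1.0, 1.0])
y_pred = np.array([[0.5, 0.5, 0.0], [0.0, 1.0, 0.5]])
dense = np.zeros((2, 3))
dense[indices[:, 0], indices[:, 1]] = values
print(np.allclose(sparse_mse_per_row(indices, values, (2, 3), y_pred),
                  np.mean((dense - y_pred) ** 2, axis=1)))  # True
```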

Here is the model:

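model = keras.models.Sequential()
# Each input row is one customer's sparse item vector
model.add(layers.Input(shape=(cim.shape[1],), sparse=True))
# Linear map back to one score per item, without a bias
model.add(layers.Dense(cim.shape[1], use_bias=False))
model.compile(optimizer='adam', loss=sparse_loss())
# Autoencoder-style: the sparse matrix is both the input and the target
model.fit(cim, cim)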

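In this form, adding the Dense layer raises an error: ValueError: The last dimension of the inputs to 'Dense' should be defined. Found 'None'. If I remove sparse=True from the Input layer, it instead raises: TypeError: Input must be a SparseTensor.

Tracing this error, it seems that the first thing Keras does in SageMaker is push through a tensor of shape [None, None] with dtype float32. Because that tensor is not sparse, it breaks on the first line of the loss function. Trying to catch this tensor specifically only makes the system break somewhere else, and it is very hacky. I would prefer a different solution.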
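`Dense(units, use_bias=False)` computes `inputs @ kernel`, where the kernel has shape `(input_dim, units)` and `input_dim` is read from the last dimension of the input when the layer is built; that is the dimension the first error reports as `None`. The NumPy sketch below (`coo_matmul` is a hypothetical helper) performs the same product for a COO input and shows that only the column count is needed to size the weights. In TensorFlow the corresponding op is `tf.sparse.sparse_dense_matmul`; one workaround that might be worth trying, not verified on SageMaker, is a custom layer that takes the input dimension as a constructor argument and applies that op, so the kernel shape no longer depends on the static shape Keras infers for the sparse Input.

```python
import numpy as np

def coo_matmul(indices, values, n_rows, kernel):
    """Compute X @ kernel for a sparse X given as COO (indices, values).

    X has shape (n_rows, kernel.shape[0]); kernel plays the role of the
    Dense layer's weight matrix, with shape (input_dim, units).
    """
    out = np.zeros((n_rows, kernel.shape[1]), dtype=np.result_type(values, kernel))
    # Each stored entry (r, c) with value v adds v * kernel[c] to output row r.
    np.add.at(out, indices[:, 0], values[:, None] * kernel[indices[:, 1]])
    return out

# Small check against an ordinary dense matmul
indices = np.array([[0, 1], [1, 0], [1, 2]])
values = np.array([1.0, 1.0, 1.0])
kernel = np.arange(6, dtype=np.float64).reshape(3, 2)  # input_dim=3, units=2
dense_x = np.zeros((2, 3))
dense_x[indices[:, 0], indices[:, 1]] = values
print(np.allclose(coo_matmul(indices, values, 2, kernel), dense_x @ kernel))  # True
```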

All of this code runs without errors on my local machine. In fact, I have tried this code with every TensorFlow kernel available to SageMaker notebooks.

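Why does this code break in SageMaker but not on my computer? My first thought was that I am using the default Cython while SageMaker uses Conda, but I don't see why that would cause this kind of error.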
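Cython is a compiler for Python C extensions rather than an interpreter, and the Python that Conda installs is itself CPython, so the interpreter is an unlikely culprit. The installed TensorFlow version, and whether Keras runs eagerly or builds a graph with placeholders, seem more likely to differ (the [None, None] float32 tensor described above looks like a graph-mode target placeholder, though that is a guess). A small stdlib script run in both environments makes the comparison concrete; `describe_environment` is a hypothetical helper, and the conda check assumes the usual `conda-meta` directory inside a Conda environment's prefix:

```python
import os
import platform
import sys

def describe_environment():
    """Collect interpreter and TensorFlow details for comparing two environments."""
    info = {
        "python_implementation": platform.python_implementation(),  # e.g. 'CPython'
        "python_version": platform.python_version(),
        # Conda environments contain a conda-meta directory under sys.prefix.
        "conda_env": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }
    try:
        import tensorflow as tf
        info["tensorflow"] = tf.__version__
        info["eager"] = tf.executing_eagerly()
    except ImportError:
        info["tensorflow"] = None  # TensorFlow not installed in this environment
    return info

if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```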

More importantly, how should this code be changed so that it works in SageMaker?

How to pass a sparse tensor into a Keras model on SageMaker

0 answers: