Mapping a function over a TensorFlow 2.0 Alpha dataset

Time: 2019-05-27 13:43:38

Tags: dictionary tensorflow-datasets tensorflow2.0

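I am using the cnn_dailymail dataset, which is part of TensorFlow Datasets, and I am trying to preprocess the article text. I load the dataset and try to map a preprocessing function over it as follows: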

import re
import unicodedata

import tensorflow as tf
import tensorflow_datasets as tfds

dataset, info = tfds.load('cnn_dailymail', with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

# Converts the unicode file to ascii
def unicode_to_ascii(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn')


def preprocess_sentence(w):
    w = unicode_to_ascii(w.lower().strip())

    w = re.sub(r"([?.!,¿])", r" \1 ", w)
    w = re.sub(r'[" "]+', " ", w)

    # replacing everything with space except (a-z, A-Z, ".", "?", "!", ",")
    w = re.sub(r"[^a-zA-Z?.!,¿]+", " ", w)

    w = w.rstrip().strip()

    # adding a start and an end token to the sentence
    # so that the model knows when to start and stop predicting.
    w = '<start> ' + w + ' <end>'
    return w

def map_fn(x, label):
    article_text = tf.cast(x, tf.string).decode('utf-8')
    x = preprocess_sentence(article_text)
    return x, label

# taking a small batch to check
small_batch = train_dataset.batch(5)
small_batch = small_batch.map(map_fn)

I get the following error: AttributeError: 'Tensor' object has no attribute 'decode'
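A likely cause, as far as I can tell: `Dataset.map` traces `map_fn` in graph mode, so `x` is a symbolic `Tensor` and has neither `.decode` nor `.numpy()`. The sketch below is one possible workaround, not a confirmed fix for this exact setup. It wraps the Python preprocessing in `tf.py_function`, where the argument arrives as an eager tensor. The toy dataset and the helper name `_py_preprocess` are assumptions made for illustration; they stand in for the (article, highlights) pairs of cnn_dailymail. The function is mapped before batching so that it receives one scalar string at a time.

```python
import re
import unicodedata

import tensorflow as tf


def unicode_to_ascii(s):
    # Decompose accented characters (NFD) and drop the combining marks.
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')


def preprocess_sentence(w):
    w = unicode_to_ascii(w.lower().strip())
    w = re.sub(r"([?.!,¿])", r" \1 ", w)      # pad punctuation with spaces
    w = re.sub(r'[" "]+', " ", w)
    w = re.sub(r"[^a-zA-Z?.!,¿]+", " ", w)    # drop everything else
    return '<start> ' + w.strip() + ' <end>'


def _py_preprocess(x):
    # Hypothetical helper: inside tf.py_function, x is an eager tensor,
    # so x.numpy() yields Python bytes that can be decoded.
    return preprocess_sentence(x.numpy().decode('utf-8'))


def map_fn(x, label):
    x = tf.py_function(_py_preprocess, inp=[x], Tout=tf.string)
    x.set_shape([])  # py_function does not keep static shape information
    return x, label


# Assumed stand-in data, not the real cnn_dailymail records.
toy = tf.data.Dataset.from_tensor_slices(
    (["Héllo, World!", "How are you?"], ["greeting", "question"]))

processed = [(x.numpy().decode('utf-8'), y.numpy().decode('utf-8'))
             for x, y in toy.map(map_fn)]
```

With the real dataset, the same `map_fn` would presumably be applied as `train_dataset.map(map_fn).batch(5)`. Note that `tf.py_function` runs Python code for every element, so it can be slow on a large corpus.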

Any help on how I can access the actual text would be greatly appreciated! TIA
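One way to work with the text without leaving the graph might be to express the preprocessing with `tf.strings` ops, which accept string tensors directly. This is a rough sketch under several assumptions: it needs a TensorFlow release that provides `tf.strings.lower`, it handles ASCII only (the accent stripping done by `unicode_to_ascii` and the `¿` character are left out), and `preprocess_tensor` and the toy data are hypothetical names introduced here for illustration.

```python
import tensorflow as tf


def preprocess_tensor(x):
    # Accepts a scalar or a batch of tf.string values and stays in the graph.
    x = tf.strings.strip(tf.strings.lower(x))
    x = tf.strings.regex_replace(x, r"([?.!,])", r" \1 ")   # pad punctuation
    x = tf.strings.regex_replace(x, r"[^a-zA-Z?.!,]+", " ")  # drop the rest
    x = tf.strings.strip(x)
    return tf.strings.join(["<start> ", x, " <end>"])


# Assumed stand-in data, not the real cnn_dailymail records.
toy = tf.data.Dataset.from_tensor_slices(
    (["Hello, World!", "How are you?"], ["greeting", "question"]))

# Because the ops are element-wise, mapping after batching also works.
batched = toy.batch(2).map(lambda x, y: (preprocess_tensor(x), y))

# Outside of map, in eager mode, .numpy() exposes the raw bytes.
first_x, first_y = next(iter(batched))
texts = [t.decode("utf-8") for t in first_x.numpy()]
```

The trade-off is that every step must be expressible as a TensorFlow op, while arbitrary Python such as `unicodedata.normalize` is not available inside the graph.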

0 answers:

No answers