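I am using the cnn_dailymail dataset, which is part of TensorFlow Datasets, and I am trying to preprocess the article text. I load the dataset and try to map a preprocessing function over it as follows: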
import re
import unicodedata

import tensorflow as tf
import tensorflow_datasets as tfds

dataset, info = tfds.load('cnn_dailymail', with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

# Converts the unicode file to ascii
def unicode_to_ascii(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

def preprocess_sentence(w):
    w = unicode_to_ascii(w.lower().strip())
    # put a space around each punctuation mark so it becomes its own token
    w = re.sub(r"([?.!,¿])", r" \1 ", w)
    w = re.sub(r'[" "]+', " ", w)
    # replacing everything with space except (a-z, A-Z, ".", "?", "!", ",")
    w = re.sub(r"[^a-zA-Z?.!,¿]+", " ", w)
    w = w.rstrip().strip()
    # adding a start and an end token to the sentence
    # so that the model knows when to start and stop predicting.
    w = '<start> ' + w + ' <end>'
    return w

def map_fn(x, label):
    article_text = tf.cast(x, tf.string).decode('utf-8')
    x = preprocess_sentence(article_text)
    return x, label

# taking a small batch to check
small_batch = train_dataset.batch(5)
small_batch = small_batch.map(map_fn)
I get the following error: AttributeError: 'Tensor' object has no attribute 'decode'
Any help on how I can access the actual text would be greatly appreciated! TIA
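
For comparison, here is a minimal sketch of the same preprocessing applied to a plain Python `bytes` value, outside of `Dataset.map`. The sample sentence and the names `raw`, `text`, and `result` are made up for illustration. It only shows that `.decode` is a method of Python `bytes`, which the `Tensor` passed into `map_fn` is not, and what output the helpers are expected to produce once they receive an ordinary string.

```python
import re
import unicodedata


def unicode_to_ascii(s):
    # Decompose accented characters and drop the combining marks (category 'Mn').
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')


def preprocess_sentence(w):
    w = unicode_to_ascii(w.lower().strip())
    # Put a space around each punctuation mark so it becomes its own token.
    w = re.sub(r"([?.!,¿])", r" \1 ", w)
    w = re.sub(r'[" "]+', " ", w)
    # Replace everything except letters and the listed punctuation with a space.
    w = re.sub(r"[^a-zA-Z?.!,¿]+", " ", w)
    w = w.strip()
    return '<start> ' + w + ' <end>'


# Hypothetical sample input: UTF-8 encoded bytes, not a Tensor.
raw = "Caf\u00e9 prices rose, again!".encode("utf-8")

# bytes has .decode; the resulting str does not, and neither does a Tensor.
text = raw.decode("utf-8")
result = preprocess_sentence(text)
print(result)
```

The helpers rely on `str` methods and the `re` module, so they need a real Python string as input rather than a symbolic tensor.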