Does anyone know how keras.preprocessing.text.Tokenizer actually works?
import numpy as np
from keras.preprocessing.sequence import pad_sequences

# `tokenizer` has already been fitted on the corpus, and `sequences`
# holds the output of tokenizer.texts_to_sequences(...).
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))

# Pad every integer sequence to a fixed length.
data = pad_sequences(sequences, maxlen=maxlen)
labels = np.asarray(labels)
print('Shape of data tensor:', data.shape)
print('Shape of label tensor:', labels.shape)

# Shuffle data and labels together before splitting.
indices = np.arange(data.shape[0])
np.random.shuffle(indices)
data = data[indices]
labels = labels[indices]

# Train/validation split.
x_train = data[:training_samples]
y_train = labels[:training_samples]
x_val = data[training_samples: training_samples + validation_samples]
y_val = labels[training_samples: training_samples + validation_samples]
Found 88413 unique tokens. Shape of data tensor: (24984, 100) Shape of label tensor: (24984,)
tokenizer.texts_to_sequences('You are Amrock!')
Out[18]: [[5128], [1601], [1205], [], [3], [1480], [962], [], [3], [1978], [1480], [1601], [1144], [2292], []]
What exactly does Out[18] mean?
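The shape of Out[18] comes from passing a bare string instead of a list of strings: texts_to_sequences iterates over its argument, and iterating a Python string yields one character at a time, so each character of 'You are Amrock!' is treated as a separate document. The fitted vocabulary happens to contain single-letter tokens (e.g. the index for 'o' appearing wherever 'o' occurs), and characters with no entry, such as spaces, map to an empty list. A minimal sketch of this iteration behavior, with a hypothetical word_index (this is not the real Keras source, just an illustration of why the output has one sub-list per character):

```python
# Hypothetical vocabulary standing in for tokenizer.word_index.
word_index = {'you': 1, 'are': 2}

def texts_to_sequences(texts):
    # Like Keras, iterate over `texts`, treating each element as one document.
    sequences = []
    for text in texts:
        words = text.lower().split()
        sequences.append([word_index[w] for w in words if w in word_index])
    return sequences

# Passing a LIST of strings: one sequence per document, as intended.
print(texts_to_sequences(['You are Amrock']))  # [[1, 2]]

# Passing a bare string: Python iterates character by character,
# so each character becomes its own "document" (here all unknown,
# so every sub-list is empty).
print(texts_to_sequences('You are'))
```

With a real fitted Tokenizer the per-character sub-lists are often non-empty, because single letters frequently occur as tokens in the corpus; that is exactly the pattern seen in Out[18]. The fix is to wrap the input in a list: tokenizer.texts_to_sequences(['You are Amrock!']).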