I am implementing the word2vec architecture from scratch, but my model fails to converge.
I use the text8 corpus and apply preprocessing such as stemming, lemmatization, and subsampling of frequent words. I also remove English stop words and limit the vocabulary size.
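For reference, the subsampling step follows the usual word2vec frequency heuristic; below is a minimal sketch, not my exact pipeline (the threshold t = 1e-5 and the plain token list are assumptions for illustration).

import random
from collections import Counter

def subsample(tokens, t=1e-5):
    # Drop each token w with probability 1 - sqrt(t / f(w)), where f(w) is its
    # relative frequency; the threshold t is an assumed value, not my real setting.
    counts = Counter(tokens)
    total = len(tokens)
    freqs = {w: c / total for w, c in counts.items()}
    p_drop = {w: 1 - (t / freqs[w]) ** 0.5 for w in counts}
    return [w for w in tokens if random.random() > p_drop[w]]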
import numpy as np

class SkipGramBatcher:
    def __init__(self, text):
        self.text = text.results

    def get_batches(self, batch_size):
        n_batches = len(self.text) // batch_size
        pairs = []
        # Build (center, context) index pairs over the whole corpus
        for idx in range(0, len(self.text)):
            window_size = 5
            idx_neighbors = self._get_neighbors(self.text, idx, window_size)
            # one_hot_idx = self._to_one_hot(idx)
            # idx_pairs = [(one_hot_idx, self._to_one_hot(idx_neighbor)) for idx_neighbor in idx_neighbors]
            idx_pairs = [(idx, idx_neighbor) for idx_neighbor in idx_neighbors]
            pairs.extend(idx_pairs)
        # Yield the pairs in fixed-size batches
        for idx in range(0, len(pairs), batch_size):
            X = [pair[0] for pair in pairs[idx:idx + batch_size]]
            Y = [pair[1] for pair in pairs[idx:idx + batch_size]]
            yield X, Y

    def _get_neighbors(self, text, idx, window_size):
        text_length = len(text)
        start = max(idx - window_size, 0)
        end = min(idx + window_size + 1, text_length)
        neighbors_words = set(text[start:end])
        return list(neighbors_words)

    def _to_one_hot(self, indexes):
        n_values = np.max(indexes) + 1
        return np.eye(n_values)[indexes]
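For illustration, the batcher can be exercised on a tiny fake corpus like this (the _Fake wrapper below is only a stand-in for my PreprocessedText class, which exposes the token ids via .results):

# Toy check of the batcher output (_Fake mimics the .results attribute only).
class _Fake:
    def __init__(self, ids):
        self.results = ids

toy = SkipGramBatcher(_Fake(list(range(20))))
for X, Y in toy.get_batches(batch_size=8):
    print(len(X), len(Y))  # center-word ids and context-word ids, one pair per position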
I use TensorFlow for the graph computation:
vocab_size = 20000
text_len = len(text)
test_text_len = int(text_len*0.15)
preprocessed_text = PreprocessedText(text,vocab_size)
and apply negative sampling:
import tensorflow as tf

train_graph = tf.Graph()
with train_graph.as_default():
    inputs = tf.placeholder(tf.int32, [None], name='inputs')
    labels = tf.placeholder(tf.int32, [None, None], name='labels')

n_embedding = 300
with train_graph.as_default():
    embedding = tf.Variable(tf.random_uniform((vocab_size, n_embedding), -1, 1))
    embed = tf.nn.embedding_lookup(embedding, inputs)
Finally, I train my model:
# Number of negative labels to sample
n_sampled = 100
with train_graph.as_default():
    softmax_w = tf.Variable(tf.truncated_normal((vocab_size, n_embedding)))  # softmax weight matrix
    softmax_b = tf.Variable(tf.zeros(vocab_size), name="softmax_bias")  # softmax biases

    # Calculate the loss using negative sampling
    loss = tf.nn.sampled_softmax_loss(
        weights=softmax_w,
        biases=softmax_b,
        labels=labels,
        inputs=embed,
        num_sampled=n_sampled,
        num_classes=vocab_size)

    cost = tf.reduce_mean(loss)
    optimizer = tf.train.AdamOptimizer().minimize(cost)
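The training loop itself is roughly the following simplified sketch (the number of epochs and the batch size are placeholders, not my exact values, and loss logging is omitted):

# Simplified training-loop sketch; epochs and batch_size are arbitrary placeholders.
epochs = 10
batch_size = 512
batcher = SkipGramBatcher(preprocessed_text)

with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(epochs):
        for X, Y in batcher.get_batches(batch_size):
            feed = {inputs: X,
                    labels: np.array(Y)[:, None]}  # labels must be rank 2 for sampled_softmax_loss
            batch_loss, _ = sess.run([cost, optimizer], feed_dict=feed)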
But after running this model, my average batch loss does not decrease significantly.
I think I must have made a mistake somewhere. Any help is appreciated.
Answer 0 (score: 0)
What makes you say "my average batch loss does not decrease significantly"? The chart you attached shows some (unlabeled) value dropping substantially, and it is still falling with a fairly steep slope near the end of the data.

"Convergence" would show up as the loss improvement first slowing down and then stopping.

But if your loss is still clearly decreasing, just keep training! Training for more epochs matters especially on a small dataset such as the text8 corpus you are using.
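As a rough illustration of what "converged" would look like, you can watch a smoothed version of the logged batch losses and only stop once the recent improvement falls below some tolerance; the window and tolerance below are arbitrary choices, not a prescription.

# Rough plateau check on a list of logged batch losses (window/tol are arbitrary).
import numpy as np

def has_converged(losses, window=500, tol=1e-3):
    # True once the mean loss over the last window barely improves on the window before it.
    if len(losses) < 2 * window:
        return False
    recent = np.mean(losses[-window:])
    previous = np.mean(losses[-2 * window:-window])
    return (previous - recent) < tol * previous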