
I can load a word-vector model stored in txt format like this:

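import os
from torchtext.vocab import Vectors

# Create the cache directory if it does not exist yet.
if not os.path.exists('.vector_cache'):
    os.mkdir('.vector_cache')
vectors = Vectors(name='myvector/glove/glove.6B.200d.txt')
TEXT.build_vocab(train, vectors=vectors)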

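However, when I switch to a binary-format model such as GoogleNews-vectors-negative300.bin, I get an error: could not convert string to float. The code is almost identical to the above: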

if not os.path.exists('.vector_cache'):
    os.mkdir('.vector_cache')
# This is the call that differs: the file is binary rather than plain text.
vectors = Vectors(name='GoogleNews-vectors-negative300.bin')
TEXT.build_vocab(train, vectors=vectors)

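So how can I build a vocabulary from a word-vector model in binary format? Also, should we use the pretrained model's vocabulary directly, build the vocabulary from the training set, or build it from the training set plus the test set? I am confused about this.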
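
One possible workaround, given that the txt format loads without trouble, is to convert the binary file to text first. The sketch below is not a torchtext feature: `word2vec_bin_to_txt` is a hypothetical helper, and it assumes the file follows the usual word2vec binary layout (a text header line with the vocabulary size and dimension, then for each entry the word, a space, and the vector as little-endian float32 values).

```python
import struct


def word2vec_bin_to_txt(bin_path, txt_path):
    """Convert a word2vec binary file into GloVe-style text lines.

    Hypothetical helper. Assumes the standard word2vec binary layout:
    a header line "<vocab_size> <dim>", then for each entry the word,
    a single space, and <dim> little-endian float32 values.
    """
    with open(bin_path, "rb") as src, open(txt_path, "w", encoding="utf-8") as dst:
        vocab_size, dim = (int(x) for x in src.readline().split())
        for _ in range(vocab_size):
            raw_word = bytearray()
            while True:
                ch = src.read(1)
                if not ch:
                    raise EOFError("file ended while reading a word")
                if ch == b" ":
                    break
                if ch != b"\n":  # entries may be separated by a newline
                    raw_word.extend(ch)
            word = raw_word.decode("utf-8", errors="ignore")
            values = struct.unpack(f"<{dim}f", src.read(4 * dim))
            # Write "word v1 v2 ... vN", with no header line, as in GloVe files.
            dst.write(word + " " + " ".join(f"{v:.6f}" for v in values) + "\n")
    return vocab_size, dim
```

After running something like `word2vec_bin_to_txt('GoogleNews-vectors-negative300.bin', 'GoogleNews-vectors-negative300.txt')` (the output name is only an example), the resulting text file could be passed to `Vectors(name=...)` in the same way as the GloVe file. If gensim is available, `KeyedVectors.load_word2vec_format(path, binary=True)` followed by `save_word2vec_format(out_path, binary=False)` should achieve a similar conversion.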

How do I build a vocabulary with torchtext from a binary file such as "GoogleNews-vectors-negative300.bin"?

0 answers: