Python - Encoding data back into a word

Time: 2018-11-12 15:12:41

Tags: python machine-learning nlp gensim word2vec

I have some code that converts words to vectors. Here is my code:

# word_to_vec_demo.py

from gensim.models import word2vec
import logging

logging.basicConfig(format='%(asctime)s : \
%(levelname)s : %(message)s', level=logging.INFO)

sentences = [['In', 'the', 'beginning', 'Abba','Yahweh', 'created', 'the',
'heaven', 'and', 'the', 'earth.', 'And', 'the', 'earth', 'was',
'without', 'form,', 'and', 'void;', 'and', 'darkness', 'was',
'upon', 'the', 'face', 'of', 'the', 'deep.', 'And', 'the',
'Spirit', 'of', 'Yahweh', 'moved', 'upon', 'the', 'face',  'of',
'the', 'waters.']]

# Note: `size` is the gensim < 4.0 parameter name; gensim >= 4.0 renamed it to `vector_size`.
model = word2vec.Word2Vec(sentences, size=10, min_count=1)

print("Vector for \'earth\' is: \n")
print(model.wv['earth'])

print("\nEnd demo")

The output is:

Vector for 'earth' is: 

[-0.00402722  0.0034133   0.01583795  0.01997946  0.04112177  0.00291858
-0.03854967  0.01581967 -0.02399057  0.00539708]

Is it possible to encode the vector array back into a word? If so, how would I implement that in Python?

1 Answer:

Answer 0 (score: 2)

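You can use the model's similar_by_vector() method (available as model.wv.similar_by_vector()) to look up the top N words most similar to a given vector. It returns (word, similarity) pairs, so the first entry is the closest word. Hope this helps.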
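To make the idea concrete, here is a minimal pure-Python sketch of what a lookup like similar_by_vector() does conceptually: it ranks every known word by cosine similarity to the query vector and returns the best matches. This is not gensim's implementation. The helper names (cosine_similarity, similar_by_vector_sketch) and the three-dimensional toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_by_vector_sketch(word_vectors, query, topn=1):
    """Return the `topn` (word, similarity) pairs closest to `query`.

    Hypothetical helper: it mimics the idea of a nearest-word lookup,
    not the actual gensim code.
    """
    scored = [
        (word, cosine_similarity(vec, query))
        for word, vec in word_vectors.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy vectors, NOT taken from the trained model in the question.
toy_vectors = {
    "earth": [0.9, 0.1, 0.0],
    "heaven": [0.7, 0.6, 0.1],
    "waters": [0.0, 0.2, 0.9],
}

# Query with the vector for "earth"; the closest word should be "earth" itself.
best = similar_by_vector_sketch(toy_vectors, toy_vectors["earth"], topn=2)
print(best)
```

With the model from the question, the equivalent call would be along the lines of model.wv.similar_by_vector(model.wv['earth'], topn=1). Note that the result is the nearest word in the vocabulary, so a vector that did not come from the model still maps to some word, just with a lower similarity score.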