My DataFrame DF looks like this:
index posts
0 <div class="content">A number of <br/><br/>three ... </div>
1 <div class="content">Stack ... <br/><br/>overflow ... </div>
...
Then I tokenize each entry in posts into sentences with:
sentences = []
for post in DF["posts"]:
    sentences += utility.tosentences(post, tokenizer)
Then I run Word2Vec with the following:
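import logging
from gensim.models import word2vec

# Log training progress to the console
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)

num_features = 100    # word vector dimensionality
min_word_count = 7    # ignore words seen fewer times than this
num_workers = 2       # number of worker threads
context = 5           # context window size
downsampling = 1e-5   # downsampling threshold for frequent words

print("Training model...")
model = word2vec.Word2Vec(sentences, workers=num_workers,
                          size=num_features, min_count=min_word_count,
                          window=context, sample=downsampling)

# Normalize the vectors in place
model.init_sims(replace=True)

# Save the model; it can be loaded later with Word2Vec.load()
model_name = "what"
model.save(model_name)
print("finished")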
Then I tested the following:
model.doesnt_match("travel no Warning health".split())
However, it produces no output at all.
I also don't understand the meaning of the large output mentioned above. Why doesn't this work?
Answer 0 (score: 0)
The function model.doesnt_match() does not print anything; it returns a value. Print the returned value to see the output.
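To make the point concrete, here is a minimal sketch that runs without gensim. The `doesnt_match` function and the `TOY_VECTORS` table below are hypothetical stand-ins invented for illustration (they are not gensim's implementation and the vectors are not learned); the function simply returns the word least similar to the mean of the others. What matters is the last three lines: a bare call discards its return value, while `print` shows it.

```python
import math

# Toy 2-D vectors made up for this example; a trained model would learn these.
TOY_VECTORS = {
    "travel": (0.9, 0.1),
    "health": (0.8, 0.3),
    "Warning": (0.7, 0.2),
    "no": (-0.2, 0.9),
}


def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return tuple(x / norm for x in vector)


def doesnt_match(words):
    """Hypothetical stand-in: return the word least similar to the mean vector."""
    vectors = [unit(TOY_VECTORS[word]) for word in words]
    mean = unit(tuple(sum(column) / len(vectors) for column in zip(*vectors)))
    scores = [sum(a * b for a, b in zip(vector, mean)) for vector in vectors]
    return words[scores.index(min(scores))]


words = "travel no Warning health".split()

doesnt_match(words)            # in a script: the value is computed, then discarded
odd_one = doesnt_match(words)  # keep the return value...
print(odd_one)                 # ...and print it explicitly
```

With the trained model from the question, the equivalent fix would be `print(model.doesnt_match("travel no Warning health".split()))`.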
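If you are copy-pasting from the word2vec tutorial: it shows the output you would see when running these commands in an interactive console. (Also, it assumes you know what you're doing.)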
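The difference between a script and the interactive console can be reproduced with the standard library alone. In this sketch, `run_and_capture` is a helper name made up for the example, and the expression is an arbitrary stand-in for the `doesnt_match` call. Compiling in `"exec"` mode behaves like a script, while `"single"` mode behaves like one line typed at the prompt, which echoes the value of an expression.

```python
import contextlib
import io


def run_and_capture(source, mode):
    """Run `source` and return whatever it wrote to stdout.

    mode="exec" mimics a script; mode="single" mimics one line typed
    at the interactive prompt.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<demo>", mode), {})
    return buffer.getvalue()


# Stand-in expression: any call that returns a value behaves the same way.
statement = '"travel no Warning health".split()[1]'

script_output = run_and_capture(statement, "exec")
console_output = run_and_capture(statement, "single")

print(repr(script_output))   # nothing was written when run as a script
print(repr(console_output))  # the console echoed the value's repr
```

This is why the same line appears to "work" in the tutorial but stays silent when the code is run as a file.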