Question

My DataFrame DF looks like this:

index posts
0     <div class="content">A number of  <br/><br/>three  ... </div>
1     <div class="content">Stack ... <br/><br/>overflow  ... </div>
...

Then I tried tokenizing each post:

sentences = []
for post in DF["posts"]:
    sentences += utility.tosentences(post, tokenizer)

Then I ran Word2Vec with the following:

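import logging
from gensim.models import word2vec

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)

num_features = 100     # word vector dimensionality
min_word_count = 7     # ignore words that occur fewer times than this
num_workers = 2        # number of worker threads
context = 5            # context window size
downsampling = 1e-5    # downsampling threshold for frequent words

print("Training model...")
# size= is the parameter name in gensim before 4.0 (vector_size in 4.0+)
model = word2vec.Word2Vec(sentences, workers=num_workers,
                          size=num_features, min_count=min_word_count,
                          window=context, sample=downsampling)

model.init_sims(replace=True)

model_name = "what"
model.save(model_name)
# To reload later: model = word2vec.Word2Vec.load(model_name)
print("finished")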

Then I tested the following:

model.doesnt_match("travel no Warning health".split())

However, it produced no output at all.

I don't understand what the large amount of output mentioned above means. Why isn't this working?

Answer 1

The function model.doesnt_match() does not print anything; it returns a value. Print the returned value to see the output.

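If you are copy-pasting from the word2vec tutorial: it shows the output you would see when running these commands in an interactive console, which echoes the value of each expression automatically. (It also assumes you know what you are doing.)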
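To illustrate the difference between returning and printing, here is a minimal sketch that runs without a trained model. The vectors in TOY_VECTORS and the doesnt_match function below are hypothetical stand-ins made up for this example; they only imitate the general idea of picking the word farthest from the group and are not gensim's implementation.

```python
import math

# Hypothetical toy vectors; a real model would learn these from the posts.
TOY_VECTORS = {
    "travel": (0.9, 0.1),
    "health": (0.8, 0.3),
    "Warning": (0.7, 0.2),
    "no": (-0.2, 0.9),
}


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def doesnt_match(words):
    """Toy stand-in: return the word least similar to the mean unit vector."""
    units = {w: unit(TOY_VECTORS[w]) for w in words}
    mean = unit(tuple(sum(col) / len(units) for col in zip(*units.values())))
    return min(words, key=lambda w: sum(a * b for a, b in zip(units[w], mean)))


words = "travel no Warning health".split()

doesnt_match(words)            # in a script, the returned value is silently discarded
odd_one = doesnt_match(words)  # keep the returned value ...
print(odd_one)                 # ... and print it explicitly
```

With the model from the question, the equivalent change would be to write print(model.doesnt_match("travel no Warning health".split())) instead of the bare call.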
