我试图用Facebook-Fasttext模块对新文本进行分类,代码如下:
#!usr/bin/python 2.7
import sys
import jieba
reload(sys)
sys.setdefaultencoding('utf-8')
import fasttext
lines=[line.strip() for line in open('./corpus_seg2.txt', 'r')]
print(len(lines))
l_c=len(lines)
train_size=int(l_c*0.8)
text_size=l_c-train_size
train_set=lines[:train_size]
text_set =lines[l_c-train_size+1:]
with open( "./train.txt", "w") as ftrain:
for line in train_set:
ftrain.write(line+'\n')
with open( "./test.txt", "w") as ftext:
for line in text_set:
ftext.write(line+'\n')
ftrain.close()
ftext.close()
classifier = fasttext.supervised("./train.txt", 'model', label_prefix='__label__')
classifier = fasttext.load_model("./model.bin", label_prefix='__label__')
test_label=classifier.predict_proba('五五开 也 很 厉害 啊')
result = classifier.test("./test.txt")
print 'P@1:', result.precision
print 'R@1:', result.recall
print 'Number of examples:', result.nexamples
print test_label
corpus_seg_2.txt是已分段的文件。模型精度为72%,召回率为72%。 然后,我用模型来预测一个新文本:'五五开也很厉害啊'。但是,我得到的test_lablel令人困惑,我想知道为什么结果如此,我该如何解决? This picture will show you the result I got after running the code I provide
答案 0 :(得分:0)
@Arthur如果您希望输出为每个句子的单个值,则需要更改此值:test_label=classifier.predict_proba('五五开 也 很 厉害 啊')
至
test_label=classifier.predict_proba(['五五开 也 很 厉害 啊'])
那是因为预测函数将字符串数组类型的文本作为输入。