Question

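I don't know where to start with this problem, because I'm only just learning neural networks. I have a large database of sentence > label pairs. For example: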

i want take a photo < photo
i go to take a photo < photo
i go to use my camera < photo
i go to eat something < eat
i like my food < eat

If the user writes a new sentence, I want to get a score for every label:

"i go to bed after using my camera" < photo: 0.9000, eat: 0.4000, ...

So where can I start with this problem? TensorFlow and scikit-learn look good, but their document classification examples don't show the scores :\

Answer 1

from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn import metrics

sentences = ["i want take a photo", "i go to take a photo", "i go to use my camera", "i go to eat something", "i like my food"]

labels = ["photo", "photo", "photo", "eat", "eat"]

tfv = TfidfVectorizer()

# Fit TF-IDF on the sentences, then turn them into a sparse feature matrix
tfv.fit(sentences)
X = tfv.transform(sentences)

# Encode the string labels as integers
lbl = LabelEncoder()
y = lbl.fit_transform(labels)

# Hold out a stratified test split
xtrain, xtest, ytrain, ytest = train_test_split(X, y, stratify=y, random_state=42)

clf = LogisticRegression()
clf.fit(xtrain, ytrain)
predictions = clf.predict(xtest)

print("Accuracy Score = ", metrics.accuracy_score(ytest, predictions))

For new data:

new_sentence = ["this is a new sentence"]
X_Test = tfv.transform(new_sentence)
# One row per sentence, one probability column per label
print(clf.predict_proba(X_Test))
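
To get output shaped like the "photo: ..., eat: ..." line in the question, the columns of predict_proba can be paired with the label names. The following is only a minimal sketch on the same toy data: `label_scores` is a hypothetical helper name (not part of scikit-learn), and the model is fit on all five sentences for brevity. Note that these probabilities sum to 1 across labels, so they will not look exactly like the 0.9 / 0.4 example in the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

sentences = ["i want take a photo", "i go to take a photo", "i go to use my camera",
             "i go to eat something", "i like my food"]
labels = ["photo", "photo", "photo", "eat", "eat"]

tfv = TfidfVectorizer()
X = tfv.fit_transform(sentences)

lbl = LabelEncoder()
y = lbl.fit_transform(labels)

clf = LogisticRegression()
clf.fit(X, y)


def label_scores(sentence):
    """Hypothetical helper: return {label: probability} for one sentence."""
    probs = clf.predict_proba(tfv.transform([sentence]))[0]
    # Column i of predict_proba belongs to clf.classes_[i] (the encoded integers);
    # inverse_transform maps those integers back to the original label strings.
    names = lbl.inverse_transform(clf.classes_)
    return {str(name): float(p) for name, p in zip(names, probs)}


scores = label_scores("i go to bed after using my camera")
# Print labels from most to least probable
for name, p in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.4f}")
```

With only five training sentences the numbers mean very little; on a real database the same pattern applies after fitting on the training split.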

Tags: Python 3, text, labels
