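I don't know where to start with this problem, because I am only just learning neural networks. I have a large database of sentence > label pairs. For example: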
i want take a photo < photo
i go to take a photo < photo
i go to use my camera < photo
i go to eat something < eat
i like my food < eat
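If a user writes a new sentence, I want to get a score for every label:
"i go to bed after using my camera" < photo: 0.9000, eat: 0.4000, ...
So where can I start with this problem? TensorFlow and scikit-learn look good, but their document classification examples do not seem to show these scores.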
Answer 0 (score: 1)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn import metrics

sentences = ["i want take a photo", "i go to take a photo", "i go to use my camera", "i go to eat something", "i like my food"]
labels = ["photo", "photo", "photo", "eat", "eat"]

# Fit TF-IDF on the sentences and turn them into feature vectors
tfv = TfidfVectorizer()
tfv.fit(sentences)
X = tfv.transform(sentences)

# Encode the string labels as integers
lbl = LabelEncoder()
y = lbl.fit_transform(labels)

# Hold out part of the data for evaluation, keeping the label proportions
xtrain, xtest, ytrain, ytest = train_test_split(X, y, stratify=y, random_state=42)

clf = LogisticRegression()
clf.fit(xtrain, ytrain)

predictions = clf.predict(xtest)
print("Accuracy Score = ", metrics.accuracy_score(ytest, predictions))
For new data:
new_sentence = ["this is a new sentence"]
X_Test = tfv.transform(new_sentence)
# One row per sentence; the columns follow the order of clf.classes_
print(clf.predict_proba(X_Test))
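The question asks for output of the form `photo: 0.9, eat: 0.4`, while `predict_proba` returns a bare array of numbers. A minimal, self-contained sketch of pairing each column with its label name could look like the following. The helper name `label_scores` and the test sentence are illustrative assumptions, and the model here is fitted on all five example sentences rather than on a train split. Note that these values are class probabilities that sum to 1, so they will not match the independent scores shown in the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

sentences = [
    "i want take a photo",
    "i go to take a photo",
    "i go to use my camera",
    "i go to eat something",
    "i like my food",
]
labels = ["photo", "photo", "photo", "eat", "eat"]

# Vectorize the sentences and encode the labels, as in the answer above
tfv = TfidfVectorizer()
X = tfv.fit_transform(sentences)
lbl = LabelEncoder()
y = lbl.fit_transform(labels)

# Assumption: fit on every example, since the toy data set is tiny
clf = LogisticRegression()
clf.fit(X, y)


def label_scores(sentence):
    """Hypothetical helper: map each label name to its predicted probability."""
    probabilities = clf.predict_proba(tfv.transform([sentence]))[0]
    # clf.classes_ holds the encoded labels in column order; decode them to names
    names = lbl.inverse_transform(clf.classes_)
    return {str(name): float(p) for name, p in zip(names, probabilities)}


scores = label_scores("i go to bed after using my camera")
print(scores)
```

With only five training sentences the probabilities stay close to the class frequencies, so treat the numbers as a demonstration of the mechanics rather than a meaningful score.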