Question

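I am trying to implement a text classification solution using scikit-learn.

I have been able to get results for plain text classification. Now I want to add another (non-text) feature to the prediction process to improve accuracy.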

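My dataset is as follows:

label: the target value, i.e. 'Sandwich', 'greeting' or 'Goodbye'
message: the text
number_feature: a randomly assigned integer. To test FeatureUnion, I assigned the same number to every instance of a category. For example, all 'Sandwich' instances have the number 2.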

Code:

import pandas as pd
import sklearn 
from sklearn.pipeline import Pipeline, FeatureUnion 
from sklearn.feature_extraction.text import TfidfTransformer
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC


path = 'sunny_day.xlsx'                         
sms = pd.read_excel(path,header = None, names = ['label', 'message','number_feature'])   


#convert labels to a numeric value using a map and give it new column 'label_num'
sms['label_num'] = sms.label.map({'greeting' : 0, 'Goodbye' : 1, 'Sandwich' : 2})


X = sms.message
y = sms.label_num
z = sms.number_feature

# train test split
X_train = np.array(X[0:9])
X_test = np.array(X[9:])
y_train = np.array(y[0:9])
y_test = np.array(y[9:])
z_train = np.array(z[0:9])
z_test = np.array(z[9:])


def get_z(x):
    if np.array_equal(x, np.array(X_train)):
        return np.array(z_train).reshape(-1,1)
    else:
        return np.array(z_test).reshape(-1,1)


classifier = Pipeline([
    ('features', FeatureUnion([
        ('text',Pipeline([
            ('vectorizer', CountVectorizer()),
        ])),
        ('length', Pipeline([
            ('count', FunctionTransformer(get_z, validate = False)),
        ]))
    ])),
    ('clf',OneVsRestClassifier(LinearSVC()))])

classifier.fit(X_train, y_train)
y_pred_class = classifier.predict(X_test)
y_pred_class

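As suggested in various posts, I used FeatureUnion to achieve this. However, the accuracy I get is 66.67% both before and after adding the 'manipulated' number_feature.

Why does the accuracy not seem to improve when the classifier is given a deliberately biased feature?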
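One way to check whether the numeric column actually reaches the classifier is to pass both columns as a DataFrame and inspect the shape of the combined feature matrix. The following is only a minimal sketch, not a confirmed fix for the accuracy: it assumes scikit-learn 0.20 or later (for `sklearn.compose.ColumnTransformer`), swaps the `FeatureUnion` plus global `get_z` lookup for a `ColumnTransformer`, and rebuilds the dataset inline from the rows listed in this question instead of reading `sunny_day.xlsx`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Rows copied from the dataset in the question (assumed to be in file order).
rows = [
    ("greeting", "How are you?", 5),
    ("greeting", "How is your day?", 5),
    ("greeting", "Good day", 5),
    ("greeting", "How is it going today?", 5),
    ("Goodbye", "Have a nice day", 4),
    ("Goodbye", "See you later", 4),
    ("Goodbye", "Have a nice day", 4),
    ("Goodbye", "Talk to you soon", 4),
    ("Sandwich", "Make me a sandwich.", 2),
    ("Sandwich", "Can you make a sandwich", 2),
    ("Sandwich", "Having a sandwich today?", 2),
    ("Sandwich", "what’s for lunch", 2),
]
sms = pd.DataFrame(rows, columns=["label", "message", "number_feature"])
sms["label_num"] = sms["label"].map({"greeting": 0, "Goodbye": 1, "Sandwich": 2})

# Same positional split as in the question: first 9 rows train, the rest test.
train, test = sms.iloc[:9], sms.iloc[9:]
feature_cols = ["message", "number_feature"]

features = ColumnTransformer([
    # A string selector hands the vectorizer a 1-D column of text.
    ("text", CountVectorizer(), "message"),
    # A list selector keeps the numeric column 2-D and appends it unchanged.
    ("number", "passthrough", ["number_feature"]),
])

classifier = Pipeline([
    ("features", features),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
classifier.fit(train[feature_cols], train["label_num"])
y_pred = classifier.predict(test[feature_cols])

# The combined matrix should have one column per vocabulary word plus one
# extra column for number_feature.
fitted = classifier.named_steps["features"]
train_matrix = fitted.transform(train[feature_cols])
vocab_size = len(fitted.named_transformers_["text"].vocabulary_)
print(train_matrix.shape, vocab_size)
```

If the second dimension of `train_matrix` is the vocabulary size plus one, the numeric column is part of the input, and any lack of improvement has to come from somewhere else (for example the split or the scale of the feature).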

Dataset:

label | message | feature_number

greeting   How are you?               5
greeting   How is your day?           5
greeting   Good day                   5
greeting   How is it going today?     5
Goodbye    Have a nice day            4
Goodbye    See you later              4
Goodbye    Have a nice day            4
Goodbye    Talk to you soon           4
Sandwich   Make me a sandwich.        2
Sandwich   Can you make a sandwich    2
Sandwich   Having a sandwich today?   2
Sandwich   what’s for lunch           2
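The code in this question splits by position (`[0:9]` for training, `[9:]` for testing). A quick check of how the classes fall on each side of that split, assuming the rows in the Excel file are in the same order as listed here:

```python
from collections import Counter

# Labels in the order they appear in the dataset above (assumed file order).
labels = ["greeting"] * 4 + ["Goodbye"] * 4 + ["Sandwich"] * 4

# Same positional split as the question's code.
train_labels, test_labels = labels[:9], labels[9:]

train_counts = Counter(train_labels)
test_counts = Counter(test_labels)
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

Under that assumption the training set contains a single 'Sandwich' row and the test set contains only 'Sandwich' rows. With three test rows, 66.67% corresponds to two correct predictions out of three.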

Pipeline: adding another feature for text classification in Python (FeatureUnion)

0 answers: