Pipeline: adding another feature to text classification in Python (FeatureUnion)

Asked: 2018-03-04 17:12:05

Tags: python text scikit-learn nlp classification

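I am trying to implement a text classification solution using scikit-learn.

I have been able to get results for simple text classification. Now I would like to add another (non-text) feature to the prediction process to improve accuracy.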

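My dataset is as follows:

  • label: the target value, i.e. 'Sandwich', 'greeting' or 'Goodbye'
  • message: the text
  • number_feature: an arbitrarily assigned integer. To test FeatureUnion, I assigned the same number to every instance of a class. For example, all 'Sandwich' instances have the number 2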

Code:

import pandas as pd
import sklearn 
from sklearn.pipeline import Pipeline, FeatureUnion 
from sklearn.feature_extraction.text import TfidfTransformer
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC


path = 'sunny_day.xlsx'                         
sms = pd.read_excel(path,header = None, names = ['label', 'message','number_feature'])   


#convert labels to a numeric value using a map and give it new column 'label_num'
sms['label_num'] = sms.label.map({'greeting' : 0, 'Goodbye' : 1, 'Sandwich' : 2})


X = sms.message
y = sms.label_num
z = sms.number_feature

# train/test split: the first 9 rows are used for training, the remaining rows for testing
X_train = np.array(X[0:9])
X_test = np.array(X[9:])
y_train = np.array(y[0:9])
y_test = np.array(y[9:])
z_train = np.array(z[0:9])
z_test = np.array(z[9:])


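def get_z(x):
    # Return number_feature as a column vector for whichever split is being
    # transformed. The values come from the module-level z_train / z_test
    # arrays, chosen by comparing x with X_train, not from x itself.
    if np.array_equal(x, np.array(X_train)):
        return np.array(z_train).reshape(-1,1)
    else:
        return np.array(z_test).reshape(-1,1)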


classifier = Pipeline([
    ('features', FeatureUnion([
        ('text',Pipeline([
            ('vectorizer', CountVectorizer()),
        ])),
        ('length', Pipeline([
            ('count', FunctionTransformer(get_z, validate = False)),
        ]))
    ])),
    ('clf',OneVsRestClassifier(LinearSVC()))])

classifier.fit(X_train, y_train)
y_pred_class = classifier.predict(X_test)
y_pred_class
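The last line above only displays the predicted classes. As a minimal sketch of how an accuracy figure can be derived from them, the snippet below uses `sklearn.metrics.accuracy_score`. The two lists are hypothetical values chosen for illustration, not the actual predictions: assuming the spreadsheet holds the 12 rows listed under "Dataset", the slice `[9:]` leaves three test rows, so two correct predictions out of three would give roughly 66.67%.

```python
from sklearn.metrics import accuracy_score

# Hypothetical values for illustration only: three held-out rows whose true
# class is 2 ('Sandwich'), with one of them predicted as another class.
y_test_example = [2, 2, 2]
y_pred_example = [2, 2, 0]

# Fraction of predictions that match the true labels (2 of 3 here).
accuracy = accuracy_score(y_test_example, y_pred_example)
print(f"{accuracy:.2%}")  # prints 66.67%
```

In the question's code the same call would take `y_test` and `y_pred_class` as its arguments.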

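As mentioned in various posts, I used FeatureUnion to achieve this. However, the accuracy I get is 66.67%, both before and after adding the 'manipulated' number_feature.

Why does the accuracy not seem to improve when a biased feature is supplied?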
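For comparison, here is a hedged sketch of one common way to feed a text column and a numeric column through FeatureUnion: pass the whole DataFrame to the pipeline and let each branch select its own column. This is not presented as a confirmed fix for the accuracy figure. The helper names `select_text`, `select_number` and `build_classifier` are hypothetical, the `StandardScaler` step is an addition that is not in the original code, and the six rows are a subset of the dataset used only to keep the example short.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import LinearSVC


def select_text(df):
    # Hypothetical helper: hand the raw messages to the vectorizer.
    return df["message"]


def select_number(df):
    # Hypothetical helper: return number_feature as an (n_samples, 1) float array.
    return df[["number_feature"]].to_numpy(dtype=float)


def build_classifier():
    # Each FeatureUnion branch picks its column from the same DataFrame,
    # so the numeric values always belong to the rows being transformed.
    return Pipeline([
        ("features", FeatureUnion([
            ("text", Pipeline([
                ("select", FunctionTransformer(select_text, validate=False)),
                ("vectorizer", CountVectorizer()),
            ])),
            ("number", Pipeline([
                ("select", FunctionTransformer(select_number, validate=False)),
                ("scale", StandardScaler()),  # assumption: not in the original code
            ])),
        ])),
        ("clf", OneVsRestClassifier(LinearSVC())),
    ])


# Six rows taken from the dataset (two per class), for illustration only.
sms_subset = pd.DataFrame({
    "label": ["greeting", "greeting", "Goodbye", "Goodbye", "Sandwich", "Sandwich"],
    "message": ["How are you?", "Good day", "See you later",
                "Talk to you soon", "Make me a sandwich.", "Having a sandwich today?"],
    "number_feature": [5, 5, 4, 4, 2, 2],
})

classifier = build_classifier()
classifier.fit(sms_subset, sms_subset["label"])

# The combined matrix has one column per vocabulary term plus one numeric column.
features = classifier.named_steps["features"].transform(sms_subset)
predictions = classifier.predict(sms_subset)
print(features.shape)
```

With this layout the lookup by array equality in `get_z` is no longer needed, because the numeric branch reads its values from the rows it is given.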

Dataset:

label | message | feature_number

greeting   How are you?               5
greeting   How is your day?           5
greeting   Good day                   5
greeting   How is it going today?     5
Goodbye    Have a nice day            4
Goodbye    See you later              4
Goodbye    Have a nice day            4
Goodbye    Talk to you soon           4
Sandwich   Make me a sandwich.        2
Sandwich   Can you make a sandwich    2
Sandwich   Having a sandwich today?   2
Sandwich   what’s for lunch           2
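The table above can be rebuilt in memory, which makes it possible to rerun the code without `sunny_day.xlsx`. The sketch below also counts the classes on each side of the `[0:9]` / `[9:]` slices used in the question. It assumes the spreadsheet rows are in the order listed here and that the file has no header row, as `header = None` suggests. Under those assumptions only one 'Sandwich' row is used for training and all three test rows are 'Sandwich'.

```python
import pandas as pd

# Rows copied from the dataset listing, in the same order.
rows = [
    ("greeting", "How are you?", 5),
    ("greeting", "How is your day?", 5),
    ("greeting", "Good day", 5),
    ("greeting", "How is it going today?", 5),
    ("Goodbye", "Have a nice day", 4),
    ("Goodbye", "See you later", 4),
    ("Goodbye", "Have a nice day", 4),
    ("Goodbye", "Talk to you soon", 4),
    ("Sandwich", "Make me a sandwich.", 2),
    ("Sandwich", "Can you make a sandwich", 2),
    ("Sandwich", "Having a sandwich today?", 2),
    ("Sandwich", "what’s for lunch", 2),
]
sms = pd.DataFrame(rows, columns=["label", "message", "number_feature"])

# Same positional split as in the question: first 9 rows train, the rest test.
train, test = sms.iloc[0:9], sms.iloc[9:]

# Number of rows per class on each side of the split.
train_counts = train["label"].value_counts().to_dict()
test_counts = test["label"].value_counts().to_dict()
print(train_counts)
print(test_counts)
```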

0 answers:

No answers yet