I have an sklearn model that uses character n-grams with a Tf-Idf weighting scheme for a classification task, as shown in the following code:
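from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# x, y: training tweets and labels; xDev, yDev: development set (loaded elsewhere)
model = Pipeline([
    ('vect', CountVectorizer(analyzer='char', ngram_range=(3, 5))),  # character 3-5 grams
    ('tfidf', TfidfTransformer()),                                   # Tf-Idf weighting
    ('clf', SGDClassifier(alpha=0.0001,
                          loss='log_loss',  # named 'log' before scikit-learn 1.1
                          epsilon=7,        # only used by the huber/epsilon-insensitive losses
                          max_iter=8,
                          random_state=40,
                          tol=None)),
])
gs_clf = model.fit(x, y)  # fit() returns the fitted pipeline itself
predicted = gs_clf.predict(xDev)
print('Accuracy:', accuracy_score(yDev, predicted))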
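The problem is this: the input data are tweets, and I manually generated a new feature list (e.g. the number of positive words in each tweet). The list has one value per tweet, i.e. the same number of rows as the training matrix that the sklearn model builds, and I want to append it horizontally to that matrix, but I couldn't figure out how.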
I found some related questions on the site, but none of them had a clear answer.
I tried the following, but it doesn't work:
model = Pipeline([
    ('feats', FeatureUnion([
        pos,  # pos, neg: my hand-crafted feature transformers (definitions not shown)
        neg,
    ])),
    ('vect', CountVectorizer(analyzer='char', ngram_range=(3, 5))),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier(alpha=0.0001,
                          loss='log_loss',
                          epsilon=7,
                          max_iter=8,
                          random_state=40,
                          tol=None)),
])