Question

I am using scikit-learn in Python to fit a model and transform input_data with it.

I use FeatureUnion to combine CountVectorizer and TfidfEmbeddingVectorizer.

Using only CountVectorizer or only TfidfEmbeddingVectorizer works fine, but when I combine the two with FeatureUnion, I get the following error:

TypeError: fit() takes 2 positional arguments but 3 were given

The TfidfEmbeddingVectorizer class looks like this:

class TfidfEmbeddingVectorizer(object):
    ...
    def fit(self, X):
        tfidf = TfidfVectorizer(analyzer=lambda x: x)
        tfidf.fit(X)
        # if a word was never seen - it must be at least as infrequent
        # as any of the known words - so the default idf is the max of
        # known idf's
        max_idf = max(tfidf.idf_)
        self.word2weight = defaultdict(
            lambda: max_idf,
            [(w, tfidf.idf_[i]) for w, i in tfidf.vocabulary_.items()])

        return self

I used FeatureUnion like this:

model = gensim.models.Word2Vec(speech.train_data, size = 100)
w2v = dict(zip(model.wv.index2word, model.wv.syn0))

count = CountVectorizer(tokenizer=lambda doc: doc, lowercase=False)
w2v_tfidf = TfidfEmbeddingVectorizer(w2v)
feature_union = FeatureUnion([('ngram', count),
                             ('tfidf', w2v_tfidf)])
feature_union.fit(speech.train_data)

I have seen a solution saying that downgrading scikit-learn to version 0.18.0 makes this work, but I cannot downgrade scikit-learn because of this error:

error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools

Is there another way to make fit work with FeatureUnion?

Answer 1

According to the documentation, FeatureUnion's fit() method takes X and y as input:

fit(X, y=None)
Fit all transformers using X.

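Even though its default value is None, y is still passed on to the internal transformers. The parameter exists for compatibility reasons, so that the estimator can be used in a pipeline.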
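The effect of forwarding y can be reproduced without scikit-learn. The sketch below is only an illustration: `fit_all`, `NoYTransformer`, and `WithYTransformer` are hypothetical names, and `fit_all` is a simplified stand-in for what a union does, not scikit-learn's actual implementation.

```python
class NoYTransformer:
    def fit(self, X):
        # No y parameter, like the fit() in the question.
        return self


class WithYTransformer:
    def fit(self, X, y=None):
        # Accepts y and ignores it.
        return self


def fit_all(transformers, X, y=None):
    """Hypothetical simplification: y is always forwarded, even when it is None."""
    for transformer in transformers:
        transformer.fit(X, y)


# A transformer that accepts y is fitted without problems.
fit_all([WithYTransformer()], ["some document"])

# A transformer whose fit() lacks y fails as soon as y is forwarded.
try:
    fit_all([NoYTransformer()], ["some document"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message ends with "takes 2 positional arguments but 3 were given": self and X are the two expected arguments, and the forwarded y is the third.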

Now for the fit() methods of the internal transformers.

TfidfVectorizer's fit() has the signature:

fit(raw_documents, y=None)
Learn vocabulary and idf from training set.

As you can see, it also accepts y, for the same reason, even though it does not use it anywhere.

The fit() of your custom TfidfEmbeddingVectorizer has no extra y parameter.

But FeatureUnion will still try to pass y (with its value None) to it, which causes the error. Just change fit to:

def fit(self, X, y=None):
    ....
    ....
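To check the fix end to end, here is a minimal sketch that assumes scikit-learn and NumPy are installed. `TokenCountVectorizer` is a hypothetical stand-in for TfidfEmbeddingVectorizer (it emits one feature per document instead of embeddings), and the two toy documents are made up for illustration; only the `fit(self, X, y=None)` signature is the point.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion


class TokenCountVectorizer(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for TfidfEmbeddingVectorizer."""

    def fit(self, X, y=None):
        # y is accepted and ignored, so FeatureUnion can call fit(X, y).
        return self

    def transform(self, X):
        # One column: the number of tokens in each pre-tokenized document.
        return np.array([[len(doc)] for doc in X], dtype=float)


# Pre-tokenized toy documents (assumed data, not from the question).
docs = [["a", "b", "a"], ["b", "c"]]

feature_union = FeatureUnion([
    # The analyzer returns each document unchanged because it is already a token list.
    ("ngram", CountVectorizer(analyzer=lambda doc: doc)),
    ("length", TokenCountVectorizer()),
])

feature_union.fit(docs)  # no TypeError now that fit() accepts y
features = feature_union.transform(docs)
print(features.shape)
```

With these toy documents the result has shape (2, 4): three count columns for the vocabulary a, b, c plus the one column from the custom transformer.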

scikit-learn: fit() takes 2 positional arguments but 3 were given in FeatureUnion
