Question

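I'm trying to build a model that combines the numeric features and the text features of a DataFrame. However, I'm having a lot of trouble combining the features, training on them, and then testing on them.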

Right now, I'm trying to use DataFrameMapper like this:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn_pandas import DataFrameMapper


# Text column goes through TF-IDF; the numeric column is passed through as-is
mapper = DataFrameMapper([
    ('body', TfidfVectorizer()),
    ('numeric_feature', None),
])

for train_index, test_index in kFold.split(DF['body']):

    # Split the dataset by KFold
    X_train = even_rand[['body', 'numeric_feature']].iloc[train_index]
    y_train = even_rand['sub_class'].iloc[train_index]

    X_test = even_rand[['body', 'numeric_feature']].iloc[test_index]
    y_test = even_rand['sub_class'].iloc[test_index]

    # Vectorize/transform docs
    X_train = mapper.fit_transform(X_train)
    X_test = mapper.fit_transform(X_test)

    # Get SVM (older scikit-learn used n_iter=5; newer versions use max_iter)
    svm = SGDClassifier(loss='hinge', penalty='l2',
                        alpha=1e-3, max_iter=5, tol=None, random_state=10)
    svm.fit(X_train, y_train)
    svm_score = svm.score(X_test, y_test)

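This successfully combines the data and trains the model, but when I try to test it, the features don't seem to line up and I get the error

ValueError: X has 49974 features per sample; expecting 87786

Does anyone know how to fix this, or know a better way to combine, train, and test numeric and text features together? If possible, I'd also like to keep the features as a sparse matrix.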
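The column mismatch in that error can be reproduced with TfidfVectorizer alone. This is a minimal sketch on hypothetical toy documents (not the asker's data): fitting a vectorizer on the test documents builds a second vocabulary with its own column count, while calling transform on the already-fitted vectorizer maps the test documents onto the training columns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy documents standing in for the 'body' column.
train_docs = ["the cat sat on the mat", "dogs chase cats", "a bird in the hand"]
test_docs = ["the dog sat", "birds sing"]

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_docs)  # vocabulary learned from training docs only

# Fitting again on the test docs (what fit_transform on the test split does)
# learns a different vocabulary, so the number of columns changes.
X_test_refit = TfidfVectorizer().fit_transform(test_docs)

# transform reuses the training vocabulary, so the columns match X_train.
X_test = vec.transform(test_docs)

print(X_train.shape[1], X_test_refit.shape[1], X_test.shape[1])  # prints: 11 5 11
```

A classifier fitted on X_train expects 11 columns, so scoring it on X_test_refit fails with the same kind of ValueError, while X_test works.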

Answer 1

Instead of:

X_train = mapper.fit_transform(X_train)
X_test = mapper.fit_transform(X_test)

try:

X_train = mapper.fit_transform(X_train)
X_test = mapper.transform(X_test)

Calling fit_transform on the test split refits the TfidfVectorizer inside the mapper, so it builds a new vocabulary from the test documents and outputs a different number of columns (49974) from what the classifier was trained on (87786). transform reuses the vocabulary and IDF weights learned on the training split, so the train and test matrices have the same columns.
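For the "better way" part of the question, one option is to put the feature step and the classifier in a single Pipeline and let cross-validation refit it on each training fold. This is a minimal sketch, assuming scikit-learn 0.20 or later: it swaps in scikit-learn's own ColumnTransformer for DataFrameMapper, and the toy DataFrame is hypothetical, reusing only the question's column names. If you would rather keep sklearn_pandas, a DataFrameMapper can also sit inside a Pipeline, and its sparse=True option keeps the output sparse.

```python
import pandas as pd
from scipy.sparse import issparse
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline

# Hypothetical toy data with the same column names as the question.
even_rand = pd.DataFrame({
    "body": ["cheap pills online", "meeting at noon", "win money now",
             "lunch tomorrow?", "free prize inside", "project update attached"] * 5,
    "numeric_feature": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0] * 5,
    "sub_class": [1, 0, 1, 0, 1, 0] * 5,
})
X = even_rand[["body", "numeric_feature"]]
y = even_rand["sub_class"]

features = ColumnTransformer(
    [
        # A string selector passes a 1-D Series, which TfidfVectorizer expects.
        ("text", TfidfVectorizer(), "body"),
        # A list selector passes the numeric column through unchanged.
        ("num", "passthrough", ["numeric_feature"]),
    ],
    # Stack as a scipy.sparse matrix whenever overall density is below 1.0.
    sparse_threshold=1.0,
)

model = Pipeline([
    ("features", features),
    # tol=None runs exactly max_iter epochs, like the old n_iter=5.
    ("svm", SGDClassifier(loss="hinge", penalty="l2", alpha=1e-3,
                          max_iter=5, tol=None, random_state=10)),
])

# cross_val_score clones the pipeline, fits it on each training fold,
# and only calls transform/predict on the matching test fold.
kfold = KFold(n_splits=5, shuffle=True, random_state=10)
scores = cross_val_score(model, X, y, cv=kfold)
print(scores.mean())

# Sanity check: the combined feature matrix stays sparse (17 terms + 1 numeric).
X_all = features.fit_transform(X)
print(X_all.shape, issparse(X_all))
```

Because the vectorizer is refit inside each training fold, its vocabulary and IDF weights never see the test fold, which also keeps test-set statistics out of training.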

How to use a TF-IDF vectorizer combined with custom features
