Question

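I'm building a binary classification model with Keras. The dataset contains many categorical features (IP address, destination number, destination address, user agent, etc.).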

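I can't generate predictions for my submission. Because the features are categorical, the data I predict on ends up with a different number of columns than the training and test data: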

  File "/Users/spicyramen/Documents/Development/Python/gl-env/lib/python2.7/site-packages/keras/models.py", line 1006, in predict
    return self.model.predict(x, batch_size=batch_size, verbose=verbose)
  File "/Users/spicyramen/Documents/Development/Python/gl-env/lib/python2.7/site-packages/keras/engine/training.py", line 1772, in predict
    check_batch_axis=False)
  File "/Users/spicyramen/Documents/Development/Python/gl-env/lib/python2.7/site-packages/keras/engine/training.py", line 153, in _standardize_input_data
    str(array.shape))
ValueError: Error when checking : expected dense_1_input to have shape (None, 2134) but got array with shape (34, 102)
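The mismatch can be reproduced in a few lines. This assumes encode_one_hot behaves like pd.get_dummies (its source is not shown). Encoding two frames separately creates one column for each value that happens to occur in that frame, so the widths diverge. The frames and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the training file and the submission file.
train = pd.DataFrame({"source_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
                      "destination_port": [5060, 5060, 5061]})
submit = pd.DataFrame({"source_ip": ["10.0.0.9"],
                       "destination_port": [5060]})

# Encoding each frame on its own: every distinct value becomes a column.
train_enc = pd.get_dummies(train, columns=["source_ip"])
submit_enc = pd.get_dummies(submit, columns=["source_ip"])

print(train_enc.shape)   # (3, 4)
print(submit_enc.shape)  # (1, 2)
```

The same thing happens at full scale: 2134 columns from the training data versus 102 from the submission file.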

I am able to split the data and train the model. These are the column types:

ruri                object
ruri_user           object
ruri_domain         object
from_user           object
from_domain         object
from_tag            object
to_user             object
contact_user        object
callid              object
content_type        object
user_agent          object
source_ip           object
source_port          int64
destination_port     int64
contact_ip          object
contact_port         int64
toll_fraud           int64
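Of the columns listed, the object ones are the candidates for one-hot encoding, while the int64 ports (and the toll_fraud label) are already numeric. Here is a minimal sketch of splitting them with pandas, using a hypothetical frame with made-up values. It uses exclude="number" rather than include="object" so that string columns are still caught on pandas versions that may infer a dedicated string dtype.

```python
import pandas as pd

# A few rows shaped like the dtypes listing above (values are made up).
df = pd.DataFrame({
    "source_ip": ["10.0.0.1", "10.0.0.2"],
    "user_agent": ["ua-a", "ua-b"],
    "source_port": [5060, 5062],
    "destination_port": [5060, 5060],
    "toll_fraud": [0, 1],
})

# Non-numeric columns are the ones that need encoding;
# numeric columns (ports, the toll_fraud label) can be used as-is.
categorical = df.select_dtypes(exclude="number").columns.tolist()
numeric = df.select_dtypes(include="number").columns.tolist()
print(categorical)  # ['source_ip', 'user_agent']
print(numeric)      # ['source_port', 'destination_port', 'toll_fraud']
```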

Here is my workflow:

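Import the data from CSV
Drop unneeded columns
Generate dummy columns (encode_one_hot)
Split the dataset into training and test data
Train the model
Evaluate
Submit predictions <- fails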
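The last step fails because dummy columns are generated from whatever values appear in each file, so the submission file gets a different set of columns. One way to keep them consistent is to take each column's categories from the data encoded before the train/test split and apply those categories to both frames. This sketch assumes encode_one_hot is equivalent to pd.get_dummies. encode_with_fixed_categories is a hypothetical helper, not part of the poster's code.

```python
import pandas as pd

def encode_with_fixed_categories(train, other, columns):
    """One-hot encode `columns` in both frames using only training categories.

    Values in `other` that never appeared in `train` become NaN in the
    categorical column and therefore get all-zero dummy columns.
    Both frames are assumed to have the same columns in the same order.
    """
    train = train.copy()
    other = other.copy()
    for col in columns:
        cats = pd.Categorical(train[col]).categories  # sorted training values
        train[col] = pd.Categorical(train[col], categories=cats)
        other[col] = pd.Categorical(other[col], categories=cats)
    # get_dummies emits one column per category, even categories unused in a frame.
    return (pd.get_dummies(train, columns=columns),
            pd.get_dummies(other, columns=columns))
```

Unseen values in the submission file (a new IP address, for example) produce all-zero dummies for that column. If the submission is encoded in a separate run, the categories would need to be saved alongside the model.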

Here is my code.

Training and test dimensions:

Set     Samples  Columns
Train   1665     2134
Test    555      2134
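The table shows 2134 columns for both splits, while the submission array in the traceback has 102. A small guard before predict makes that failure explicit. check_input_width is a hypothetical helper. The expected width could come from the training matrix or, in Keras, from model.input_shape[1].

```python
import numpy as np

def check_input_width(x, expected_columns):
    """Raise early, with a readable message, if the feature width is wrong."""
    x = np.asarray(x)
    if x.ndim != 2 or x.shape[1] != expected_columns:
        raise ValueError(
            "expected %d feature columns, got array with shape %s"
            % (expected_columns, x.shape))
    return x
```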

Functions:

def preproc_test(self):
        """Pre-process testing data."""

        #Import data
        test = self.import_data(self.test_fn, drop=True)
        # Extract labels.
        labels = test.user_agent.values
        # Fix NA values.
        test = self.fix_na(test)

        # Feature Engineering
        #test = self.engineer_features(test)

        # Create dummy variables.
        test = encode_one_hot(test, 'ruri_user')
        test = encode_one_hot(test, 'from_user')
        test = encode_one_hot(test, 'from_domain')
        test = encode_one_hot(test, 'to_user')
        test = encode_one_hot(test, 'contact_user')
        test = encode_one_hot(test, 'user_agent')
        test = encode_one_hot(test, 'source_ip')
        test = encode_one_hot(test, 'contact_ip')
        return labels, test
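preproc_test calls encode_one_hot on the submission file alone, which is where the 102 columns come from. A minimal alternative is to reindex the encoded submission frame to the training feature columns after the encode_one_hot calls. This assumes the list of feature columns produced during training was kept. align_to_training_columns and training_columns are hypothetical names.

```python
import pandas as pd

def align_to_training_columns(encoded, training_columns):
    """Give `encoded` exactly the training feature columns, in training order.

    Dummy columns missing from `encoded` (categories absent from this file)
    are added and filled with 0; dummy columns that exist only here
    (categories never seen in training) are dropped.
    """
    return encoded.reindex(columns=training_columns, fill_value=0)
```

In preproc_test this would sit just before the return, for example as test = align_to_training_columns(test, self.train_columns), with self.train_columns (a hypothetical attribute) saved at training time. The list should hold only feature columns, not the toll_fraud label.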


def prepare_submission(self, name):
        labels, test_data = self.preproc_test()
        predictions = self.model.predict(test_data)
        subm = pd.DataFrame(np.column_stack([labels, np.around(predictions[:, 1])]).astype('int32'),
                            columns=['user_agent', 'toll_fraud'])
        subm.to_csv('%s.csv' % name, index=False)
        return subm
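Two other parts of prepare_submission may break once the shapes line up. First, predictions[:, 1] assumes the network ends in two output units. With a single sigmoid unit, predict returns shape (n, 1) and column 1 does not exist. Second, .astype('int32') is applied to the stacked array, which also holds the user_agent strings, so the cast would likely fail. This sketch handles both output shapes and keeps the identifier column as-is. build_submission and the 0.5 threshold are assumptions, not the poster's code.

```python
import numpy as np
import pandas as pd

def build_submission(ids, predictions, threshold=0.5):
    """Turn model scores into a two-column submission frame.

    Handles a single sigmoid output, shape (n, 1), and a two-unit
    softmax output, shape (n, 2), where column 1 is the positive class.
    """
    predictions = np.asarray(predictions)
    if predictions.ndim == 2 and predictions.shape[1] == 2:
        scores = predictions[:, 1]
    else:
        scores = predictions.reshape(-1)
    return pd.DataFrame({
        "user_agent": ids,  # identifiers stay as strings; only the label is cast
        "toll_fraud": (scores >= threshold).astype("int32"),
    }, columns=["user_agent", "toll_fraud"])
```

The fixed threshold replaces np.around. The two differ only for a score of exactly 0.5, which np.around rounds down to 0.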

Original issue

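I'm not sure whether I should adjust the data I predict on so that it has the same number of features/columns as the original data. If so, what is the best way to do it?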

Predicting with categorical features
