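I am building a binary classification model with Keras. The dataset contains many categorical features (IP address, destination number, destination address, user agent, etc.).
I am unable to submit predictions: because the features are categorical, the one-hot encoded data I pass to predict ends up with a different number of columns than the training and testing data.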
  File "/Users/spicyramen/Documents/Development/Python/gl-env/lib/python2.7/site-packages/keras/models.py", line 1006, in predict
    return self.model.predict(x, batch_size=batch_size, verbose=verbose)
  File "/Users/spicyramen/Documents/Development/Python/gl-env/lib/python2.7/site-packages/keras/engine/training.py", line 1772, in predict
    check_batch_axis=False)
  File "/Users/spicyramen/Documents/Development/Python/gl-env/lib/python2.7/site-packages/keras/engine/training.py", line 153, in _standardize_input_data
    str(array.shape))
ValueError: Error when checking : expected dense_1_input to have shape (None, 2134) but got array with shape (34, 102)
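To illustrate how this kind of shape mismatch can arise, here is a minimal sketch. It assumes that encode_one_hot behaves like pandas.get_dummies (its body is not shown here, so that is an assumption), and the toy values are made up; only the column names mirror the dataset.

```python
import pandas as pd

# Hypothetical toy data: values are invented, column names mirror the dataset.
train = pd.DataFrame({"user_agent": ["a", "b", "c"],
                      "source_port": [5060, 5061, 5060]})
new = pd.DataFrame({"user_agent": ["a", "d"],
                    "source_port": [5060, 5062]})

# Each frame is encoded on its own, so each gets one dummy column
# per category that happens to appear in that frame.
train_encoded = pd.get_dummies(train, columns=["user_agent"])
new_encoded = pd.get_dummies(new, columns=["user_agent"])

print(train_encoded.shape)  # (3, 4): source_port + 3 user agents
print(new_encoded.shape)    # (2, 3): source_port + 2 user agents
```

With far fewer rows in the prediction file, far fewer distinct categories appear, which would be consistent with 102 columns against the 2134 the model expects.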
I am able to split the data and train the model.
ruri object
ruri_user object
ruri_domain object
from_user object
from_domain object
from_tag object
to_user object
contact_user object
callid object
content_type object
user_agent object
source_ip object
source_port int64
destination_port int64
contact_ip object
contact_port int64
toll_fraud int64
This is my logic:
encode_one_hot
Here is my code:
Training and testing dimensions:
Samples Columns
1665 2134
555 2134
Functions:
def preproc_test(self):
    """Pre-process testing data."""
    # Import data.
    test = self.import_data(self.test_fn, drop=True)
    # Extract labels.
    labels = test.user_agent.values
    # Fix NA values.
    test = self.fix_na(test)
    # Feature Engineering
    #test = self.engineer_features(test)
    # Create dummy variables.
    test = encode_one_hot(test, 'ruri_user')
    test = encode_one_hot(test, 'from_user')
    test = encode_one_hot(test, 'from_domain')
    test = encode_one_hot(test, 'to_user')
    test = encode_one_hot(test, 'contact_user')
    test = encode_one_hot(test, 'user_agent')
    test = encode_one_hot(test, 'source_ip')
    test = encode_one_hot(test, 'contact_ip')
    return labels, test

def prepare_submission(self, name):
    labels, test_data = self.preproc_test()
    predictions = self.model.predict(test_data)
    subm = pd.DataFrame(np.column_stack([labels, np.around(predictions[:, 1])]).astype('int32'),
                        columns=['user_agent', 'toll_fraud'])
    subm.to_csv('%s.csv' % name, index=False)
    return subm
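Original issue
I am not sure whether I should align the data I predict on to the same number of features/columns as the original training data. If so, what is the best way to do that?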
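One option I am considering, sketched below, is to remember the training columns and reindex the encoded prediction data against them. This again assumes the encoding is equivalent to pandas.get_dummies; align_columns and the toy frames are hypothetical names and data made up for illustration, not part of my actual code.

```python
import pandas as pd

def align_columns(encoded, train_columns):
    """Return `encoded` restricted to the training columns, in training order.

    Columns missing from `encoded` are filled with 0 and columns the model
    never saw are dropped. The cast gives a single numeric dtype for Keras.
    """
    return encoded.reindex(columns=train_columns, fill_value=0).astype("float32")

# Hypothetical toy data: values are invented, column names mirror the dataset.
train = pd.DataFrame({"user_agent": ["a", "b", "c"],
                      "source_port": [5060, 5061, 5060]})
new = pd.DataFrame({"user_agent": ["a", "d"],
                    "source_port": [5060, 5062]})

train_encoded = pd.get_dummies(train, columns=["user_agent"])
train_columns = train_encoded.columns  # would be saved alongside the model

new_encoded = pd.get_dummies(new, columns=["user_agent"])
aligned = align_columns(new_encoded, train_columns)

print(aligned.shape)  # (2, 4), matching the training column count
```

In this sketch a category unseen during training (user agent "d") ends up as an all-zero row across the user_agent columns, so the model gets no signal for it. Whether that is acceptable for this dataset is part of what I am unsure about.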