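I am using a convolutional neural network to predict a customer's preference among a set of objects. The input has the following format.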
Customer  Objects     x1   x2   x3   x4   ...  x15
a1        fruits      0.5  0.9  0.9  0.7
a1        veggies     0.2  0.6  0.3  0.2
a1        condiments  0.0  0.8  0.9  0.0
a1        dairy       0.4  0.2  0.3  0.3
a1        pastries    0.6  0.7  0.8  0.0
a1        other       0.9  0.0  0.6  0.4
a2        fruits      0.5  0.9  0.9  0.7
a2        veggies     0.2  0.6  0.3  0.2
a2        condiments  0.0  0.8  0.9  0.0
a2        dairy       0.4  0.2  0.3  0.3
a2        pastries    0.6  0.7  0.8  0.0
a2        other       0.9  0.0  0.6  0.4
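and so on. x1 ... x15 are continuous variables that represent a given customer's purchasing habits in different periods. I have normalized these variables and one-hot encoded the labels of the objects column. I keep the objects column among the predictors because the x variables only carry meaning in combination with the object; on their own they are just random numbers. Below is my code, which creates X and Y and reshapes X into a format that can be fed to the conv net.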
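import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from keras.utils import np_utils

# Predictors: the full frame, including the 'objects' column
X = df
# Target: the 'objects' label of each row
Y = df.loc[:, df.columns == 'objects']
Y = Y.values.ravel()
# Encode the class labels as integers
encoder = LabelEncoder()
encoder.fit(Y)
Y_int = encoder.transform(Y)
# One-hot encode the integer labels
Y_cat = np_utils.to_categorical(Y_int)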
# Generate Training and Validation Sets
X_train, X_test, y_train, y_test = train_test_split(X,Y_cat, test_size=.3)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
((35000, 17), (35000, 6))
((15000, 17), (15000, 6))
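def data_formatting_score(df):
    # Predictor columns
    pred_col = [ 'x1', 'x2', 'x3',
                 'x4', 'x5', 'x6', 'x7',....'x15']
    value_col = pred_col
    # One row per (customer, objects) pair; dropna=False keeps pairs that have no data
    df_c = df.pivot_table(index=['customer','objects'], dropna=False, aggfunc=np.sum)[value_col]
    df_c = df_c.fillna(value=0)
    # One (objects x variables) matrix per customer; .values stands in for DataFrame.as_matrix, which was removed in pandas 1.0
    df_np = np.array(list(df_c.groupby(by=[df_c.index.get_level_values(0), df_c.index.get_level_values(0)]).apply(lambda g: g.values)))
    return (df_np, df_c.index.get_level_values(0).unique())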
X_train, indx = data_formatting_score(X_train)
a,b,c=X_train.shape
X_train = X_train.reshape(a,b,c,1)
print X_train.shape
(34952, 6, 15, 1)
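Why do I end up with only 34952 samples (the first dimension) after reshaping? I also noticed that the sample size changes every time I run the script.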
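To narrow this down, here is a minimal sketch of one possible cause. It assumes that the first dimension of the reshaped array equals the number of distinct customers whose rows end up in the training split, and that the split is done per row with no fixed seed. The helper `distinct_customers_after_split` and the toy sizes (20000 customers, 6 objects each) are hypothetical and only mimic the layout of my data.

```python
import random

def distinct_customers_after_split(n_customers, objects, train_frac, seed):
    """Split (customer, object) rows at random and count what lands in train.

    Returns (number of training rows, number of distinct training customers).
    """
    rng = random.Random(seed)
    # One row per (customer, object) pair, as in the table above
    rows = [(c, o) for c in range(n_customers) for o in objects]
    rng.shuffle(rows)
    n_train = int(round(train_frac * len(rows)))
    train_rows = rows[:n_train]
    # A customer counts as present if at least one of its rows is in train
    return n_train, len({c for c, _ in train_rows})

objects = ["fruits", "veggies", "condiments", "dairy", "pastries", "other"]
results = [distinct_customers_after_split(20000, objects, 0.7, seed)
           for seed in range(5)]
for n_rows, n_customers in results:
    # The row count is fixed; the customer count depends on the shuffle
    print(n_rows, n_customers)
```

If this is what happens in my script, the number of training rows stays fixed while the number of customer matrices depends on the random shuffle. Passing `random_state` to `train_test_split` should at least make the count repeatable, and splitting by customer (for example with scikit-learn's `GroupShuffleSplit`) might keep each customer's six rows together, though I have not verified either on my data.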
Can we feed data whose height and width differ (6 x 15 in this case) as input to a convnet?
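For reference, the usual output-size formula for a convolution is applied to each spatial axis separately, so the arithmetic itself does not require the height and width to match. A small sketch of that calculation for a 6 x 15 input, assuming a hypothetical 3 x 3 kernel with stride 1 and no padding (none of these values come from my code):

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one axis: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

height, width = 6, 15        # objects x variables, as in the reshaped X_train
kernel_h, kernel_w = 3, 3    # assumed kernel size, for illustration only

out_h = conv_output_size(height, kernel_h)
out_w = conv_output_size(width, kernel_w)
print((out_h, out_w))  # (4, 13)
```

What I am unsure about is whether treating the 6 objects and the 15 variables as the two image axes is meaningful for the network, not whether the shapes work out.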