Input shape for a convolutional neural network

Time: 2018-11-08 15:02:23

Tags: python tensorflow neural-network conv-neural-network data-science

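I am using a convolutional neural network to predict customers' preferences from a set of objects. The input has the following format.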

Customer  Objects     x1   x2   x3   x4   ...  x15
a1        fruits      0.5  0.9  0.9  0.7
a1        veggies     0.2  0.6  0.3  0.2
a1        condiments  0.0  0.8  0.9  0.0
a1        dairy       0.4  0.2  0.3  0.3
a1        pastries    0.6  0.7  0.8  0.0
a1        other       0.9  0.0  0.6  0.4
a2        fruits      0.5  0.9  0.9  0.7
a2        veggies     0.2  0.6  0.3  0.2
a2        condiments  0.0  0.8  0.9  0.0
a2        dairy       0.4  0.2  0.3  0.3
a2        pastries    0.6  0.7  0.8  0.0
a2        other       0.9  0.0  0.6  0.4

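...and so on. x1 ... x15 are continuous variables representing a given customer's purchasing habits over different periods. I have normalized these variables and one-hot encoded the labels in the objects column. I kept the objects column among the predictors because the x variables only have meaning in combination with an object; otherwise they are just random numbers. Below is my code, which creates X and Y and reshapes X into a format that can be fed to the conv net.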
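As a rough illustration of the preprocessing described above, here is a minimal sketch on a toy frame built from the first rows of the table (only x1 to x4). It assumes lowercase column names `customer` and `objects`, as in the code below, and uses min-max scaling as one possible normalization; the post does not say which normalization was actually used. `pd.get_dummies` stands in for whatever one-hot encoding was applied to the objects column.

```python
import pandas as pd

# Toy long-format frame mirroring the table (only x1..x4 shown).
# Column names 'customer' and 'objects' are assumed to match the code below.
rows = [
    ("a1", "fruits", 0.5, 0.9, 0.9, 0.7),
    ("a1", "veggies", 0.2, 0.6, 0.3, 0.2),
    ("a2", "fruits", 0.5, 0.9, 0.9, 0.7),
    ("a2", "veggies", 0.2, 0.6, 0.3, 0.2),
]
df = pd.DataFrame(rows, columns=["customer", "objects", "x1", "x2", "x3", "x4"])
x_cols = ["x1", "x2", "x3", "x4"]

# Min-max scale each x column to [0, 1] (an assumed choice of normalization).
col_min = df[x_cols].min()
col_range = (df[x_cols].max() - col_min).replace(0, 1)  # avoid division by zero
df[x_cols] = (df[x_cols] - col_min) / col_range

# One-hot encode the object label so it can stay among the predictors.
encoded = pd.get_dummies(df, columns=["objects"], dtype=float)
print(encoded.columns.tolist())
# ['customer', 'x1', 'x2', 'x3', 'x4', 'objects_fruits', 'objects_veggies']
```

The x values keep their meaning only next to the object indicator columns, which is why both end up in the same row here.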

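import numpy as np
import pandas as pd
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

X = df
Y = df.loc[:, df.columns == 'objects']
Y = Y.values.ravel()
# Encode the class labels as integers
encoder = LabelEncoder()
encoder.fit(Y)
Y_int = encoder.transform(Y)
# One-hot encode the integer labels
Y_cat = np_utils.to_categorical(Y_int)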

# Generate Training and Validation Sets
X_train, X_test, y_train, y_test = train_test_split(X,Y_cat, test_size=.3)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

((35000, 17), (35000, 6))
((15000, 17), (15000, 6))
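`train_test_split` shuffles individual rows, and without a fixed `random_state` the shuffle differs on every run. The sketch below, on hypothetical toy data (10 customers with 6 objects each), contrasts a row-level 70/30 split, which can leave a customer with only some of its six object rows in the training part, with a split over customer IDs that keeps each customer's block intact. It illustrates the splitting behaviour under these assumptions only; scikit-learn's `GroupShuffleSplit` offers the customer-level variant directly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 10 customers x 6 object categories = 60 rows.
objects = ["fruits", "veggies", "condiments", "dairy", "pastries", "other"]
df = pd.DataFrame(
    [("c%d" % i, obj) for i in range(10) for obj in objects],
    columns=["customer", "objects"],
)

# Unseeded generator, analogous to train_test_split without random_state.
rng = np.random.default_rng()

# Row-level split: shuffle all rows and keep 70% of them for training.
n_train_rows = len(df) * 7 // 10
row_idx = rng.permutation(len(df))
train_rows = df.iloc[row_idx[:n_train_rows]]
# Rows per customer in the training part; typically some customers have
# fewer than 6, and which ones changes from run to run.
print(train_rows.groupby("customer").size().to_dict())

# Customer-level split: shuffle customer IDs, keep all rows of a customer together.
customers = df["customer"].unique()
n_train_customers = len(customers) * 7 // 10
train_customers = rng.permutation(customers)[:n_train_customers]
train_grouped = df[df["customer"].isin(train_customers)]
print(train_grouped.groupby("customer").size().eq(6).all())  # True
print(train_grouped["customer"].nunique())  # 7
```

Passing `random_state` (for example `train_test_split(X, Y_cat, test_size=.3, random_state=0)`) makes a row-level split reproducible, but does not by itself keep a customer's rows together.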

def data_formatting_score(df):
  pred_col = ['x%d' % i for i in range(1, 16)]  # x1 ... x15
  value_col = pred_col
  # One row per (customer, object) pair; dropna=False keeps the full grid
  df_c = df.pivot_table(index=['customer', 'objects'], dropna=False, aggfunc=np.sum)[value_col]
  df_c = df_c.fillna(value=0)
  # Stack each customer's (objects x variables) block into one 3-D array
  df_np = np.array(list(df_c.groupby(level=0).apply(lambda g: g.values)))
  return (df_np, df_c.index.get_level_values(0).unique())

X_train, indx = data_formatting_score(X_train)
a, b, c = X_train.shape

# Add a trailing channel axis: (customers, objects, variables, 1)
X_train = X_train.reshape(a, b, c, 1)

print(X_train.shape)
(34952, 6, 15, 1)
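`data_formatting_score` groups the pivoted frame by customer, so the first dimension of its result counts the distinct customers in the frame passed in, not its rows. The following minimal sketch builds the same kind of (customers, 6, 15, 1) array on hypothetical random data by reindexing onto the complete customer x object grid and filling missing objects with zeros, which is roughly what `dropna=False` plus `fillna(0)` do in the function above. The toy data and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

objects = ["fruits", "veggies", "condiments", "dairy", "pastries", "other"]
x_cols = ["x%d" % i for i in range(1, 16)]  # x1 ... x15

# Hypothetical toy data: 3 customers; customer "c2" has no "dairy" or "other" rows.
pairs = [(c, o) for c in ["c0", "c1", "c2"] for o in objects
         if not (c == "c2" and o in ("dairy", "other"))]
keys = pd.DataFrame(pairs, columns=["customer", "objects"])
values = pd.DataFrame(np.random.default_rng(0).random((len(pairs), len(x_cols))),
                      columns=x_cols)
df = pd.concat([keys, values], axis=1)

# Complete customer x object grid; objects missing for a customer become rows of zeros.
full_index = pd.MultiIndex.from_product(
    [df["customer"].unique(), objects], names=["customer", "objects"]
)
grid = df.set_index(["customer", "objects"])[x_cols].reindex(full_index, fill_value=0.0)

# Rows are ordered customer by customer, so a plain reshape yields one
# 6 x 15 "image" per customer plus a trailing channel axis.
n_customers = df["customer"].nunique()
X = grid.to_numpy().reshape(n_customers, len(objects), len(x_cols), 1)
print(X.shape)  # (3, 6, 15, 1)
```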
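  1. Why does the first dimension come out as only 34952 after reshaping? I also noticed that the sample size changes every time I run the script.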

  2. Can we feed data whose height and width differ (6 x 15 in this case) into a convnet?
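On the second question: with 'valid' padding, a 2D convolution only needs the input to be at least as large as the kernel in each dimension; height and width do not have to match. This minimal NumPy sketch (a plain stride-1 'valid' cross-correlation written out by hand, not any library's implementation) applies a hypothetical 3 x 3 averaging kernel to one 6 x 15 customer matrix.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 2D cross-correlation, stride 1, 'valid' padding
    (the operation most deep-learning libraries call convolution)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

x = np.random.default_rng(0).random((6, 15))  # one customer: 6 objects x 15 periods
k = np.ones((3, 3)) / 9.0                      # hypothetical 3 x 3 averaging kernel
out = conv2d_valid(x, k)
print(out.shape)  # (4, 13)
```

Each output dimension shrinks by the kernel size minus one, so a 6 x 15 input gives 4 x 13 with a 3 x 3 kernel and 1 x 13 with a 6 x 3 kernel. In Keras, the corresponding setting would be `input_shape=(6, 15, 1)` on the first `Conv2D` layer, whose `kernel_size` also accepts a non-square tuple such as `(2, 3)`.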

0 answers