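So, what I want to do is classify exoplanets and non-exoplanets using the Kepler data. The data is a time series with dimensions (num_of_samples, 3197). I found out that this can be done with a 1D convolutional layer in Keras, but I keep messing up the dimensions and get the following error:
Error when checking model input: expected conv1d_1_input to have shape (None, 3197, 1) but got array with shape (1, 570, 3197)
So, the questions are:
1. Does the data (training_set and test_set) need to be converted into a 3D tensor? If so, what are the correct dimensions?
2. What is the correct input shape? I know that I have 3197 time steps of 1 feature, but the reference I found does not specify whether it assumes the TF or Theano backend, so I'm still confused.
By the way, I'm using the TF backend. Any help is much appreciated! Thank you!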
"""
Created on Wed May 17 18:23:31 2017
@author: Amajid Sinar
"""
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use("ggplot")
import numpy as np
#Importing training set
training_set = pd.read_csv("exoTrain.csv")
X_train = training_set.iloc[:,1:].values
y_train = training_set.iloc[:,0:1].values
#Importing test set
test_set = pd.read_csv("exoTest.csv")
X_test = test_set.iloc[:,1:].values
y_test = test_set.iloc[:,0:1].values
#Scale the data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)  # reuse the statistics fitted on the training set
#Convert data into 3d tensor
X_train = np.reshape(X_train,(1,X_train.shape[0],X_train.shape[1]))
X_test = np.reshape(X_test,(1,X_test.shape[0],X_test.shape[1]))
#Importing convolutional layers
from keras.models import Sequential
from keras.layers import Convolution1D
from keras.layers import MaxPooling1D
from keras.layers import Flatten
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import BatchNormalization
#Convolution steps
#1.Convolution
#2.Max Pooling
#3.Flattening
#4.Full Connection
#Initialising the CNN
classifier = Sequential()
#Input shape must be explicitly defined, DO NOT USE (None,shape)!!!
#1.Multiple convolution and max pooling
classifier.add(Convolution1D(filters=8, kernel_size=11, activation="relu", input_shape=(3197,1)))
classifier.add(MaxPooling1D(strides=4))
classifier.add(BatchNormalization())
classifier.add(Convolution1D(filters=16, kernel_size=11, activation='relu'))
classifier.add(MaxPooling1D(strides=4))
classifier.add(BatchNormalization())
classifier.add(Convolution1D(filters=32, kernel_size=11, activation='relu'))
classifier.add(MaxPooling1D(strides=4))
classifier.add(BatchNormalization())
#classifier.add(Convolution1D(filters=64, kernel_size=11, activation='relu'))
#classifier.add(MaxPooling1D(strides=4))
#2.Flattening
classifier.add(Flatten())
#3.Full Connection
classifier.add(Dropout(0.5))
classifier.add(Dense(64, activation='relu'))
classifier.add(Dropout(0.25))
classifier.add(Dense(64, activation='relu'))
classifier.add(Dense(1, activation='sigmoid'))
#Configure the learning process
classifier.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
#Train!
classifier.fit(X_train, y_train, epochs=1, validation_data=(X_test, y_test))
score = classifier.evaluate(X_test, y_test)
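Answer 0 (score: 5)
Yes, your dataset should be a 3D tensor.
The correct input shape (for the TensorFlow backend) is (sample_number, sample_size, channel_number). You can check this against the error message, which says the expected dimensions are (None, 3197, 1).
'None' refers to a dimension of arbitrary size, because it corresponds to the number of samples used in training.
So in your case the correct shape is (570, 3197, 1).
If you happen to be using the Theano backend, you should put the channel dimension first: (sample_number, channel_number, sample_size), or in your particular case
(570, 1, 3197)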
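As a rough illustration of the TensorFlow-backend case, the sketch below compares the reshape from the question with the one suggested here. It uses a zero-filled placeholder instead of the real Kepler data, and `matches` is a hypothetical helper written only for this example; it imitates the idea that `None` accepts any size and is not the check Keras itself performs.

```python
import numpy as np

def matches(expected, actual):
    """Hypothetical helper: a None entry in `expected` accepts any size on that axis."""
    return len(expected) == len(actual) and all(
        e is None or e == a for e, a in zip(expected, actual)
    )

# Expected input shape, taken from the error message in the question.
expected = (None, 3197, 1)

# Placeholder for the scaled 2D data: 570 samples, 3197 time steps each.
X = np.zeros((570, 3197))

# Reshape used in the question: puts everything into a single "sample".
wrong = np.reshape(X, (1, X.shape[0], X.shape[1]))

# Reshape suggested in this answer: (samples, time steps, channels).
right = np.reshape(X, (X.shape[0], X.shape[1], 1))

print(wrong.shape, matches(expected, wrong.shape))
print(right.shape, matches(expected, right.shape))
```

Only the second array lines up with `input_shape=(3197, 1)` from the model definition, since the leading sample axis is the one left free.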
Answer 1 (score: 1)
Assuming the data has the shape
>>> data.shape
(m, n)
you should add a new axis to serve as the channel axis:
>>> data = data[..., np.newaxis]
>>> data.shape
(m, n, 1)
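The same step as a small runnable sketch, assuming a placeholder array with arbitrarily chosen sizes m = 4 and n = 6. `np.expand_dims` is shown as an equivalent alternative; it is not part of the original answer.

```python
import numpy as np

m, n = 4, 6  # arbitrary sizes, chosen only for illustration
data = np.arange(m * n, dtype=float).reshape(m, n)

# Append a trailing axis of length 1 to act as the channel axis.
with_channel = data[..., np.newaxis]
print(with_channel.shape)

# Equivalent alternative: expand_dims on the last axis.
same = np.expand_dims(data, axis=-1)
print(same.shape)
print(np.array_equal(with_channel, same))
```

Both forms leave the values untouched and only add the trailing axis, so either can be applied to X_train and X_test before they are passed to the model.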