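I am working with the Indian Spontaneous Expression dataset (ISED), which contains 428 images, each of shape (1080, 1920, 3). There are 4 classes, so the one-hot label array has shape (428, 4). When I split the data into training, validation, and test sets with train_test_split: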
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=42)
I get an error.
I tried reshaping the data, but nothing I tried worked.
import cv2 as cv
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_excel('/content/drive/My Drive/ISED/details1.xlsx')
count = 0
path = data['img_path']
for path in data['img_path']:
    count = count + 1
    temp1 = path.replace("'", "")
    imgpath = "/content/drive/My Drive/ISED/" + temp1
    imgFile = cv.imread(imgpath)
    X = np.asarray(imgFile)
print(X.shape)
print(count)
y = pd.get_dummies(data['emotion']).as_matrix()
# # #storing them using numpy
np.save('fdataXISED', X)
np.save('flabelsISED', y)
# #
print("Preprocessing Done")
print("Number of Features: "+str(len(X[0])))
print("Number of Labels: "+ str(len(y[0])))
print("Number of examples in dataset:"+str(len(X)))
print("X,y stored in fdataXISED.npy and flabelsISED.npy respectively")
num_features = 1920
num_labels = 4
batch_size = 64
epochs = 100
width, height = 1080, 1920
x = np.load('./fdataXISED.npy')
y = np.load('./flabelsISED.npy')
print(x.dtype)
x = x.astype(float)
x -= np.mean(x, axis=0)
x /= np.std(x, axis=0)
print(x.shape," ", y.shape)
#splitting into training, validation and testing data
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.1,
                                                    random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train,
                                                      test_size=0.1, random_state=
I would like the data to be split properly for training.
Answer 0 (score: 0)
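The problem is the line X = np.asarray(imgFile) inside the loop for path in data['img_path']:. Each iteration overwrites X, so X ends up holding only the last image. Change it like this: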
X = []  # collect every image instead of overwriting X
for path in data['img_path']:
    count = count + 1
    temp1 = path.replace("'", "")
    imgpath = "/content/drive/My Drive/ISED/" + temp1
    imgFile = cv.imread(imgpath)
    imgFile = np.asarray(imgFile)
    X.append(imgFile)
X = np.asarray(X)  # stack into one array, once, after the loop
print(X.shape)
print(count)
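To see the difference between overwriting and accumulating without loading the real images, here is a minimal sketch on stand-in data. The names fake_images, X_overwritten, and X_stacked are made up for this illustration, and the tiny (4, 6, 3) arrays are only placeholders for what cv.imread would return.

```python
import numpy as np

# Hypothetical stand-ins for cv.imread results: 5 tiny "images" of shape (4, 6, 3).
fake_images = [np.full((4, 6, 3), i, dtype=np.uint8) for i in range(5)]

# Pattern from the question: X is reassigned on every iteration,
# so only the last image survives the loop.
for img in fake_images:
    X_overwritten = np.asarray(img)

# Pattern from this answer: collect every image, then stack once after the loop.
collected = []
for img in fake_images:
    collected.append(np.asarray(img))
X_stacked = np.asarray(collected)

print(X_overwritten.shape)  # (4, 6, 3)
print(X_stacked.shape)      # (5, 4, 6, 3)
```

Note that len(X_overwritten) is the image height (4 here), not the number of images, which is why the sample count no longer matches the labels.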
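In the end, X has shape (428, 1080, 1920, 3), and y must have shape (428, 4). The error occurs because X and y contain different numbers of samples: in the original code X has shape (1080, 1920, 3), so train_test_split sees 1080 samples in X but 428 in y.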
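Once both arrays agree on the first dimension, the two-stage split from the question should go through. The following is a rough sketch on random stand-in data, not the real dataset: the (428, 8, 8, 3) image size is shrunk to keep memory small, and random_state=0 in the second call is an arbitrary choice made here, since the value in the question is cut off.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 428 tiny "images" and 4-class one-hot labels.
rng = np.random.default_rng(0)
x = rng.random((428, 8, 8, 3))
y = np.eye(4)[rng.integers(0, 4, size=428)]

# train_test_split requires the same number of samples in every array.
assert len(x) == len(y), "x and y must have the same number of samples"

# First split: hold out 10% as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.1, random_state=42)

# Second split: hold out 10% of the remaining data as the validation set.
# random_state=0 is an assumption for this sketch.
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train, y_train, test_size=0.1, random_state=0)

print(X_train.shape, X_valid.shape, X_test.shape)
print(y_train.shape, y_valid.shape, y_test.shape)
```

Checking len(x) == len(y) right after loading the .npy files is a cheap way to catch this kind of mismatch before the split is attempted.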