Question

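I am working with the Indian Spontaneous Expression dataset (ISED), which contains 428 images, each of shape (1080, 1920, 3). There are 4 classes, so the label array has shape (428, 4). When I use train_test_split to split the data into training, validation, and test sets: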

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=42)

I get the error "Found input variables with inconsistent numbers of samples: [1080, 428]".

I tried reshaping the data, but nothing I tried worked.

import cv2 as cv
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
data=pd.read_excel('/content/drive/My Drive/ISED/details1.xlsx')
count=0
path = data['img_path']
for path in data['img_path']:
    count=count+1    
    temp1 = path.replace("'", "")
    imgpath = "/content/drive/My Drive/ISED/" + temp1
    imgFile = cv.imread(imgpath)
    X = np.asarray(imgFile)
print(X.shape)
print(count)
y = pd.get_dummies(data['emotion']).as_matrix()

# # #storing them using numpy
np.save('fdataXISED', X)
np.save('flabelsISED', y)
# #
print("Preprocessing Done")
print("Number of Features: "+str(len(X[0])))
print("Number of Labels: "+ str(len(y[0])))
print("Number of examples in dataset:"+str(len(X)))
print("X,y stored in fdataXISED.npy and flabelsISED.npy respectively")

num_features = 1920
num_labels = 4
batch_size = 64
epochs = 100
width, height = 1080, 1920

x = np.load('./fdataXISED.npy')
y = np.load('./flabelsISED.npy')
print(x.dtype)
x = x.astype(float)
x -= np.mean(x, axis=0)

x /= np.std(x, axis=0)
print(x.shape,"   ", y.shape)
#splitting into training, validation and testing data
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.1, 
random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, 
test_size=0.1, random_state=

I would like to split the data properly for training.

Answer 1

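The problem is the line X = np.asarray(imgFile) inside the loop for path in data['img_path']:. It overwrites X on every iteration, so after the loop X holds only the last image. Change it like this: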

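X = []      # collect every image instead of overwriting X
count = 0
for path in data['img_path']:
    count += 1
    temp1 = path.replace("'", "")   # strip stray quotes from the stored path
    imgpath = "/content/drive/My Drive/ISED/" + temp1
    imgFile = cv.imread(imgpath)    # note: returns None if the file cannot be read
    X.append(np.asarray(imgFile))
X = np.asarray(X)   # stack into one array: (n_images, height, width, channels)
print(X.shape)
print(count)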

In the end, X will have shape (428, 1080, 1920, 3), and y must have shape (428, 4).
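
Before splitting, it can help to verify those shapes explicitly. The following is only a sketch: check_sample_counts is a hypothetical helper (not part of NumPy or scikit-learn), and the small zero arrays stand in for the real images and labels.

```python
import numpy as np

def check_sample_counts(X, y):
    """Return the sample count if X and y agree on axis 0, else raise ValueError.

    Hypothetical helper for illustration; assumes X is a stack of colour
    images with layout (samples, height, width, channels).
    """
    if X.ndim != 4:
        raise ValueError(
            f"expected X with 4 dimensions (samples, height, width, channels), "
            f"got shape {X.shape}"
        )
    if X.shape[0] != y.shape[0]:
        raise ValueError(
            f"X has {X.shape[0]} samples but y has {y.shape[0]}"
        )
    return X.shape[0]

# Tiny stand-ins: 5 images of 6x8x3 and 5 one-hot labels over 4 classes.
X_ok = np.zeros((5, 6, 8, 3), dtype=np.uint8)
y_ok = np.zeros((5, 4))
print(check_sample_counts(X_ok, y_ok))

# A single image (what the original loop produced) is rejected.
single_image = np.zeros((6, 8, 3), dtype=np.uint8)
try:
    check_sample_counts(single_image, y_ok)
    single_image_rejected = False
except ValueError as exc:
    single_image_rejected = True
    print(exc)
```

Calling such a check right after loading the .npy files would surface the problem before train_test_split is reached.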

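The error occurs because X and y contain different numbers of samples: after the original loop, X had shape (1080, 1920, 3), so train_test_split counted 1080 samples in X against 428 in y.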
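
The mismatch can be reproduced on a small scale. This is a sketch under stated assumptions: 20 synthetic 6x8x3 images replace the 428 real ones, the labels are made-up one-hot rows, and random_state=42 for the second split is an assumption (the value is cut off in the question).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the real data (assumed sizes, for illustration only).
n_samples, height, width = 20, 6, 8
images = [np.zeros((height, width, 3), dtype=np.uint8) for _ in range(n_samples)]
y = np.eye(4)[np.arange(n_samples) % 4]   # one-hot labels, shape (20, 4)

# Bug from the question: X is overwritten on each pass, so only the last
# image survives and its first axis (the height) is taken as the sample count.
for img in images:
    X_bad = np.asarray(img)

try:
    train_test_split(X_bad, y, test_size=0.1, random_state=42)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Fix: stack all images so the first axis is the sample axis.
X = np.stack(images)
print(X.shape, y.shape)

# Two-stage split: hold out a test set, then carve a validation set
# out of the remaining training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train, y_train, test_size=0.1, random_state=42)
print(len(X_train), len(X_valid), len(X_test))
```

With the real data the same error reports [1080, 428], because 1080 is the image height and 428 is the number of label rows.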

How to resolve the error "Found input variables with inconsistent numbers of samples: [1080, 428]"
