Question

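I have been building a machine learning algorithm to recognize images, starting by creating my own h5 database. I have been following this tutorial, which has been useful, but I keep running into one major error: when OpenCV is used in the image-processing part of the code, the program cannot save the processed image because the image's height and width keep getting flipped. When I run the script, I get the following error: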

Traceback (most recent call last):
   File "array+and+label+data.py", line 79, in <module>
   hdf5_file["train_img"][i, ...] = img[None]
   File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
   File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
   File "/Users/USER/miniconda2/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 631, in __setitem__
   for fspace in selection.broadcast(mshape):
   File "/Users/USER/miniconda2/lib/python2.7/site-packages/h5py/_hl/selections.py", line 299, in broadcast
   raise TypeError("Can't broadcast %s -> %s" % (target_shape, count))
   TypeError: Can't broadcast (1, 240, 320, 3) -> (1, 320, 240, 3)

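My images should all be 320 x 240, but as you can see this is somehow being flipped. Researching around showed me that this is because OpenCV and NumPy use different conventions for height and width, but I don't know how to reconcile that in this code without patching my OpenCV installation. Any ideas on how to fix this? I'm relatively new to Python and all its libraries (though I know Java well)!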

Thanks in advance!

Edit: adding more code for context. It is very similar to what is in the tutorial's "Load images and save them" code example.

The shapes of my arrays:

train_shape = (len(train_addrs), 320, 240, 3)
val_shape = (len(val_addrs), 320, 240, 3)
test_shape = (len(test_addrs), 320, 240, 3)

The code that loops over the image addresses and resizes them:

# Loop over training image addresses
for i in range(len(train_addrs)):
    # print how many images are saved every 1000 images
    if i % 1000 == 0 and i > 1:
        print('Train data: {}/{}'.format(i, len(train_addrs)))

    # read an image and resize to (320, 240)
    # cv2 loads images as BGR, convert it to RGB
    addr = train_addrs[i]
    img = cv2.imread(addr)
    img = cv2.resize(img, (320, 240), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # save the image and calculate the mean so far
    hdf5_file["train_img"][i, ...] = img[None]
    mean += img / float(len(train_labels))

Answer 1

Researching around showed me that this is because OpenCV and NumPy use different conventions for height and width

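Not exactly. The only tricky thing with images is that 2D arrays/matrices are indexed as (row, col), which is the opposite of the usual Cartesian coordinates (x, y) we might use for images. Because of this, when you specify points in OpenCV functions, it sometimes wants them in (x, y) coordinates, and similarly it wants the dimensions of an image given as (w, h) rather than (h, w) as they would be for an array. That is the case for OpenCV's resize() function. You are passing it (h, w), but it actually wants (w, h). From the docs for resize():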

dsize - output image size; if it equals zero, it is computed as:
dsize = Size(round(fx*src.cols), round(fy*src.rows))
Either dsize or both fx and fy must be non-zero.

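So you can see here that the number of columns is the first dimension (the width) and the number of rows is the second dimension (the height).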
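As a rough illustration of the two conventions, here is a minimal sketch using only NumPy. The image size and the point coordinates are made-up values for the example, not taken from the question's data.

```python
import numpy as np

# Hypothetical image that is 320 pixels wide and 240 pixels high.
height, width = 240, 320
img = np.zeros((height, width, 3), dtype=np.uint8)  # array shape is (rows, cols, channels)

# A point given Cartesian-style: x is the column, y is the row.
x, y = 300, 100
img[y, x] = (255, 0, 0)  # arrays are indexed [row, col], i.e. [y, x]

# The size tuple in the order cv2.resize expects for dsize: (width, height).
dsize = (img.shape[1], img.shape[0])
print(img.shape, dsize)
```

The array's shape lists the height first, while the dsize tuple lists the width first, which is the mismatch behind the broadcast error.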

The simple fix is just to swap your (h, w) for (w, h) in the resize() function:

img = cv2.resize(img, (240, 320), interpolation=cv2.INTER_CUBIC)
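One way to keep the two conventions from drifting apart is to derive the resize target from the dataset shape instead of typing both by hand. The sketch below assumes the dataset shape is laid out as (N, height, width, channels), as in the question; the helper name `dsize_from_shape` and the count of 100 images are hypothetical.

```python
def dsize_from_shape(shape):
    """Return OpenCV's (width, height) for a (..., height, width, channels) shape."""
    height, width = shape[-3], shape[-2]
    return (width, height)


# Dataset shape from the question: (N, height, width, channels); N = 100 is made up.
train_shape = (100, 320, 240, 3)
dsize = dsize_from_shape(train_shape)
print(dsize)  # (240, 320)

# In the loop, the resize call could then read (hypothetical usage):
# img = cv2.resize(img, dsize, interpolation=cv2.INTER_CUBIC)
# assert img.shape == train_shape[1:]
```

With this, changing the dataset shape in one place also updates the size passed to resize(), and the commented assertion would catch a mismatch before the write to the HDF5 file.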

h5py flipping dimensions of images
