Calling and returning cv2.cvtColor inside dask.array.map_blocks [OpenCV, Dask]

Date: 2017-04-13 10:00:29

Tags: python opencv dask

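I am trying to use dask to convert images from 3 channels to 1 channel in parallel. I want to do it this way so that I can run out-of-core computations later. I use da.map_blocks.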

from dask.array.image import imread
import dask.array as da
import numpy as np

import cv2

import matplotlib.pyplot as plt
%matplotlib inline

im = imread('../datatest/*.JPG')  # wrap around existing images

def showplt(x):
#     print(np.array(im[0]))
    gray = cv2.cvtColor(np.array(x[0]), cv2.COLOR_BGR2GRAY)
    print("shape of `x` in showplt:", np.array(x[0]).shape)
    print("shape of `gray` in showplt:", gray.shape)
    return gray

c = im.chunks
print("chunk size of `im`", im.chunks, '\n')
result = im.map_blocks(showplt, dtype=im[0].dtype, chunks=(c[0], c[1], c[2], c[3]))
s = result.compute()

But I get this error:

chunk size of `im` ((1, 1, 1, 1), (5184,), (3456,), (3,)) 

shape of `x` in showplt: (5184, 3456, 3)
shape of `gray` in showplt: (5184, 3456)
shape of `x` in showplt: (5184, 3456, 3)
shape of `gray` in showplt: (5184, 3456)
shape of `x` in showplt: (5184, 3456, 3)
shape of `gray` in showplt: (5184, 3456)
shape of `x` in showplt: (5184, 3456, 3)
shape of `gray` in showplt: (5184, 3456)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-177-f86d33eced47> in <module>()
     20 print("chunk size of `im`", im.chunks, '\n')
     21 result = im.map_blocks(showplt, dtype=im[0].dtype, chunks=(c[0], c[1], c[2], c[3]))
---> 22 s = result.compute()

/home/sendowo/Projects/non-text_segmentation/env/lib/python3.5/site-packages/dask/base.py in compute(self, **kwargs)
     93             Extra keywords to forward to the scheduler ``get`` function.
     94         """
---> 95         (result,) = compute(self, traverse=False, **kwargs)
     96         return result
     97 

/home/sendowo/Projects/non-text_segmentation/env/lib/python3.5/site-packages/dask/base.py in compute(*args, **kwargs)
    205     return tuple(a if not isinstance(a, Base)
    206                  else a._finalize(next(results_iter))
--> 207                  for a in args)
    208 
    209 

/home/sendowo/Projects/non-text_segmentation/env/lib/python3.5/site-packages/dask/base.py in <genexpr>(.0)
    205     return tuple(a if not isinstance(a, Base)
    206                  else a._finalize(next(results_iter))
--> 207                  for a in args)
    208 
    209 

/home/sendowo/Projects/non-text_segmentation/env/lib/python3.5/site-packages/dask/array/core.py in finalize(results)
    914     while isinstance(results2, (tuple, list)):
    915         if len(results2) > 1:
--> 916             return concatenate3(results)
    917         else:
    918             results2 = results2[0]

/home/sendowo/Projects/non-text_segmentation/env/lib/python3.5/site-packages/dask/array/core.py in concatenate3(arrays)
   3335     if not arrays:
   3336         return np.empty(0)
-> 3337     chunks = chunks_from_arrays(arrays)
   3338     shape = tuple(map(sum, chunks))
   3339 

/home/sendowo/Projects/non-text_segmentation/env/lib/python3.5/site-packages/dask/array/core.py in chunks_from_arrays(arrays)
   3240 
   3241     while isinstance(arrays, (list, tuple)):
-> 3242         result.append(tuple([shape(deepfirst(a))[dim] for a in arrays]))
   3243         arrays = arrays[0]
   3244         dim += 1

/home/sendowo/Projects/non-text_segmentation/env/lib/python3.5/site-packages/dask/array/core.py in <listcomp>(.0)
   3240 
   3241     while isinstance(arrays, (list, tuple)):
-> 3242         result.append(tuple([shape(deepfirst(a))[dim] for a in arrays]))
   3243         arrays = arrays[0]
   3244         dim += 1

IndexError: tuple index out of range

I also tried changing the chunks parameter of map_blocks to

result = im.map_blocks(showplt, dtype=im[0].dtype, chunks=(c[0], c[1], c[2]))

but that did not work either:

chunk size of `im` ((1, 1, 1, 1), (5184,), (3456,), (3,)) 

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-178-7b668f779a32> in <module>()
     19 c = im.chunks
     20 print("chunk size of `im`", im.chunks, '\n')
---> 21 result = im.map_blocks(showplt, dtype=im[0].dtype, chunks=(c[0], c[1], c[2]))
     22 s = result.compute()

/home/sendowo/Projects/non-text_segmentation/env/lib/python3.5/site-packages/dask/array/core.py in map_blocks(self, func, *args, **kwargs)
   1568     @wraps(map_blocks)
   1569     def map_blocks(self, func, *args, **kwargs):
-> 1570         return map_blocks(func, self, *args, **kwargs)
   1571 
   1572     def map_overlap(self, func, depth, boundary=None, trim=True, **kwargs):

/home/sendowo/Projects/non-text_segmentation/env/lib/python3.5/site-packages/dask/array/core.py in map_blocks(func, *args, **kwargs)
    679         if len(chunks) != len(numblocks):
    680             raise ValueError("Provided chunks have {0} dims, expected {1} "
--> 681                              "dims.".format(len(chunks), len(numblocks)))
    682         chunks2 = []
    683         for i, (c, nb) in enumerate(zip(chunks, numblocks)):

ValueError: Provided chunks have 3 dims, expected 4 dims.

How should I specify the chunk size?

2 answers:

Answer 0: (score: 3)

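The map_blocks method gets tricky when your function changes the shape of the underlying NumPy array. I think you are on the right track with specifying chunks, but you also need to specify the dimension that is being dropped.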

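A minimal sketch of that idea, under stated assumptions: the image stack is replaced by a small synthetic array (hypothetical sizes instead of 5184 x 3456), and a NumPy weighted sum stands in for cv2.cvtColor so the example runs without OpenCV or image files. The name to_gray is illustrative only. The point is that the function keeps the leading image axis, so each block goes from (1, H, W, 3) to (1, H, W), and drop_axis=3 tells dask that the channel axis is gone.

```python
import numpy as np
import dask.array as da

# Synthetic stand-in for the imread stack: 4 small BGR images,
# chunked one image per block like the array in the question.
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(4, 6, 5, 3), dtype=np.uint8)
im = da.from_array(frames, chunks=(1, 6, 5, 3))

def to_gray(block):
    # block has shape (1, H, W, 3). Indexing the last axis keeps the
    # leading image axis, so the returned block has shape (1, H, W).
    # Assumption: this weighted sum only approximates COLOR_BGR2GRAY;
    # with OpenCV one could return cv2.cvtColor(block[0], ...)[np.newaxis].
    b, g, r = block[..., 0], block[..., 1], block[..., 2]
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.round(gray).astype(block.dtype)

c = im.chunks
# Three output dimensions, so three chunk entries, plus the dropped axis.
result = im.map_blocks(to_gray, dtype=im.dtype,
                       chunks=(c[0], c[1], c[2]), drop_axis=3)
gray = result.compute()
print(result.chunks, gray.shape)
```

Because the first axis is preserved, result[i] is the i-th grayscale image and no reshape is needed afterwards.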

Answer 1: (score: 1)

I finally figured out the trick. drop_axis=0 gave me an error:

ValueError: Can't drop an axis with more than 1 block. Please use `atop` instead.

To make it work, I used drop_axis=[1,3] together with chunks=(im.shape[1], im.shape[2]):

from dask.array.image import imread
import dask.array as da
import numpy as np

import cv2

import matplotlib.pyplot as plt
%matplotlib inline

im = imread('../datatest/*.JPG')  # wrap around existing images

def showplt(x):
    gray = cv2.cvtColor(x[0], cv2.COLOR_BGR2GRAY)
    return gray

c = im.chunks
result = im.map_blocks(showplt, dtype=im.dtype, chunks=(im.shape[1], im.shape[2]), drop_axis=[1,3])
print(result)
plt.imshow(result, cmap='gray')

However, this gives me the images concatenated vertically:

dask.array<showplt, shape=(20736, 3456), dtype=uint8, chunksize=(5184, 3456)>
Out[10]:
<matplotlib.image.AxesImage at 0x7f10c461e7b8>

(Image: result of map_blocks)

To make it iterable per image, like the array returned by imread, I need to reshape result:

reshape = result.reshape((im.shape[0], im.shape[1], im.shape[2]))
plt.imshow(reshape[0], cmap='gray')

Result:

dask.array<reshape, shape=(4, 5184, 3456), dtype=uint8, chunksize=(1, 5184, 3456)>
<matplotlib.image.AxesImage at 0x7f10c479e668>

(Image: result of reshape)
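
The reshape works because the per-image blocks were stacked along the first axis in file order, so splitting that axis back into (number of images, height) puts every image in its own slice. A small NumPy-only sketch of that bookkeeping, using hypothetical tiny sizes in place of (4, 5184, 3456):

```python
import numpy as np

n, h, w = 4, 3, 2  # hypothetical sizes standing in for (4, 5184, 3456)

# One distinct (h, w) "grayscale image" per input file.
images = [np.full((h, w), i, dtype=np.uint8) for i in range(n)]

# Blocks stacked along axis 0, like the (20736, 3456) result above.
stacked = np.concatenate(images, axis=0)   # shape (n * h, w)

# Splitting the first axis into (n, h) recovers one image per index.
restored = stacked.reshape((n, h, w))

print(stacked.shape, restored.shape)
```

Here restored[i] holds the same values as images[i], which is the property the reshape on the dask array relies on.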