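I'm trying to use Python's multiprocessing package to speed up pivoting a series of large DataFrames, each approx. (10k * 12k)-by-3 in long format -> 10k-by-12k after the pivot.
The dataset is essentially a matrix stored in long format.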
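For scale, here is a rough back-of-the-envelope estimate of the raw data size. It assumes that all three long-format columns are 8-byte dtypes such as int64/float64; the actual dtypes are not stated here.

```python
# Rough size estimate for one long-format frame of (10k * 12k) rows by 3 columns.
# ASSUMPTION: every column is an 8-byte dtype (int64 / float64).
N_ROWS_WIDE = 10_000                    # rows of the pivoted matrix
N_COLS_WIDE = 12_000                    # columns of the pivoted matrix
LONG_ROWS = N_ROWS_WIDE * N_COLS_WIDE   # one long-format row per matrix cell
LONG_COLS = 3                           # e.g. row key, column key, value
BYTES_PER_VALUE = 8

INT32_MAX = 2**31 - 1                   # largest signed 32-bit integer

long_bytes = LONG_ROWS * LONG_COLS * BYTES_PER_VALUE    # input to the pivot
wide_bytes = N_ROWS_WIDE * N_COLS_WIDE * BYTES_PER_VALUE  # pivoted result

print(f"long-format frame: ~{long_bytes / 1e9:.2f} GB, over limit: {long_bytes > INT32_MAX}")
print(f"pivoted matrix:    ~{wide_bytes / 1e9:.2f} GB, over limit: {wide_bytes > INT32_MAX}")
```

Under that assumption, the long-format input is already past 2**31 - 1 bytes before any pickling overhead. The pivoted result is not.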
Here is the code I'm trying to run:
import multiprocessing as mp

parts = [df1, df2, df3]          # long-format DataFrames to pivot
pool = mp.Pool(3)                # one worker process per DataFrame
matrices = pool.map(long_to_matrix, parts)
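`pool.map` pickles each element of `parts` and writes the resulting bytes to a pipe to reach a worker. One way to see ahead of time whether a given DataFrame will hit the limit in the traceback below is to measure its pickled size. The following is a minimal sketch. The helpers `pickled_size` and `too_big_for_pipe` are hypothetical. The sketch uses `pickle.HIGHEST_PROTOCOL`, which may not be the protocol multiprocessing itself uses, so treat the result as an estimate. Pickling a multi-GB frame just to measure it also costs time and memory.

```python
import pickle

# Largest length a signed 32-bit header ("!i", as in the traceback) can hold.
INT32_MAX = 2**31 - 1

def pickled_size(obj):
    """Return the size in bytes of obj once pickled with the highest protocol."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

def too_big_for_pipe(obj):
    """True if obj's pickle would overflow a signed 32-bit length header."""
    return pickled_size(obj) > INT32_MAX
```

With `parts` as defined above, `[too_big_for_pipe(df) for df in parts]` would flag which frames cannot be sent as-is.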
Running the `pool.map` call on the larger matrices triggers the following error:
error Traceback (most recent call last)
<ipython-input-111-7ebabe0045b8> in <module>
----> 1 matrices = pool.map(long_to_matrix, parts)
/XXXX/tools/miniconda2/envs/aj_work/lib/python3.7/multiprocessing/pool.py in map(self, func, iterable, chunksize)
266 in a list that is returned.
267 '''
--> 268 return self._map_async(func, iterable, mapstar, chunksize).get()
269
270 def starmap(self, func, iterable, chunksize=None):
/XXXX/tools/miniconda2/envs/aj_work/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
655 return self._value
656 else:
--> 657 raise self._value
658
659 def _set(self, i, obj):
/XXXX/tools/miniconda2/envs/aj_work/lib/python3.7/multiprocessing/pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
429 break
430 try:
--> 431 put(task)
432 except Exception as e:
433 job, idx = task[:2]
/XXXX/tools/miniconda2/envs/aj_work/lib/python3.7/multiprocessing/connection.py in send(self, obj)
204 self._check_closed()
205 self._check_writable()
--> 206 self._send_bytes(_ForkingPickler.dumps(obj))
207
208 def recv_bytes(self, maxlength=None):
/XXXX/tools/miniconda2/envs/aj_work/lib/python3.7/multiprocessing/connection.py in _send_bytes(self, buf)
391 n = len(buf)
392 # For wire compatibility with 3.2 and lower
--> 393 header = struct.pack("!i", n)
394 if n > 16384:
395 # The payload is large so Nagle's algorithm won't be triggered
error: 'i' format requires -2147483648 <= number <= 2147483647
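The last frame shows where the call fails. `connection.py` writes the pickled payload's length as a signed 32-bit integer (`struct.pack("!i", n)`), so a payload longer than 2,147,483,647 bytes (about 2 GiB) cannot be described by that header. The following minimal stdlib snippet reproduces just that packing step. The helper name `fits_in_header` is hypothetical.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647, the upper bound quoted in the error message

def fits_in_header(n):
    """Return True if n can be packed the way connection.py packs lengths ("!i")."""
    try:
        struct.pack("!i", n)
    except struct.error:
        return False
    return True

print(fits_in_header(INT32_MAX))      # fits exactly at the limit
print(fits_in_header(INT32_MAX + 1))  # one byte over: raises struct.error internally
```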
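Does anyone know how I can work around this error? I did try splitting the matrices into smaller chunks, but slicing the DataFrames actually takes longer than the pivot itself, so that isn't a good option. Is this a bug?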