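I have a pandas DataFrame. Two of its columns hold the latitude and longitude of certain points. In another column, named city, I want to store the name of the city that each row's latitude and longitude belong to.
I sliced the latitude and longitude columns of the DataFrame into a NumPy array. Then I used the multiprocessing library to write a small parallel map function: it takes the NumPy array, splits it, maps the function over each split on each core of my machine, and finally joins the intermediate results.
However, I can't get it to work correctly. Since I'm fairly new to Python, I'd like to know whether there is a better (or even standard) way to do this.
My code is as follows: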
from multiprocessing import Pool, cpu_count

import geocoder
import numpy as np
import pandas as pd

def reverse_code( latitude, longitude ):
    g = geocoder.google([latitude, longitude], method="reverse")
    return g.city

def parallelize( data, func):
    data_split = np.array_split(np.array_split(data,2), partitions)
    pool = Pool(cores)
    data = pd.concat(pool.map(func, data_split))
    pool.close()
    pool.join()
    return data

cores = cpu_count()
partitions = cores
distritos = df[["latitud", "longitud"]].as_matrix
parallelize(distritos, reverse_code)
After running the code, I get the following error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\ProgramData\Anaconda3\envs\jeptest\lib\site-packages\numpy\lib\shape_base.py in array_split(ary, indices_or_sections, axis)
    457     try:
--> 458         Ntotal = ary.shape[axis]
    459     except AttributeError:

AttributeError: 'function' object has no attribute 'shape'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-26-2af0d007a2f8> in <module>()
----> 1 parallelize(distritos, reverse_code)

<ipython-input-25-d2c492561bd2> in parallelize(data, func)
      1 def parallelize( data, func):
----> 2     data_split = np.array_split(np.array_split(data,2), partitions)
      3     pool = Pool(cores)
      4     data = pd.concat(pool.map(func, data_split))
      5     pool.close()

C:\ProgramData\Anaconda3\envs\jeptest\lib\site-packages\numpy\lib\shape_base.py in array_split(ary, indices_or_sections, axis)
    458         Ntotal = ary.shape[axis]
    459     except AttributeError:
--> 460         Ntotal = len(ary)
    461     try:
    462         # handle scalar case.

TypeError: object of type 'method' has no len()
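
A possible reading of the traceback: the message `object of type 'method' has no len()` suggests that `as_matrix` was passed without being called, so `np.array_split` received a bound method rather than an array. Two further points seem likely to cause trouble once that is fixed: `pool.map` hands each worker one whole chunk (not a separate latitude and longitude), and `pd.concat` expects pandas objects rather than plain strings. The following is only a minimal sketch of the split/map/join pattern described above, under stated assumptions: `fake_reverse_code` and `reverse_code_chunk` are hypothetical names, the lookup is a stand-in that makes no network call, the sample coordinates are made up, `to_numpy()` (available in recent pandas) replaces `as_matrix`, and `ThreadPool` is swapped in for the process-based `Pool` on the assumption that reverse geocoding is mostly network-bound.

```python
from multiprocessing.pool import ThreadPool

import numpy as np
import pandas as pd


def fake_reverse_code(latitude, longitude):
    """Hypothetical stand-in for the geocoder lookup; makes no network call."""
    return "north" if latitude >= 0 else "south"


def reverse_code_chunk(chunk):
    """Apply the lookup to every (latitude, longitude) row of one chunk."""
    return pd.Series([fake_reverse_code(lat, lon) for lat, lon in chunk])


def parallelize(data, func, partitions=4, workers=4):
    """Split `data` row-wise, map `func` over the chunks, and join the results."""
    data_split = np.array_split(data, partitions)
    with ThreadPool(workers) as pool:
        # pool.map blocks until all chunks are done and preserves chunk order.
        results = pool.map(func, data_split)
    return pd.concat(results, ignore_index=True)


# Made-up sample data, for illustration only.
df = pd.DataFrame({
    "latitud": [40.4, -12.0, 19.4, -34.6, 4.7],
    "longitud": [-3.7, -77.0, -99.1, -58.4, -74.1],
})

# Note the parentheses: to_numpy() is called, so an array (not a method) is passed.
distritos = df[["latitud", "longitud"]].to_numpy()
df["city"] = parallelize(distritos, reverse_code_chunk).to_numpy()
```

In this sketch the worker function receives a 2-D chunk and returns a `pd.Series`, which is what lets `pd.concat` join the intermediate results. Replacing `fake_reverse_code` with the real geocoder call would be the next step, subject to whatever rate limits the geocoding service imposes.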