Question

I want to build a new list of strings from a two-column numpy array. However, I can't seem to do it without looping over every element:

import numpy as np
test_list = np.tile(np.array([[1,2],[3,4],[5,6]]),(100000,1))
print(test_list[:,0])
print(test_list[:,1])

def dumbstring(points):
    # Loop through and append a list
    string_pnts = []
    for x in points:
        string_pnts.append("X co-ordinate is %g and y is %g" % (x[0], x[1]))
    return string_pnts

def dumbstring2(points):
    # Prefill a list
    string_pnts = [""] * len(points)
    i = 0
    for x in points:
        string_pnts[i] = ("X co-ordinate is %g and y is %g" % (x[0], x[1]))
        i += 1
    return string_pnts

def numpystring(points):
    return ("X co-ordinate is %g and y is %g" % (points[:,0], points[:,1]))    

def numpystring2(point_x, point_y):
    return ("X co-ordinate is %g and y is %g" % (point_x, point_y))

The first two work (I had expected prefilling the list to be faster than appending, but they take about the same time):

%timeit tdumbstring = dumbstring(test_list) # 239ms
%timeit tdumbstring2 = dumbstring2(test_list) # 239ms

However, the last two don't work. Is there a way to vectorize this function?

tnumpystring = numpystring(test_list) # Error
tnumpystring2 = numpystring2(test_list[:,0],test_list[:,1]) # Error
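For reference, a minimal sketch of why the last two calls fail, using a small array in place of `test_list`: the `%` operator formats one Python scalar per `%g`, so passing a whole column raises a `TypeError` rather than broadcasting over the elements. The names `template`, `raised_type_error` and `one_row` are only for illustration.

```python
import numpy as np

points = np.array([[1, 2], [3, 4], [5, 6]])
template = "X co-ordinate is %g and y is %g"

# Passing whole columns: each %g needs a single number, so this raises TypeError.
raised_type_error = False
try:
    template % (points[:, 0], points[:, 1])
except TypeError:
    raised_type_error = True

# Passing scalars from a single row works as expected.
one_row = template % (points[0, 0], points[0, 1])

print(raised_type_error)
print(one_row)
```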

Edit

I tried Pandas, since I don't actually need Numpy, but it was a bit slower:

import pandas as pd
df = pd.DataFrame(test_list)
df.columns = ['x','y']
%time pdtest = ("X co-ordinate is " + df.x.map(str) + " and y is " + df.y.map(str)).tolist()
print(pdtest[:5])

I also tried map, but that was also slower than looping over the np array:

def mappy(pt_x,pt_y):
    return("X co-ordinate is %g and y is %g" % (pt_x, pt_y))
%time mtest1 = list(map(lambda x: mappy(x[0],x[1]),test_list))
print(mtest1[:5])

Timings:

Answer 1

Here is a solution using numpy.core.defchararray.add, after first casting your array to str.

from numpy.core.defchararray import add    
test_list = np.tile(np.array([[1,2],[3,4],[5,6]]),(100000,1)).astype(str)

def stringy_arr(points):
    return add(add('X coordinate is ', points[:,0]),add(' and y coordinate is ', points[:,1]))

The timing is slightly faster:

%timeit stringy_arr(test_list)
1 loops, best of 3: 216 ms per loop

array(['X coordinate is 1 and y coordinate is 2',
       'X coordinate is 3 and y coordinate is 4',
       'X coordinate is 5 and y coordinate is 6', ...,
       'X coordinate is 1 and y coordinate is 2',
       'X coordinate is 3 and y coordinate is 4',
       'X coordinate is 5 and y coordinate is 6'], 
      dtype='|S85')

# Previously tried functions
%timeit dumbstring(test_list)
1 loops, best of 3: 340 ms per loop

%timeit tdumbstring2 = dumbstring2(test_list)
1 loops, best of 3: 320 ms per loop

%timeit mtest1 = list(map(lambda x: mappy(x[0],x[1]),test_list))
1 loops, best of 3: 340 ms per loop
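As a variation, here is a hedged sketch that keeps the numeric dtype and uses the public `np.char` namespace instead of importing from `numpy.core.defchararray`. It assumes `np.char.mod` (element-wise `%` formatting) is acceptable in place of `.astype(str)`. The function name `stringy_arr_mod` is made up for this example, and I have not timed it against the versions above.

```python
import numpy as np

def stringy_arr_mod(points):
    # Format each column element-wise with %g, then concatenate the two halves.
    x_part = np.char.mod("X co-ordinate is %g", points[:, 0])
    y_part = np.char.mod(" and y is %g", points[:, 1])
    return np.char.add(x_part, y_part)

# Small input so the result is easy to inspect.
small = np.tile(np.array([[1, 2], [3, 4], [5, 6]]), (2, 1))
result = stringy_arr_mod(small)
print(result[:3])
```

The result is a numpy array of strings; call `.tolist()` on it if a plain Python list is needed.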

Edit

You can also use a pure Python list comprehension, which is much faster than the first solution I proposed:

test_list = np.tile(np.array([[1,2],[3,4],[5,6]]),(10000000,1)).astype(str) #10M
test_list = test_list.tolist()

def comp(points):
    return ['X coordinate is %s Y coordinate is %s' % (x,y) for x,y in points]

%timeit comp(test_list)
1 loops, best of 3: 6.53 s per loop

['X coordinate is 1 Y coordinate is 2',
 'X coordinate is 3 Y coordinate is 4',
 'X coordinate is 5 Y coordinate is 6',
 'X coordinate is 1 Y coordinate is 2',
 'X coordinate is 3 Y coordinate is 4',
 'X coordinate is 5 Y coordinate is 6',...

%timeit dumbstring(test_list)
1 loops, best of 3: 30.7 s per loop
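If you want to keep the original `%g` format and the integer array, a similar comprehension might look like the sketch below. It converts each column to a plain Python list first, so the loop handles ordinary Python numbers rather than numpy scalars. The name `comp_columns` is hypothetical, and this version is not timed here.

```python
import numpy as np

def comp_columns(points):
    # Convert each column to a list of Python ints once, up front.
    xs = points[:, 0].tolist()
    ys = points[:, 1].tolist()
    # Format one string per (x, y) pair.
    return ["X co-ordinate is %g and y is %g" % (x, y) for x, y in zip(xs, ys)]

small = np.tile(np.array([[1, 2], [3, 4], [5, 6]]), (2, 1))
strings = comp_columns(small)
print(strings[:3])
```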

Creating a list of strings from a numpy array (non-loop solution)
