Question

I'm new to NumPy and I'm trying to use it in my code to handle some tables.

I have a list of coordinates that looks like this:

coordinates = [["2 0"], ["0 1"], ["3 4"]]

and I would like to turn it into this:

coordinatesNumpy = np.array([[2, 0], [0, 1], [3, 4]])

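In regular Python this is easy, but how do you do it with NumPy? Should I build the table from the list with regular Python functions and then convert the 2D table to an np.array, or does NumPy have methods for splitting and populating?

I tried a few things, but they all gave me an error. The latest thing I tried: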

flowers = np.array([np.array([int(coordinate[0]), int(coordinate[2])]) for coordinate in coordinates])

How can I do something like this with NumPy?

Answer 1

Take a look at numpy.fromstring:

coordinates_numpy = np.array([np.fromstring(i, dtype=int, sep=' ')
                             for j in coordinates for i in j])
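
To see what this does: with a non-empty `sep`, `np.fromstring` parses one text string into a 1-D array, and the double loop simply flattens the one-element inner lists. A minimal sketch, assuming the input has exactly the shape from the question (the intermediate names `first` and `rows` are mine, purely illustrative):

```python
import numpy as np

coordinates = [["2 0"], ["0 1"], ["3 4"]]

# With a non-empty sep, np.fromstring parses text rather than raw bytes.
first = np.fromstring("2 0", dtype=int, sep=" ")

# The double loop walks each one-element inner list and parses its string.
rows = [np.fromstring(s, dtype=int, sep=" ") for inner in coordinates for s in inner]

# Stacking equal-length 1-D arrays gives the 2-D result.
coordinates_numpy = np.array(rows)
print(coordinates_numpy.shape)  # (3, 2)
```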

Answer 2

Using pure Python

List comprehension

This works:

>>> flowers = np.array([[int(x) for x in coordinate[0].split()]
...                     for coordinate in coordinates])
>>> flowers
array([[2, 0],
       [0, 1],
       [3, 4]])

I'm not aware of any NumPy function that does this in one step.
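
A sketch of why the attempt in the question fails and what the comprehension does instead. Each element of `coordinates` is a one-element list, so `coordinate[0]` is the whole string `"2 0"` rather than a single digit, and `coordinate[2]` does not exist at all. The helper name `parse_row` is hypothetical, used only for illustration:

```python
coordinates = [["2 0"], ["0 1"], ["3 4"]]

# coordinate[0] is the full string "2 0", so int() rejects it.
try:
    int(coordinates[0][0])
    failed = False
except ValueError:
    failed = True

def parse_row(coordinate):
    """Split the single string in a row on whitespace and convert each token."""
    return [int(token) for token in coordinate[0].split()]

nested = [parse_row(coordinate) for coordinate in coordinates]
print(nested)  # [[2, 0], [0, 1], [3, 4]]
```

Passing `nested` to `np.array` then yields the same 2-D array as `flowers`.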

Performance

Let's check how fast each version is.

For your example data, the pure Python version is the fastest:

%timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in coordinates for i in j])
100000 loops, best of 3: 18.4 µs per loop

%timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in coordinates])
10000 loops, best of 3: 19 µs per loop

%timeit np.array([[int(x)  for x in coordinate[0].split()] for coordinate in coordinates])
100000 loops, best of 3: 12.1 µs per loop

Making the data bigger:

long_coords = coordinates * 1000

The pure Python version is still the fastest:

%timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in long_coords for i in j])
100 loops, best of 3: 12.2 ms per loop

%timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in long_coords])
100 loops, best of 3: 14.2 ms per loop

%timeit np.array([[int(x)  for x in coordinate[0].split()] for coordinate in long_coords])
100 loops, best of 3: 7.54 ms per loop

The results are consistent for even larger data:

very_long_coords = coordinates * 10000

%timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in very_long_coords for i in j])
10 loops, best of 3: 125 ms per loop

%timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in very_long_coords])
10 loops, best of 3: 140 ms per loop

%timeit np.array([[int(x)  for x in coordinate[0].split()] for coordinate in very_long_coords])
10 loops, best of 3: 73.5 ms per loop
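
The `%timeit` lines above need IPython. A rough equivalent with the standard `timeit` module, as a sketch: the helper names and the repeat counts are my own choices, and absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

coordinates = [["2 0"], ["0 1"], ["3 4"]]
long_coords = coordinates * 1000

def with_fromstring(data):
    """Parse each row string with np.fromstring in text mode."""
    return np.array([np.fromstring(i, dtype=int, sep=" ") for j in data for i in j])

def with_pure_python(data):
    """Parse each row string with str.split and int."""
    return np.array([[int(x) for x in c[0].split()] for c in data])

# Both variants must agree before their timings are worth comparing.
assert np.array_equal(with_fromstring(long_coords), with_pure_python(long_coords))

for func in (with_fromstring, with_pure_python):
    # Best of 3 repeats of 10 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(lambda: func(long_coords), number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1e3:.2f} ms per call")
```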

Answer 3

Assuming C is the input list, two approaches could be suggested to solve it.

Approach #1: Using a one-level list comprehension with np.fromstring -

np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in C])

Approach #2: A vectorized approach that pads each string with np.core.defchararray.add and then parses the separated numbers -

np.fromstring(np.core.defchararray.add(C," "),dtype=int,sep=" ").reshape(len(C),-1)

Sample run -

In [82]: C = [['2 0'], ['0 1'], ['3 4']]

In [83]: np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in C])
Out[83]: 
array([[2, 0],
       [0, 1],
       [3, 4]])

In [84]: np.fromstring(np.core.defchararray.add(C, " "),dtype=int,sep=" ").reshape(len(C),-1)
Out[84]: 
array([[2, 0],
       [0, 1],
       [3, 4]])
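
A related way to avoid one NumPy call per row, sketched here with plain `str.join` rather than `np.core.defchararray.add`. This is a swapped-in variant, not either approach from this answer, and it assumes every row holds the same number of integers; the names `tokens`, `flat` and `result` are illustrative:

```python
import numpy as np

C = [["2 0"], ["0 1"], ["3 4"]]

# Join all rows into one whitespace-separated string, then tokenize once.
tokens = " ".join(item[0] for item in C).split()

# Convert the tokens to integers in one pass and restore the row structure.
flat = np.fromiter(map(int, tokens), dtype=int, count=len(tokens))
result = flat.reshape(len(C), -1)
print(result.shape)  # (3, 2)
```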

Benchmarks

Borrowing the benchmarking code from @Mike Müller's solution, here are the runtimes for the long_coords and very_long_coords cases -

In [78]: coordinates = [["2 0"], ["0 1"], ["3 4"]]
    ...: long_coords = coordinates * 1000
    ...: %timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in long_coords for i in j])
    ...: %timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in long_coords])
    ...: %timeit np.array([[int(x)  for x in coordinate[0].split()] for coordinate in long_coords])
    ...: %timeit np.fromstring(np.core.defchararray.add(long_coords, " "), dtype=int,sep=" ").reshape(len(long_coords),-1)
    ...: 
100 loops, best of 3: 7.27 ms per loop
100 loops, best of 3: 9.52 ms per loop
100 loops, best of 3: 6.84 ms per loop
100 loops, best of 3: 2.73 ms per loop

In [79]: coordinates = [["2 0"], ["0 1"], ["3 4"]]
    ...: very_long_coords = coordinates * 10000
    ...: %timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in very_long_coords for i in j])
    ...: %timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in very_long_coords])
    ...: %timeit np.array([[int(x)  for x in coordinate[0].split()] for coordinate in very_long_coords])
    ...: %timeit np.fromstring(np.core.defchararray.add(very_long_coords, " "), dtype=int,sep=" ").reshape(len(very_long_coords),-1)
    ...: 
10 loops, best of 3: 80.7 ms per loop
10 loops, best of 3: 103 ms per loop
10 loops, best of 3: 71 ms per loop
10 loops, best of 3: 27.2 ms per loop

Nesting NumPy arrays and using methods to split them
