I'm new to NumPy and am trying to use it in my code to work with some tables.
I have a list of coordinates that looks like this:
coordinates = [["2 0"], ["0 1"], ["3 4"]]
and I want to turn it into this:
coordinatesNumpy = np.array([[2, 0], [0, 1], [3, 4]])
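In regular Python that's easy to do, but how do you do it with NumPy? Should I build the table with regular Python list functions and then convert the 2D table to an np.array, or does NumPy have its own methods for splitting the strings and filling the array?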
I tried a few things, but they all gave me an error. The latest thing I tried:
flowers = np.array([np.array([int(coordinate[0]), int(coordinate[2])]) for coordinate in coordinates])
How can I do something like this with NumPy?
Answer 0: (score: 3)
coordinates_numpy = np.array([np.fromstring(i, dtype=int, sep=' ')
                              for j in coordinates for i in j])
Answer 1: (score: 2)
This works:
>>> flowers = np.array([[int(x) for x in coordinate[0].split()]
...                     for coordinate in coordinates])
>>> flowers
array([[2, 0],
       [0, 1],
       [3, 4]])
I don't know of any NumPy function that does this in one step.
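One candidate that may come close is np.loadtxt, which accepts a list of strings as well as a file. The sketch below assumes every string holds the same number of whitespace-separated integers; the name rows is only illustrative, and no claim is made here about how its speed compares with the list comprehension.

```python
import numpy as np

coordinates = [["2 0"], ["0 1"], ["3 4"]]

# Unwrap the one-element inner lists so that loadtxt sees one string per row.
rows = [coordinate[0] for coordinate in coordinates]

# ndmin=2 keeps the result two-dimensional even if there is only a single row.
flowers = np.loadtxt(rows, dtype=int, ndmin=2)
print(flowers)
```

The Python-level unwrapping is still needed, so this is not strictly a single step either.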
Let's check how fast each option is.
For your example data, the pure Python version is the fastest:
%timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in coordinates for i in j])
100000 loops, best of 3: 18.4 µs per loop
%timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in coordinates])
10000 loops, best of 3: 19 µs per loop
%timeit np.array([[int(x) for x in coordinate[0].split()] for coordinate in coordinates])
100000 loops, best of 3: 12.1 µs per loop
Making the data larger:
long_coords = coordinates * 1000
Still, the pure Python version is the fastest:
%timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in long_coords for i in j])
100 loops, best of 3: 12.2 ms per loop
%timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in long_coords])
100 loops, best of 3: 14.2 ms per loop
%timeit np.array([[int(x) for x in coordinate[0].split()] for coordinate in long_coords])
100 loops, best of 3: 7.54 ms per loop
The results are consistent for even larger data:
very_long_coords = coordinates * 10000
%timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in very_long_coords for i in j])
10 loops, best of 3: 125 ms per loop
%timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in very_long_coords])
10 loops, best of 3: 140 ms per loop
%timeit np.array([[int(x) for x in coordinate[0].split()] for coordinate in very_long_coords])
10 loops, best of 3: 73.5 ms per loop
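The %timeit lines above need IPython. To repeat the comparison from a plain script, something like the following sketch could be used with the standard timeit module. The function names with_fromstring and with_split are made up for this example, and the numbers it prints will differ from machine to machine.

```python
import timeit

import numpy as np

coordinates = [["2 0"], ["0 1"], ["3 4"]]
long_coords = coordinates * 1000


def with_fromstring(coords):
    # One np.fromstring call per coordinate string.
    return np.array([np.fromstring(i, dtype=int, sep=" ")
                     for j in coords for i in j])


def with_split(coords):
    # Pure Python parsing, converted to an array once at the end.
    return np.array([[int(x) for x in c[0].split()] for c in coords])


for func in (with_fromstring, with_split):
    # Best of 3 repeats, 10 calls per repeat, reported per call.
    best = min(timeit.repeat(lambda: func(long_coords), number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1e3:.2f} ms per call")
```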
Answer 2: (score: 2)
Assuming C as the input list, two approaches could be suggested to solve it.
Approach #1: Using a one-level list comprehension with np.fromstring -
np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in C])
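Approach #2: A vectorized approach that pads each string with a trailing space using np.core.defchararray.add and then parses out the separated numbers -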
np.fromstring(np.core.defchararray.add(C," "),dtype=int,sep=" ").reshape(len(C),-1)
Sample run -
In [82]: C = [['2 0'], ['0 1'], ['3 4']]
In [83]: np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in C])
Out[83]:
array([[2, 0],
       [0, 1],
       [3, 4]])
In [84]: np.fromstring(np.core.defchararray.add(C, " "),dtype=int,sep=" ").reshape(len(C),-1)
Out[84]:
array([[2, 0],
       [0, 1],
       [3, 4]])
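If np.core.defchararray is not available (to my knowledge np.core is no longer a public namespace in recent NumPy releases, with np.char as the public counterpart), the same parse-everything-once idea can be sketched by joining the strings in Python first. This is a substitute technique rather than Approach #2 itself, and it assumes every row holds the same number of integers -

```python
import numpy as np

C = [['2 0'], ['0 1'], ['3 4']]

# Join all coordinate strings into one space-separated string.
joined = " ".join(item[0] for item in C)

# Parse the whole string in a single call, then restore one row per input item.
result = np.fromstring(joined, dtype=int, sep=" ").reshape(len(C), -1)
print(result)
```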
Borrowing the benchmarking code from @Mike Müller's solution, here are the runtimes for the long_coords and very_long_coords cases -
In [78]: coordinates = [["2 0"], ["0 1"], ["3 4"]]
...: long_coords = coordinates * 1000
...: %timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in long_coords for i in j])
...: %timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in long_coords])
...: %timeit np.array([[int(x) for x in coordinate[0].split()] for coordinate in long_coords])
...: %timeit np.fromstring(np.core.defchararray.add(long_coords, " "), dtype=int,sep=" ").reshape(len(long_coords),-1)
...:
100 loops, best of 3: 7.27 ms per loop
100 loops, best of 3: 9.52 ms per loop
100 loops, best of 3: 6.84 ms per loop
100 loops, best of 3: 2.73 ms per loop
In [79]: coordinates = [["2 0"], ["0 1"], ["3 4"]]
...: very_long_coords = coordinates * 10000
...: %timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in very_long_coords for i in j])
...: %timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in very_long_coords])
...: %timeit np.array([[int(x) for x in coordinate[0].split()] for coordinate in very_long_coords])
...: %timeit np.fromstring(np.core.defchararray.add(very_long_coords, " "), dtype=int,sep=" ").reshape(len(very_long_coords),-1)
...:
10 loops, best of 3: 80.7 ms per loop
10 loops, best of 3: 103 ms per loop
10 loops, best of 3: 71 ms per loop
10 loops, best of 3: 27.2 ms per loop