Efficient way to concatenate non-contiguous columns in a 2D numpy array

Time: 2018-02-28 18:59:17

Tags: python numpy

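I'm using np.concatenate to join a non-contiguous column with some contiguous columns in a large dataset, and I realized my approach would get quite cumbersome if I wanted to do this with several non-contiguous columns. Would I have to chain concatenate calls for every individual column? I'm looking for a general answer, not a solution specific to columns 2, 5 and 7.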
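For reference, np.concatenate accepts a sequence of any length, so the pieces do not need to be chained pairwise; a list of column blocks can be passed in a single call. A minimal sketch, assuming a random array as a stand-in for the dataset and an arbitrary choice of columns 2, 5 and 7-9:

```python
import numpy as np

# Stand-in for the real dataset (shape chosen for illustration only)
data = np.random.rand(156, 26)

# Single columns are indexed with a one-element list so they stay 2-D:
# data[:, [2]] has shape (156, 1), whereas data[:, 2] would be 1-D.
blocks = [data[:, [2]], data[:, [5]], data[:, 7:10]]

# One call joins all blocks side by side; no pairwise chaining needed.
combined = np.concatenate(blocks, axis=1)
print(combined.shape)  # (156, 5)
```

The list of blocks can be built programmatically, so the same call works for any number of column groups.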


1 answer:

Answer 0 (score: 1)

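An alternative to indexing and then concatenating is to concatenate the indices first.

np.r_ is convenient for this (though not the fastest):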

In [40]: np.r_[22,24:27]
Out[40]: array([22, 24, 25, 26])
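The same idea scales to any number of non-contiguous groups: np.r_ accepts a mix of plain integers and slice notation and returns one 1-D index array, which then selects all columns in a single indexing step. The column positions below are illustrative:

```python
import numpy as np

# np.r_ turns integers and slice notation into one flat index array.
idx = np.r_[2, 5, 7:10, 20]
print(idx)  # [ 2  5  7  8  9 20]

data = np.random.rand(156, 26)

# A single advanced-indexing step selects all chosen columns at once.
selected = data[:, idx]
print(selected.shape)  # (156, 6)
```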

Testing with your array:

In [29]: rand_data = np.random.rand(156,26)

In [31]: new_array = np.concatenate((rand_data[:,[22]],rand_data[:, 24:27]), axis = 1)
In [32]: new_array.shape
Out[32]: (156, 3)

Using r_:

In [41]: arr = rand_data[:,np.r_[22,24:27]]
....
IndexError: index 26 is out of bounds for axis 1 with size 26

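Oops: unlike slice indexing, advanced (array) indexing does not allow out-of-bounds values. The slice 24:27 in the concatenate version was silently clipped to the 26 available columns, so the r_ index has to stop at 26: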
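The difference can be seen directly. The sketch below contrasts a clipped slice with an out-of-bounds index array, and shows one possible guard (an assumption, not part of the original answer): clipping the stop value to the array's column count before building the r_ index.

```python
import numpy as np

data = np.random.rand(156, 26)  # valid column positions are 0..25

# Slicing silently clips the stop value to the array size...
print(data[:, 24:27].shape)  # (156, 2)

# ...but an explicit index array may only contain valid positions.
try:
    data[:, np.r_[22, 24:27]]
except IndexError as exc:
    print("IndexError:", exc)

# Possible guard: clip the stop to the number of columns first.
stop = min(27, data.shape[1])
safe = data[:, np.r_[22, 24:stop]]
print(safe.shape)  # (156, 3)
```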

In [42]: arr = rand_data[:,np.r_[22,24:26]]
In [43]: arr.shape
Out[43]: (156, 3)
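Both versions pick columns 22, 24 and 25, so they should produce identical arrays. A quick check, reusing the names from the session above:

```python
import numpy as np

rand_data = np.random.rand(156, 26)

# Approach 1: index each piece, then concatenate the pieces.
new_array = np.concatenate((rand_data[:, [22]], rand_data[:, 24:27]), axis=1)

# Approach 2: concatenate the indices with np.r_, then index once.
arr = rand_data[:, np.r_[22, 24:26]]

# Same columns in the same order, so the results match exactly.
print(np.array_equal(new_array, arr))  # True
```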

Comparing timings:

In [44]: timeit new_array = np.concatenate((rand_data[:,[22]],rand_data[:, 24:27]), axis = 1)
15 µs ± 20.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [45]: timeit arr = rand_data[:,np.r_[22,24:26]]
29.7 µs ± 111 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

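The r_ approach is more compact, but in this test it is actually slower: about 29.7 µs versus 15 µs per loop, roughly twice as long.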
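If the same column selection is applied repeatedly, one option is to build the index array once and reuse it, so np.r_ is not re-evaluated on every call. This is a sketch that has not been benchmarked here; the helper select_columns and the constant COLS are hypothetical names. np.take with axis=1 is shown as an equivalent way to do the selection.

```python
import numpy as np

# Hypothetical constant: the column index is built once with np.r_.
COLS = np.r_[22, 24:26]

def select_columns(a, cols=COLS):
    """Hypothetical helper: return a copy of the chosen columns of 2-D array a."""
    return a[:, cols]

batches = [np.random.rand(156, 26) for _ in range(3)]
results = [select_columns(b) for b in batches]

# np.take along axis 1 gives the same result as advanced indexing.
print(np.array_equal(results[0], np.take(batches[0], COLS, axis=1)))  # True
print([r.shape for r in results])  # [(156, 3), (156, 3), (156, 3)]
```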