Is there a NumPy method for sampling several elements of an array at a time?

Time: 2021-06-09 18:37:12

Tags: python numpy

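I have a dataset that I want to sample so that I can run a first-order analysis more quickly. If I want to examine one point at a time, I can do that with slicing: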

import numpy as np
array = np.arange(0,20000)
samplespace = 5000
sampledarray = array[::samplespace]

However, I need to analyze regions that span several elements. The only way I can think of is a for loop:

import numpy as np
array = np.arange(0,20000)
samplespace = 5000
n = 3
sampledarray = array[0::samplespace]
for i in range(1,n):
    arraysample_i = array[i::samplespace]
    indices = np.linspace(i,len(sampledarray),len(arraysample_i)).astype(int)
    sampledarray = np.insert(sampledarray,indices,arraysample_i)
print(sampledarray)
>>> [ 0 1 2 5000 5001 5002 10000 10001 10002 15000 15001 15002 ]

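If I work with large arrays and sample along several dimensions, I am worried that the loop will take a long time to run. Is there a simpler, faster way to do this?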

3 answers:

Answer 0: (score: 1)

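import itertools
import numpy as np

array = np.arange(0, 20000)
samplespace = 5000
n = 3  # number of consecutive elements per sample, as in the question

# For each offset i in 0..n-1, yield i, i + samplespace, i + 2*samplespace, ...
indices = itertools.chain.from_iterable(
    range(i, len(array), samplespace) for i in range(n)
)

output = array[list(indices)]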

Output:

array([    0,  5000, 10000, 15000,     1,  5001, 10001, 15001,     2,
        5002, 10002, 15002])
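
Note that this output is grouped by offset (all the +0 elements, then all the +1 elements, and so on), whereas the question asks for each block of consecutive elements to stay together. A minimal sketch of one way to reorder the indices, assuming every offset produces the same number of indices (true here because `len(array)` is a multiple of `samplespace`); the names `grouped` and `interleaved` are illustrative only:

```python
import numpy as np

array = np.arange(0, 20000)
samplespace = 5000
n = 3  # number of consecutive elements per sample, as in the question

# Indices grouped by offset: 0, 5000, 10000, 15000, 1, 5001, ...
grouped = np.concatenate(
    [np.arange(i, len(array), samplespace) for i in range(n)]
)

# View the indices as n rows (one per offset), transpose so that each row
# holds one block of consecutive indices, then flatten row by row.
interleaved = grouped.reshape(n, -1).T.ravel()

sampledarray = array[interleaved]
print(sampledarray)
```

If the offsets yield different numbers of indices, the `reshape` call raises an error, so this only applies to the evenly divisible case.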

Answer 1: (score: 1)

Thanks to aminrd for suggesting itertools; it is a package I had not used before, but it ended up giving me what I needed.

Here is what I ended up doing:

import numpy as np
import itertools
array = np.arange(0,20000)
samplespace = 5000
n = 3

iterate = itertools.count(start=0,step=samplespace)
num = int(len(array)/samplespace)
idx = np.array([next(iterate) for _ in range(num)])
idxlist = np.zeros(0)
for i in range(n):
    idxi = np.copy(idx)+i
    idxlist = np.append(idxlist,idxi)
idxlist = np.sort(idxlist).astype(int)

sampledarray = array[idxlist]
print(sampledarray)
>>> [ 0 1 2 5000 5001 5002 10000 10001 10002 15000 15001 15002 ]
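
The same index array can probably be built without itertools or a Python loop by letting NumPy broadcasting add the offsets to the block starts. This is a sketch under the same parameters as above, not the author's method; `starts` and `offsets` are names introduced here for illustration:

```python
import numpy as np

array = np.arange(0, 20000)
samplespace = 5000
n = 3

starts = np.arange(0, len(array), samplespace)  # [0, 5000, 10000, 15000]
offsets = np.arange(n)                          # [0, 1, 2]

# starts[:, None] has shape (4, 1); adding offsets (shape (3,)) broadcasts
# to shape (4, 3), one row per block. ravel() flattens it row by row.
idxlist = (starts[:, None] + offsets).ravel()

# Drop any index that would run past the end of the array
# (only matters when the last block is shorter than n).
idxlist = idxlist[idxlist < len(array)]

sampledarray = array[idxlist]
print(sampledarray)
```

Because the rows are already in ascending order, no `np.sort` or float-to-int conversion is needed.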

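This lets me extend the approach to more dimensions fairly easily, and instead of manipulating my large dataset I only need to manipulate the index arrays: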

import numpy as np
import itertools
array = np.empty((200,200),dtype=object)
# I know this is a lousy way to define the array but it works well for illustrative purposes
for i in range(200):
    for j in range(200):
        array[i,j] = (i,j)
samplespace = 50
n = 3

iteratex = itertools.count(start=0,step=samplespace)
iteratey = itertools.count(start=0,step=samplespace)
num = int(len(array)/samplespace)
idxx = np.array([next(iteratex) for _ in range(num)])
idxy = np.array([next(iteratey) for _ in range(num)])
idxlistx = np.zeros(0)
idxlisty = np.zeros(0)
for i in range(n):
    idxxi = np.copy(idxx)+i
    idxyi = np.copy(idxy)+i
    idxlistx = np.append(idxlistx,idxxi)
    idxlisty = np.append(idxlisty,idxyi)
idxlistx = np.sort(idxlistx).astype(int)
idxlisty = np.sort(idxlisty).astype(int)

# Having to index the array twice seems awkward, even though I understand it is necessary 
# for array broadcasting if the two index arrays are of different lengths
sampledarray = array[idxlistx,:]
sampledarray = sampledarray[:,idxlisty]
print(sampledarray)

>>>[[(0, 0) (0, 1) (0, 2) (0, 50) (0, 51) (0, 52) (0, 100) (0, 101) (0, 102)
  (0, 150) (0, 151) (0, 152)]
 [(1, 0) (1, 1) (1, 2) (1, 50) (1, 51) (1, 52) (1, 100) (1, 101) (1, 102)
  (1, 150) (1, 151) (1, 152)]
 [(2, 0) (2, 1) (2, 2) (2, 50) (2, 51) (2, 52) (2, 100) (2, 101) (2, 102)
  (2, 150) (2, 151) (2, 152)]
 [(50, 0) (50, 1) (50, 2) (50, 50) (50, 51) (50, 52) (50, 100) (50, 101)
  (50, 102) (50, 150) (50, 151) (50, 152)]
 [(51, 0) (51, 1) (51, 2) (51, 50) (51, 51) (51, 52) (51, 100) (51, 101)
  (51, 102) (51, 150) (51, 151) (51, 152)]
 [(52, 0) (52, 1) (52, 2) (52, 50) (52, 51) (52, 52) (52, 100) (52, 101)
  (52, 102) (52, 150) (52, 151) (52, 152)]
 [(100, 0) (100, 1) (100, 2) (100, 50) (100, 51) (100, 52) (100, 100)
  (100, 101) (100, 102) (100, 150) (100, 151) (100, 152)]
 [(101, 0) (101, 1) (101, 2) (101, 50) (101, 51) (101, 52) (101, 100)
  (101, 101) (101, 102) (101, 150) (101, 151) (101, 152)]
 [(102, 0) (102, 1) (102, 2) (102, 50) (102, 51) (102, 52) (102, 100)
  (102, 101) (102, 102) (102, 150) (102, 151) (102, 152)]
 [(150, 0) (150, 1) (150, 2) (150, 50) (150, 51) (150, 52) (150, 100)
  (150, 101) (150, 102) (150, 150) (150, 151) (150, 152)]
 [(151, 0) (151, 1) (151, 2) (151, 50) (151, 51) (151, 52) (151, 100)
  (151, 101) (151, 102) (151, 150) (151, 151) (151, 152)]
 [(152, 0) (152, 1) (152, 2) (152, 50) (152, 51) (152, 52) (152, 100)
  (152, 101) (152, 102) (152, 150) (152, 151) (152, 152)]]
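
On the comment about having to index the array twice: `np.ix_` builds an open mesh from 1-D index arrays so that they broadcast against each other, which allows a single indexing step even when the two index arrays differ in length. A sketch, assuming an integer grid in place of the tuple-valued object array; `block_indices` is a hypothetical helper, not part of NumPy:

```python
import numpy as np

# Integer stand-in for the object array of (i, j) tuples: value = 200*i + j
array = np.arange(200 * 200).reshape(200, 200)
samplespace = 50
n = 3

def block_indices(length, step, width):
    """Hypothetical helper: `width` consecutive indices starting every `step`."""
    idx = (np.arange(0, length, step)[:, None] + np.arange(width)).ravel()
    return idx[idx < length]  # discard indices past the end of the axis

idxlistx = block_indices(array.shape[0], samplespace, n)
idxlisty = block_indices(array.shape[1], samplespace, n)

# np.ix_ reshapes the index arrays to (12, 1) and (1, 12), so the
# indexing selects every (row, column) combination in one step.
sampledarray = array[np.ix_(idxlistx, idxlisty)]
print(sampledarray.shape)  # (12, 12)
```

The result matches `array[idxlistx, :][:, idxlisty]`, but avoids creating the intermediate row-sampled array.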

Answer 2: (score: 0)

import numpy as np
array = np.arange(0,20000)
samplespace = 5000
n = 3
sampledarray = np.stack([array[i::samplespace] for i in range(n)]).flatten(order='F')
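
The `np.stack` one-liner works because `flatten(order='F')` reads the stacked slices column by column, which interleaves the offsets; it requires all slices to have the same length. When `len(array)` is an exact multiple of `samplespace`, a reshape may express the same idea more directly. This is a sketch under that assumption:

```python
import numpy as np

array = np.arange(0, 20000)
samplespace = 5000
n = 3

# Assumption: the array length divides evenly into blocks of samplespace.
assert len(array) % samplespace == 0

# Each row holds one block of `samplespace` consecutive elements;
# keep the first n columns of every row, then flatten row by row.
sampledarray = array.reshape(-1, samplespace)[:, :n].ravel()
print(sampledarray)
```

The reshape itself is a view and copies no data; only the final `ravel()` of the sliced columns produces a new array.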