Stretch an array and fill with nan

Time: 2019-01-22 08:24:50

Tags: python arrays numpy nan numpy-ndarray

I have a 1-d numpy array of length n, and I want to stretch it to length m (n < m), filling the added positions with nan.

For example:

>>> arr = [4,5,1,2,6,8] # take this
>>> stretch(arr,8)
[4,5,np.nan,1,2,np.nan,6,8] # convert to this

Requirements: 1. No nan at either end (if possible) 2. Works for all lengths

I have tried:

>>> def stretch(x,to,fill=np.nan):
...     step = to/len(x)
...     output = np.repeat(fill,to)
...     foreign = np.arange(0,to,step).round().astype(int)
...     output[foreign] = x
...     return output

>>> arr = np.random.rand(6553)
>>> stretch(arr,6622)

  File "<ipython-input-216-0202bc39278e>", line 2, in <module>
    stretch(arr,6622)

  File "<ipython-input-211-177ee8bc10a7>", line 9, in stretch
    output[foreign] = x

ValueError: shape mismatch: value array of shape (6553,) could not be broadcast to indexing result of shape (6554,)

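It does not seem to work properly (for an array of length 6553 it violates requirement 2, and it does not guarantee requirement 1). Any clues on how to overcome this?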

4 answers:

Answer 0 (score: 2)

Using roundrobin from the itertools Recipes:

import numpy as np
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))

def stretch(x, to, fill=np.nan):
    n_gaps = to - len(x)
    return np.hstack([*roundrobin(np.array_split(x, n_gaps+1), np.repeat(fill, n_gaps))])

arr = [4,5,1,2,6,8]
stretch(arr, 8)
# array([ 4.,  5., nan,  1.,  2., nan,  6.,  8.])

arr2 = np.random.rand(655)
stretched_arr2 = stretch(arr2, 662)
np.diff(np.argwhere(np.isnan(stretched_arr2)), axis=0)
# nans are evenly spaced
# array([[83], [83], [83], [83], [83], [83]])

The logic behind it:

n_gaps: the number of gaps to fill (desired length - current length).

np.array_split: called with n_gaps+1, it splits the input array into pieces of as nearly equal length as possible.

roundrobin: since np.array_split produces one more piece than there are gaps, round-robin (i.e. alternating) iteration ensures that np.nan is never at either end of the result.
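As a rough illustration of the same split logic without the generator, the gap positions can be derived from the piece lengths and handed to np.insert. This is only a sketch under stated assumptions, not the code above: the name stretch_insert is made up here, the input is converted to float so that nan can be stored, and to is assumed to be at least len(x).

```python
import numpy as np

def stretch_insert(x, to, fill=np.nan):
    # Hypothetical helper: same splitting idea, expressed with np.insert.
    x = np.asarray(x, dtype=float)   # float so that nan can be stored
    n_gaps = to - len(x)
    if n_gaps <= 0:
        return x.copy()              # nothing to insert
    # Lengths of the n_gaps + 1 pieces that np.array_split would produce.
    sizes = [len(p) for p in np.array_split(x, n_gaps + 1)]
    # One fill value goes in front of every piece except the first.
    positions = np.cumsum(sizes)[:-1]
    return np.insert(x, positions, fill)

print(stretch_insert([4, 5, 1, 2, 6, 8], 8))
```

Because np.insert interprets the positions relative to the original array, no manual offset bookkeeping is needed for the already inserted values.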

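Answer 1 (score: 1)

This method places the non-nan elements at the boundaries and keeps the nan values in the center, although it does not distribute the nan values evenly.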

arr = [4,5,1,2,6,8]   
stretch_len = 8    

def stretch(arr, stretch_len):
    stretched_arr = np.empty(stretch_len)   
    stretched_arr.fill(np.nan)
    arr_len = len(arr)

    if arr_len % 2 == 0:
        mid = int(arr_len/2)
        stretched_arr[:mid] = arr[:mid]
        stretched_arr[-mid:] = arr[-mid:]
    else:
        mid = int(np.floor(arr_len/2))
        stretched_arr[:mid] = arr[:mid]
        stretched_arr[-mid-1:] = arr[-mid-1:]

    return stretched_arr

Here are some test cases I tried:

In [104]: stretch(arr, stretch_len)   
Out[104]: array([ 4.,  5.,  1., nan, nan,  2.,  6.,  8.])

In [105]: arr = [4, 5, 1, 2, 6, 8, 9]    

In [106]: stretch(arr, stretch_len)  
Out[106]: array([ 4.,  5.,  1., nan,  2.,  6.,  8.,  9.])

In [107]: stretch(arr, 9)  
Out[107]: array([ 4.,  5.,  1., nan, nan,  2.,  6.,  8.,  9.])

Answer 2 (score: 1)

Although Chris solved the problem, I found a shorter answer that may be helpful:

def stretch2(x,to,fill=np.nan):
    output  = np.repeat(fill,to)
    foreign = np.linspace(0,to-1,len(x)).round().astype(int)
    output[foreign] = x
    return output
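The key difference from the first attempt is where the target indices come from. For floating-point arguments np.arange derives its length from ceil((stop - start)/step), so rounding error in a non-integer step can yield one index too many (the traceback in the question reports 6554 indices for 6553 values). np.linspace instead returns exactly the requested number of points and includes both endpoints. A small check using the sizes from the question; the variable names are only illustrative:

```python
import numpy as np

n, to = 6553, 6622  # sizes from the failing example in the question

# First attempt: the number of indices depends on a float step.
idx_arange = np.arange(0, to, to / n).round().astype(int)

# This answer: the number of indices is fixed at n, endpoints included.
idx_linspace = np.linspace(0, to - 1, n).round().astype(int)

print(len(idx_arange), len(idx_linspace))
print(idx_linspace[0], idx_linspace[-1])  # first and last slots hold data
```

Since the first and last indices are 0 and to - 1, the two ends of the output always receive data rather than nan.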

Very similar to my first attempt. Timings:

>>> x = np.random.rand(1000)
>>> to = 1200
>>> %timeit stretch(x,to) # Chris' version
>>> %timeit stretch2(x,to)

996 µs ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
32.2 µs ± 339 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Checking that it works properly:

>>> aa = stretch2(x,to)
>>> np.diff(np.where(np.isnan(aa))[0])
array([6, 6, 6, ... , 6])
>>> np.sum(aa[~np.isnan(aa)] - x)
0.0

Checking the boundary conditions:

>>> aa[:5]
array([0.78581616, 0.1630689 , 0.52039993,        nan, 0.89844404])
>>> aa[-5:]
array([0.7063653 ,        nan, 0.2022172 , 0.94604503, 0.91201897])

Both requirements are satisfied. It works for all 1-D arrays, and with a few changes it could be generalized to n-D arrays.
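One way such an n-D generalization might look is to stretch along a single chosen axis. This is a sketch only: stretch_axis and its axis parameter are hypothetical names introduced here, the input is cast to float, and to is assumed to be at least the size of that axis.

```python
import numpy as np

def stretch_axis(x, to, axis=-1, fill=np.nan):
    # Hypothetical n-D variant of stretch2; assumes to >= x.shape[axis].
    x = np.asarray(x, dtype=float)
    x = np.moveaxis(x, axis, -1)             # work on the last axis
    n = x.shape[-1]
    out = np.full(x.shape[:-1] + (to,), fill)
    idx = np.linspace(0, to - 1, n).round().astype(int)
    out[..., idx] = x                        # scatter data into its slots
    return np.moveaxis(out, -1, axis)        # restore the original axis order

a = np.arange(12).reshape(2, 6)
print(stretch_axis(a, 8, axis=1))
```

Every slice along the chosen axis gets nan in the same positions, because the index vector is computed once and shared.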

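Answer 3 (score: 0)

You can use resize to change the size of the array.

After resizing, you can apply suitable logic to rearrange the contents.

See the following link: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.resize.html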
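A minimal sketch of that idea, assuming the in-place ndarray.resize method (which zero-fills the new entries). The rearrangement step is an assumption added here for illustration; it is not taken from the linked page.

```python
import numpy as np

arr = [4, 5, 1, 2, 6, 8]
n, to = len(arr), 8

b = np.array(arr, dtype=float)
b.resize(to, refcheck=False)   # in-place; new entries are zero-filled
b[n:] = np.nan                 # turn the padding into nan

# One possible rearrangement (an assumption, not from the linked page):
# move the data to evenly spaced slots and the nans to the remaining ones.
slots = np.linspace(0, to - 1, n).round().astype(int)
is_data = np.zeros(to, dtype=bool)
is_data[slots] = True
out = np.empty(to)
out[is_data] = b[:n]
out[~is_data] = b[n:]
print(out)
```

Note that np.resize (the function) behaves differently: it fills the extra space with repeated copies of the input rather than zeros.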