Question

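I have an array that is not monotonically increasing. Wherever the array decreases, I want it to keep increasing monotonically at a constant rate instead.

I have created a small example here, with a rate of 0.2: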

import numpy as np
import matplotlib.pyplot as plt

# Rate
rate = 0.2

# Array to interpolate
arr1 = np.array([0,1,2,3,4,4,4,3,2,2.5,3.5,5.2,7,10,9.5,np.nan,np.nan,np.nan,11.2, 11.4, 12,10,9,9.5,10.2,10.5,10.8,12,12.5,15],dtype=float)

# Line with constant rate at first monotonic decrease (index 6)
xx1 = 6
xr1 = np.array(np.arange(0,arr1.shape[0]+1),dtype=float)
yr1 = rate*xr1 + (arr1[xx1]-rate*xx1)

# Line with constant rate at second monotonic decrease (index 13)
xx2 = 13
xr2 = np.array(np.arange(0,arr1.shape[0]+1),dtype=float)
yr2 = rate*xr2 + (arr1[xx2]-rate*xx2)

# Line with constant rate at third monotonic decrease (index 20)
xx3 = 20
xr3 = np.array(np.arange(0,arr1.shape[0]+1),dtype=float)
yr3 = rate*xr3 + (arr1[xx3]-rate*xx3)

plt.figure()
plt.plot(arr1,'.-',label='Original')
plt.plot(xr1,yr1,label='Const Rate line 1')
plt.plot(xr2,yr2,label='Const Rate line 2')
plt.plot(xr3,yr3,label='Const Rate line 3')
plt.legend()
plt.grid()

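The "Original" array is my dataset. The final result I want is the blue line plus the red dashed line. In the figure I have also highlighted the "constant rate curves".

Since I have very large arrays (millions of records), I would like to avoid a for loop over the whole array.

Thanks a lot for your help!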

Answer 1

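Here is another option: if you are interested in plotting a monotonically increasing curve from your data, you can simply skip the unwanted points between two successive increasing points, e.g. between arr1[6] = 4 and arr1[11] = 5.2, and connect them with a line.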

import numpy as np
import matplotlib.pyplot as plt

arr1 = np.array([0,1,2,3,4,4,4,3,2,2.5,3.5,5.2,7,10,9.5,np.nan,np.nan,np.nan,11.2, 11.4, 12,10,9,9.5,10.2,10.5,10.8,12,12.5,15],dtype=float)

mask = (arr1 == np.maximum.accumulate(np.nan_to_num(arr1)))

x = np.arange(len(arr1))

plt.figure()
plt.plot(x, arr1,'.-',label='Original')
plt.plot(x[mask], arr1[mask], 'r-', label='Interp.')    
plt.legend()
plt.grid()
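
If the connected curve is needed as an array rather than only as a plot, the same mask can be passed to np.interp. This is a minimal sketch, assuming linear interpolation between the kept points is acceptable; the name `monotone` is introduced here for illustration only. The result is non-decreasing, but it does not enforce the 0.2 rate from the question.

```python
import numpy as np

arr1 = np.array([0, 1, 2, 3, 4, 4, 4, 3, 2, 2.5, 3.5, 5.2, 7, 10, 9.5,
                 np.nan, np.nan, np.nan, 11.2, 11.4, 12, 10, 9, 9.5,
                 10.2, 10.5, 10.8, 12, 12.5, 15], dtype=float)

# Keep only the points that equal the running maximum
# (NaN compares as False, so NaN entries are dropped as well)
mask = arr1 == np.maximum.accumulate(np.nan_to_num(arr1))

x = np.arange(len(arr1))

# Linearly interpolate across the skipped points
monotone = np.interp(x, x[mask], arr1[mask])
```

Flat stretches remain flat with this approach, for example between index 20 and index 27, where both kept points have the value 12.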

Answer 2

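# Difference between consecutive elements
arr2 = arr1[1:] - arr1[:-1]
# arr2[i] < 0 means arr1[i + 1] < arr1[i], so shift by one to index the dropped element
ind = np.where(arr2 < 0)[0] + 1
# Note: only elements that are lower than their original predecessor are corrected
for i in ind:
    arr1[i] = arr1[i - 1] + rate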

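You may need to replace any numpy.nan first with a value such as numpy.nanmin(arr1) (numpy.amin(arr1) itself returns NaN when the array contains NaN).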
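
One possible way to do the NaN replacement mentioned above is sketched here. It uses np.nanmin rather than np.amin, because np.amin returns NaN as soon as the array contains one. The names `fill_value` and `filled`, the shortened sample array, and the choice of the minimum as the fill value are only illustrative.

```python
import numpy as np

arr1 = np.array([0, 1, 2, np.nan, np.nan, 3.5, 3, 4], dtype=float)

# np.amin(arr1) would be NaN here; np.nanmin ignores the NaN entries
fill_value = np.nanmin(arr1)

# Replace the NaN entries, leaving the original array untouched
filled = np.where(np.isnan(arr1), fill_value, arr1)
```

Filling with the minimum turns each gap into a drop, so the loop above then treats it like any other decrease.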

Answer 3

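"I would like to avoid a for loop over the whole array."

Frankly, it is hard to get truly loop-free code with numpy, because numpy is a library written in C and its operations run for loops implemented in C/C++. All search-style routines (e.g. np.argwhere, np.all, etc.) need comparisons and therefore iterate as well.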

Instead, I suggest using at least one explicit loop written in Python, so that the data is iterated over only once.
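
A single explicit pass of the kind suggested in this answer might look like the sketch below. It is an assumption about what such a loop could be, not the answer's own code: the function name `enforce_min_rate` is hypothetical, the rate is treated as a minimum slope everywhere (so plateaus are lifted too), and NaN samples are assumed to follow the constant-rate line.

```python
import numpy as np

def enforce_min_rate(values, rate):
    """Single pass: each output is at least the previous output plus `rate`."""
    out = np.array(values, dtype=float)  # work on a copy
    for i in range(1, len(out)):
        floor = out[i - 1] + rate
        # NaN or too-small samples are replaced by the constant-rate line
        if np.isnan(out[i]) or out[i] < floor:
            out[i] = floor
    return out

arr1 = np.array([0, 1, 2, 3, 4, 4, 4, 3, 2, 2.5, 3.5, 5.2], dtype=float)
result = enforce_min_rate(arr1, 0.2)
```

The loop touches every element exactly once, so the cost grows linearly with the array length.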

Answer 4

Your problem can be expressed as a simple recursive difference equation:

y[n] = max(y[n-1] + 0.2, x[n])

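So the straightforward Python form is

import numpy as np

def func(a):
    # out[i] is the larger of the sample itself and the previous output plus the rate
    out = np.zeros_like(a)
    out[0] = a[0]
    for i in range(1, len(a)):
        out[i] = max(out[i-1] + 0.2, a[i])

    return out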

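Unfortunately, this equation is recursive and non-linear, so finding a vectorized algorithm may be difficult.

However, with Numba this loop-based algorithm can be sped up by roughly 300 times:

import numba

fastfunc = numba.jit(func)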

arr1 = np.random.rand(1000000)

%timeit func(arr1)
# 599 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit fastfunc(arr1)
# 2.22 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
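
Although the recurrence looks inherently sequential, subtracting the ramp 0.2*n from both sides turns it into a running maximum: with z[n] = y[n] - 0.2*n, the equation becomes z[n] = max(z[n-1], x[n] - 0.2*n). That suggests the vectorized sketch below. It is not part of the original answer: `min_rate_vectorized` is a name introduced here, and np.fmax is used on the assumption that NaN samples should be skipped. Results can differ from the loop version in the last floating-point digits.

```python
import numpy as np

def min_rate_vectorized(a, rate=0.2):
    """Vectorized form of y[n] = max(y[n-1] + rate, x[n])."""
    a = np.asarray(a, dtype=float)
    ramp = rate * np.arange(len(a))
    # Running maximum of the de-ramped signal; np.fmax ignores NaN
    # whenever the other operand is a number
    z = np.fmax.accumulate(a - ramp)
    # Add the ramp back to return to the original scale
    return z + ramp

rng = np.random.default_rng(0)
x = rng.random(1000)
y = min_rate_vectorized(x)
```

This needs only a handful of whole-array operations, so it avoids both the Python-level loop and the Numba dependency.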

Answer 5

I finally managed to do what I wanted with a while loop.

# data is a pandas DataFrame; data['myvar'] is the original dataset I want to reshape
data['myvar_corrected'] = data['myvar'].values
temp_d = data['myvar'].fillna(0).values*1.0
# Running maximum of the data; points that fall below it are marked as NaN
dtc = np.maximum.accumulate(temp_d)
data.loc[temp_d < dtc,'myvar_corrected'] = float('nan')
stay_in_while = True
min_rate = 5/200000/(24*60)
idx_next = 0
# NOTE: data_new is not defined in this snippet; it is presumably a working copy of data
col_myvar = data_new.columns.get_loc('myvar')
while stay_in_while:
    df_temp = data.iloc[idx_next:]
    if df_temp['myvar'].isnull().sum()>0:
        # Position (within df_temp) of the first point to be replaced
        idx_first_nan = df_temp.reset_index()['myvar_corrected'].isnull().argmax()

        # Same position expressed in the full DataFrame
        idx_nan_or = (data_new.index.values==df_temp.index.values[idx_first_nan]).argmax()

        # Constant-rate line starting at the last valid point
        x = np.arange(idx_first_nan-1,df_temp.shape[0])
        y0 = df_temp.iloc[idx_first_nan-1]['myvar_corrected']
        rate_curve = min_rate*x + (y0 - min_rate*(idx_first_nan-1))

        # Positive where the data rises above the constant-rate line again
        damage_m_rate = df_temp.iloc[idx_first_nan-1:]['myvar_corrected']-rate_curve

        try:
            first_above = damage_m_rate[damage_m_rate>0].index.values[0]
            idx_intercept = (data_new.index.values==first_above).argmax()
            n_fill = (damage_m_rate.index.values==first_above).argmax()-1
            # Single .iloc call instead of chained indexing, so the assignment reaches data_new
            data_new.iloc[idx_nan_or:idx_intercept, col_myvar] = rate_curve[0:n_fill]
            idx_next = idx_intercept + 1
        except IndexError:
            # No later point lies above the line: stop
            stay_in_while = False
    else:
        stay_in_while = False
# Finally I have my result stored in data_new['myvar']

The result is shown in the picture below.

Thanks everyone for your contributions!

Reshape a numpy array with a minimum rate
