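I have an array that is not monotonically increasing. Wherever the array decreases, I want it to increase monotonically at a constant rate instead.
I have created a small example here, with a rate of 0.2: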
import numpy as np
import matplotlib.pyplot as plt
# Rate
rate = 0.2
# Array to interpolate
arr1 = np.array([0,1,2,3,4,4,4,3,2,2.5,3.5,5.2,7,10,9.5,np.nan,np.nan,np.nan,11.2, 11.4, 12,10,9,9.5,10.2,10.5,10.8,12,12.5,15],dtype=float)
# Line with constant rate at first monotonic decrease (index 6)
xx1 = 6
xr1 = np.array(np.arange(0,arr1.shape[0]+1),dtype=float)
yr1 = rate*xr1 + (arr1[xx1]-rate*xx1)
# Line with constant rate at second monotonic decrease (index 13)
xx2 = 13
xr2 = np.array(np.arange(0,arr1.shape[0]+1),dtype=float)
yr2 = rate*xr2 + (arr1[xx2]-rate*xx2)
# Line with constant rate at third monotonic decrease (index 20)
xx3 = 20
xr3 = np.array(np.arange(0,arr1.shape[0]+1),dtype=float)
yr3 = rate*xr3 + (arr1[xx3]-rate*xx3)
plt.figure()
plt.plot(arr1,'.-',label='Original')
plt.plot(xr1,yr1,label='Const Rate line 1')
plt.plot(xr2,yr2,label='Const Rate line 2')
plt.plot(xr3,yr3,label='Const Rate line 3')
plt.legend()
plt.grid()
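The "Original" array is my dataset. The final result I want is the blue line plus the red dashed line. In the plot I have also highlighted the "constant rate curves".
Since I have very large arrays (millions of records), I want to avoid a for loop over the entire array.
Thanks a lot for your help!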
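Answer 0 (score: 1)
Here is another option: if you are interested in plotting a monotonically increasing curve from your data, you can simply skip the superfluous points between two consecutive increasing points, e.g. between arr1[6] = 4 and arr1[11] = 5.2, and connect them with a line.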
import numpy as np
import matplotlib.pyplot as plt
arr1 = np.array([0,1,2,3,4,4,4,3,2,2.5,3.5,5.2,7,10,9.5,np.nan,np.nan,np.nan,11.2, 11.4, 12,10,9,9.5,10.2,10.5,10.8,12,12.5,15],dtype=float)
mask = (arr1 == np.maximum.accumulate(np.nan_to_num(arr1)))
x = np.arange(len(arr1))
plt.figure()
plt.plot(x, arr1,'.-',label='Original')
plt.plot(x[mask], arr1[mask], 'r-', label='Interp.')
plt.legend()
plt.grid()
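If the interpolated values themselves are needed rather than only the plot, one possible sketch is to feed the masked points to np.interp. This is an illustration and not part of the original answer; the name filled is an assumption. Note that it joins the kept points with straight segments, so the slope between them is whatever the data dictate, not the constant rate of 0.2 asked for in the question.

```python
import numpy as np

arr1 = np.array([0, 1, 2, 3, 4, 4, 4, 3, 2, 2.5, 3.5, 5.2, 7, 10, 9.5,
                 np.nan, np.nan, np.nan, 11.2, 11.4, 12, 10, 9, 9.5, 10.2,
                 10.5, 10.8, 12, 12.5, 15], dtype=float)

# Keep only the points that sit on the running maximum
# (NaN is treated as 0 for the running maximum, as in the answer above)
mask = arr1 == np.maximum.accumulate(np.nan_to_num(arr1))
x = np.arange(len(arr1))

# Linearly interpolate between the kept points; "filled" is a hypothetical name
filled = np.interp(x, x[mask], arr1[mask])
```

The result has no NaN and never decreases, but flat stretches remain flat (for example between the two values of 12).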
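Answer 1 (score: 0)
arr2 = arr1[1:] - arr1[:-1]
# arr2[i] < 0 means that arr1[i + 1] is lower than arr1[i]
ind = np.where(arr2 < 0)[0]
# Only the drops found in the original differences are corrected
for i in ind:
    arr1[i + 1] = arr1[i] + rate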
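You may need to replace any numpy.nan first with a value such as the array minimum. Note that numpy.amin(arr1) itself returns nan when the array contains nan; numpy.nanmin(arr1) ignores them.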
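A minimal sketch of that NaN replacement step, assuming the smallest valid value is an acceptable fill. The small array and the names fill_value and cleaned are made up for illustration.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, 2.0])

# np.nanmin skips NaN entries, whereas np.amin would propagate them
fill_value = np.nanmin(arr)

# Replace every NaN with the fill value, leaving valid entries untouched
cleaned = np.where(np.isnan(arr), fill_value, arr)
```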
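Answer 2 (score: 0)
"I want to avoid a for loop over the entire array."
Frankly, it is hard to be truly free of for loops in numpy, because numpy is a library written in C and relies on for loops implemented in C/C++. Any search or reduction routine (e.g. np.argwhere, np.all) needs comparisons, and therefore iteration as well.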
Instead, I suggest using at least one explicit loop in Python, so that the data are iterated over only once.
[The code sample that followed this sentence is not preserved in the source.]
Answer 3 (score: 0)
Your problem can be expressed as a simple recursive difference equation:
y[n] = max(y[n-1] + 0.2, x[n])
So the direct Python form is
import numpy as np

def func(a):
    out = np.zeros_like(a)
    out[0] = a[0]
    # Each output is either the data point or the previous output plus the rate
    for i in range(1, len(a)):
        out[i] = max(out[i-1] + 0.2, a[i])
    return out
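Unfortunately, the equation is recursive and non-linear, so finding a vectorized algorithm may be difficult.
However, with Numba this loop-based algorithm can be sped up by a factor of roughly 270 (599 ms vs. 2.22 ms in the timings below):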
import numba
fastfunc = numba.jit(func)
arr1 = np.random.rand(1000000)
%timeit func(arr1)
# 599 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit fastfunc(arr1)
# 2.22 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
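As a possible complement, and not part of the original answer: substituting z[n] = y[n] - rate*n into the recurrence gives z[n] = max(z[n-1], x[n] - rate*n), which is a plain running maximum, so NumPy's ufunc accumulate can evaluate it without a Python loop. The sketch below assumes the first element is not NaN; the names monotone_loop and monotone_vectorized are hypothetical. np.fmax is used because it ignores NaN, which mirrors how Python's max in the loop form keeps its first argument when the second is NaN.

```python
import numpy as np

def monotone_loop(a, rate):
    """Reference loop for y[n] = max(y[n-1] + rate, x[n])."""
    out = np.zeros_like(a)
    out[0] = a[0]
    for i in range(1, len(a)):
        out[i] = max(out[i - 1] + rate, a[i])
    return out

def monotone_vectorized(a, rate):
    """Same recurrence, written as a running maximum of a - rate*n."""
    ramp = rate * np.arange(len(a))
    # Remove the ramp, take the NaN-ignoring running maximum, add the ramp back
    return np.fmax.accumulate(a - ramp) + ramp

arr1 = np.array([0, 1, 2, 3, 4, 4, 4, 3, 2, 2.5, 3.5, 5.2, 7, 10, 9.5,
                 np.nan, np.nan, np.nan, 11.2, 11.4, 12, 10, 9, 9.5, 10.2,
                 10.5, 10.8, 12, 12.5, 15], dtype=float)

y_loop = monotone_loop(arr1, 0.2)
y_vec = monotone_vectorized(arr1, 0.2)
print(np.allclose(y_loop, y_vec))
```

Both forms raise every point by at least the rate, so flat stretches such as the repeated 4s are also lifted. Whether that matches the intended result for plateaus is worth checking against your own data.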
Answer 4 (score: 0)
I finally managed to do what I wanted with a while loop.
import numpy as np

# data['myvar'] is the original dataset I want to reshape
# Assumption: data_new is never defined in the original post; a copy of data is used here
data_new = data.copy()
data['myvar_corrected'] = data['myvar'].values
temp_d = data['myvar'].fillna(0).values*1.0
dtc = np.maximum.accumulate(temp_d)
# Blank out every point that lies below the running maximum
data.loc[temp_d < dtc, 'myvar_corrected'] = float('nan')
stay_in_while = True
min_rate = 5/200000/(24*60)
idx_next = 0
col_myvar = data_new.columns.get_loc('myvar')
while stay_in_while:
    df_temp = data.iloc[idx_next:]
    if df_temp['myvar'].isnull().sum() > 0:
        # Position (within df_temp) of the first blanked-out point
        idx_first_nan = df_temp.reset_index()['myvar_corrected'].isnull().argmax()
        # The same point as a position in the full frame
        idx_nan_or = (data_new.index.values == df_temp.index.values[idx_first_nan]).argmax()
        # Constant-rate line starting at the last valid point before the gap
        x = np.arange(idx_first_nan-1, df_temp.shape[0])
        y0 = df_temp.iloc[idx_first_nan-1]['myvar_corrected']
        rate_curve = min_rate*x + (y0 - min_rate*(idx_first_nan-1))
        damage_m_rate = df_temp.iloc[idx_first_nan-1:]['myvar_corrected'] - rate_curve
        try:
            # First later point where the data rise above the line again
            first_above = damage_m_rate[damage_m_rate > 0].index.values[0]
            idx_intercept = (data_new.index.values == first_above).argmax()
            n_fill = (damage_m_rate.index.values == first_above).argmax() - 1
            data_new.iloc[idx_nan_or:idx_intercept, col_myvar] = rate_curve[0:n_fill]
            idx_next = idx_intercept + 1
        except IndexError:
            # No later point lies above the line, so nothing is left to fix
            stay_in_while = False
    else:
        stay_in_while = False
# Finally I have my result stored in data_new['myvar']
The result is shown in the figure below.
Thanks everyone for your contributions!