I have a numpy array (not necessarily sorted):
[2.0, 3.0, nan, nan, nan, 5.0]
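I want to compute the differences of this array. The difference between the last element, 5, and the second element, 3, is 2. I want this difference of 2 to be distributed over the nan elements enclosed between them. If I try numpy.diff (I also tried masked arrays), I get:
[nan, 1, nan, nan, nan, nan]
The result should look like this: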
[nan, 1, 0.5, 0.5, 0.5, 0.5]
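Update:
I got an answer for the specific case above, but the given answers do not work in a more general form, for example when there are leading or trailing nans, or when nans and values alternate. For example: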
[nan, nan, 2.0, 3.0, nan, nan, nan, 5.0, nan, 6.0, nan]
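Answer 0 (score: 1)
Assuming you want output[i] to be the difference between input[i] and input[i-1], and, in the special case of nans, you want the difference spread across the run of nans, then I think this is what you want: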
import numpy as np

def arrdiffs(a):
    out = np.zeros(len(a))
    diff = np.nan   # last non-nan value seen so far
    difflen = 0     # length of the current run of nans
    for i, e in enumerate(a):
        if i == 0:
            # in the first cell we always output nan
            out[i] = np.nan
        elif np.isnan(a[i]):
            # when the input is nan, just increase difflen
            difflen += 1
        elif np.isnan(a[i-1]):
            # when the previous input is nan, but this one isn't,
            # distribute the diff across the previous cells and this one
            difflen += 1
            m = float(abs(a[i] - diff))
            for j in range(i - difflen + 1, i + 1):
                out[j] = m / difflen
            difflen = 0
            diff = a[i]
        else:
            # otherwise simply do the diff locally between this cell and
            # the previous one
            out[i] = abs(a[i] - a[i-1])
            diff = a[i]  # write down diff in case the next input cells are nan
            difflen = 0
    return out

a = np.array([2.0, 3.0, np.nan, np.nan, np.nan, 5.0])
print(arrdiffs(a))
Edit: switched to 4-space indentation instead of 2, turned the if/else into elifs, and added a comment on each branch.
When I run it, I get your expected output:
$ python arrdiffs.py
[ nan 1. 0.5 0.5 0.5 0.5]
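Edit: switched the initial value of diff to np.nan to cover the case where the array starts with a run of nans; presumably we just output nan until we have at least one initial value. Waiting for the OP to clarify what the goal is. Also changed the assignment to diff to use a[i] in the branch where a[i-1] is nan but a[i] is not (that was a bug). For the new test case provided by the OP: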
[np.nan, np.nan, 2.0, 3.0, np.nan, np.nan, np.nan, 5.0, np.nan, 6.0, np.nan]
the updated code gives:
>>> [ nan nan nan 1. 0.5 0.5 0.5 0.5 0.5 0.5 0. ]
Is this what the OP wants? Asking for clarification.
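For comparison, the same "spread the difference over the gap" idea can probably be written without the Python loop. This is only a rough sketch, not the code above: the name spread_diffs is made up for illustration, it keeps the sign of each difference (no abs), and it leaves cells after the last real value as nan rather than 0, which may or may not be what the OP wants.

```python
import numpy as np

def spread_diffs(a):
    """Hypothetical helper: spread each gap's difference evenly over the gap."""
    a = np.asarray(a, dtype=float)
    out = np.full(a.shape, np.nan)
    idx = np.flatnonzero(~np.isnan(a))   # positions of the real values
    if idx.size < 2:
        return out                       # fewer than two values: nothing to diff
    gaps = np.diff(idx)                  # number of cells each difference covers
    steps = np.diff(a[idx]) / gaps       # signed per-cell share of each difference
    # fill every cell after the first real value up to the last real value
    out[idx[0] + 1 : idx[-1] + 1] = np.repeat(steps, gaps)
    return out

print(spread_diffs([2.0, 3.0, np.nan, np.nan, np.nan, 5.0]))
```

On the OP's first array this should give nan, 1, 0.5, 0.5, 0.5, 0.5, and on the longer test case the leading and trailing cells stay nan.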
Answer 1 (score: 1)
This should do the job:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: a = [2.0, 3.0, np.nan, np.nan, np.nan, 5.0]
In [4]: s = pd.Series(a)
In [5]: result = s.reset_index()\
...: .dropna()\
...: .diff()\
...: .pipe(lambda x: x[0]/x['index'])\
...: .reindex(s.index)\
...: .bfill()
In [6]: result[0] = np.nan
In [7]: result
Out[7]:
0 NaN
1 1.0
2 0.5
3 0.5
4 0.5
5 0.5
dtype: float64
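Answer 2 (score: 1)
I would simply interpolate the nans first. That way you keep a clean separation between the two steps, which makes it easier to change how the interpolation is done.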
import numpy as np
a = np.array([2.0, 3.0, np.nan, np.nan, np.nan, 5.0])
x = np.arange(a.size)
a_filled = np.interp(x, x[np.isfinite(a)], a[np.isfinite(a)])
np.diff(a_filled)
# results in
array([ 1. , 0.5, 0.5, 0.5, 0.5])
For fancier interpolation, Pandas might be a good choice; it also has a .diff() method for DataFrames.
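For the more general case from the question's update (leading and trailing nans), a small variation of the snippet above might look like this. It is a sketch under two assumptions of mine: that cells outside the range of known values should stay nan, and that a nan should be prepended so the result has the same length as the input. By default np.interp repeats the edge values there; passing left and right changes that.

```python
import numpy as np

a = np.array([np.nan, np.nan, 2.0, 3.0, np.nan, np.nan, np.nan, 5.0, np.nan, 6.0, np.nan])
x = np.arange(a.size)
valid = np.isfinite(a)

# left/right=np.nan keeps cells before the first and after the last
# known value as nan instead of repeating the edge values
a_filled = np.interp(x, x[valid], a[valid], left=np.nan, right=np.nan)

# prepend a nan so the diffs line up with the input positions
diffs = np.concatenate(([np.nan], np.diff(a_filled)))
print(diffs)
```

The interior gaps get 0.5 per cell, while the first three cells and the last one stay nan.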
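Answer 3 (score: 1)
Thanks to Rutger Kassies, I have been looking into pandas, and it has out-of-the-box methods that solve this general problem:
Convert the array to a DataFrame, interpolate the DataFrame, and take the diff: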
import pandas as pd
from numpy import nan
array = [nan, nan, 2.0, 3.0, nan, nan, nan, 5.0, nan, 6.0, nan]
df = pd.DataFrame(array)
interpolation = df.interpolate()
diff = interpolation.diff()
The result is:
[NaN, NaN, NaN, 1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0]
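Note that the trailing 0.0 comes from interpolate() carrying the last value forward into the trailing nan. If that cell should stay nan instead (my assumption, the question does not say), a possible variation is to restrict the interpolation to gaps that have values on both sides with limit_area='inside'. A sketch using a Series rather than a DataFrame:

```python
import numpy as np
import pandas as pd

values = [np.nan, np.nan, 2.0, 3.0, np.nan, np.nan, np.nan, 5.0, np.nan, 6.0, np.nan]
s = pd.Series(values)

# only fill nans that are surrounded by valid values;
# leading and trailing nans are left alone
inside_only = s.interpolate(limit_area='inside')

diffs = inside_only.diff()
print(diffs.tolist())
```

With this, the last entry of the result is nan rather than 0.0; everything else matches the output above.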