First, create the data:
import pandas as pd
import numpy as np
%matplotlib inline
data = pd.DataFrame({'time':np.arange(10)})
data['sin_of_the_times']= np.sin(data.time)
newdata = pd.DataFrame({'time': np.linspace(0,10,15)})
newdata['sin_of_the_times'] = np.nan  # np.NAN was removed in NumPy 2.0
data['interpolated']=False
newdata['interpolated']= True
ultimatedata = pd.concat([data, newdata])
ultimatedata.sort_values('time', inplace=True)
Which gives you:
time sin_of_the_times interpolated
0 0.000000 0.000000 False
0 0.000000 NaN True
1 0.714286 NaN True
1 1.000000 0.841471 False
2 1.428571 NaN True
2 2.000000 0.909297 False
...
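Without writing a new function, is there an existing interpolation method in Python that will take:
Is there a name for this kind of interpolation (upsampling, in this case)? The interpolation methods seem to be based on only one column.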
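Answer 0: (score: 3)
You still want linear interpolation; you just need to specify that the distance between points depends on time, rather than assuming the points are evenly spaced. So first set the index to time, then use interpolate with method='index':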
df = df.set_index('time')
df.sin_of_the_times.interpolate(method='index')
time
0.000000 0.000000
0.000000 0.000000
0.714286 0.601051
1.000000 0.841471
1.428571 0.870539
2.000000 0.909297
Name: sin_of_the_times, dtype: float64
This is what I started with, df:
time sin_of_the_times
0 0.000000 0.000000
0 0.000000 NaN
1 0.714286 NaN
1 1.000000 0.841471
2 1.428571 NaN
2 2.000000 0.909297
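As a rough sketch of how this answer might be applied to the full frame from the question, the snippet below rebuilds the data, interpolates against time, and writes the result back. Two details are assumptions added for illustration, not part of the original answer: the new time points are limited to the range 0 to 9 so that nothing falls past the last known sample, and a stable sort is used so that at tied time values the known row comes before the row to be filled.

```python
import numpy as np
import pandas as pd

# Known samples at integer times, as in the question.
data = pd.DataFrame({"time": np.arange(10.0)})
data["sin_of_the_times"] = np.sin(data["time"])
data["interpolated"] = False

# New time points whose values are unknown.
# Assumption: kept inside [0, 9] so every point lies between known samples.
newdata = pd.DataFrame({"time": np.linspace(0, 9, 15)})
newdata["sin_of_the_times"] = np.nan
newdata["interpolated"] = True

# Assumption: a stable sort keeps the known row first where times tie,
# so the series never starts with a NaN that forward filling would skip.
ultimatedata = pd.concat([data, newdata]).sort_values("time", kind="stable")

# Interpolate linearly using the numeric index (time) as the x-axis.
filled = (
    ultimatedata.set_index("time")["sin_of_the_times"]
    .interpolate(method="index")
)

# Assign positionally; the concatenated frame has duplicate index labels.
ultimatedata["sin_of_the_times"] = filled.to_numpy()
print(ultimatedata.head(6))
```

Rows with interpolated set to True now hold values that lie on the straight line between their two neighbouring known samples.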
Answer 1: (score: 3)
For speed, build on interp from numpy:
np.interp(df['time'].values,
df.dropna()['time'].values,
df.dropna()['sin_of_the_times'].values)
Out[783]:
array([0. , 0. , 0.60105095, 0.841471 , 0.87053926,
0.909297 ])
#df['sin_of_the_times']= np.interp(df['time'].values,
# df.dropna()['time'].values,
# df.dropna()['sin_of_the_times'].values)
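A self-contained variant of the same idea, offered as a sketch: it rebuilds the six-row df shown earlier (values copied from the printed output) and computes the mask of known rows once instead of calling dropna() twice. The name known is introduced here for illustration. Note that np.interp expects the known x-values to be sorted in increasing order, which holds for this data.

```python
import numpy as np
import pandas as pd

# The six rows shown above, typed in by hand from the printed output.
df = pd.DataFrame({
    "time": [0.0, 0.0, 0.714286, 1.0, 1.428571, 2.0],
    "sin_of_the_times": [0.0, np.nan, np.nan, 0.841471, np.nan, 0.909297],
})

# Boolean mask of rows whose value is already known (hypothetical name).
known = df["sin_of_the_times"].notna()

# Evaluate the piecewise-linear function through the known points
# at every time value, then write the result back into the column.
df["sin_of_the_times"] = np.interp(
    df["time"].to_numpy(),
    df.loc[known, "time"].to_numpy(),
    df.loc[known, "sin_of_the_times"].to_numpy(),
)
print(df)
```

Known rows keep their original values because np.interp returns the sample itself at each known x-value.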