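I have a data frame with days and values, as shown below. I want to extrapolate the data out to roughly 5000 days using an exponential function, but I don't think I'm handling the day extrapolation correctly: I simply add rows at 100-day intervals rather than treating the data as a time series. Should I put actual dates in the "Days" column?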
Days  Col1         Col2
105   1.042990717  0.977126509
131   1.032115123  0.965084949
155   1.027434996  0.959954413
181   1.021729519  0.955389315
209   1.015965345  0.951295069
236   1.009429161  0.943803697
258   1.004876875  0.940235463
285   1.000358928  0.931895737
315   0.995125739  0.926641403
363   0.990417213  0.920608379
387   0.986946909  0.915730933
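For reproducibility, here is a minimal sketch of how the table above could be loaded. It assumes the frame is named `df` and that `Days` is used as the index, since the code in this post reads the day numbers from `df.index`; how the frame was actually built is not shown here.

```python
import pandas as pd

# Values copied from the table above
data = {
    "Days": [105, 131, 155, 181, 209, 236, 258, 285, 315, 363, 387],
    "Col1": [1.042990717, 1.032115123, 1.027434996, 1.021729519,
             1.015965345, 1.009429161, 1.004876875, 1.000358928,
             0.995125739, 0.990417213, 0.986946909],
    "Col2": [0.977126509, 0.965084949, 0.959954413, 0.955389315,
             0.951295069, 0.943803697, 0.940235463, 0.931895737,
             0.926641403, 0.920608379, 0.915730933],
}

# Assumption: the day number is the index, so df.index holds the x values
df = pd.DataFrame(data).set_index("Days")
```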
The code I have so far:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# df is assumed to be indexed by the Days column (e.g. df = df.set_index('Days'))

# Add days: extend the index in 100-day steps
# (start at 487 so the existing day 387 is not duplicated)
df = df.reindex(df.index.tolist() + list(range(487, 6000, 100)))

# Initial parameter guess, just to kick off the optimization
init = (0.01, 0.01)

# Place to store function parameters for each column
col_params = {}

def func(x, a, b):
    return a * np.exp(-b * x)

# Curve fit each column, using only the rows that have data
for col in df.columns:
    known = df[col].dropna()
    x = known.index.astype(float).values
    y = known.values
    params = curve_fit(func, x, y, init)
    # Store optimized parameters
    col_params[col] = params[0]
    print(col_params[col])

# Extrapolate each column
for col in df.columns:
    # Boolean mask of the NaN rows in the column
    missing = df[col].isna()
    x = df.index[missing].astype(float).values
    # Plot the observed points before the gaps are filled in
    plt.plot(df.index[~missing], df.loc[~missing, col], 'o')
    # Extrapolate the missing points with the fitted function
    # (use .loc rather than chained indexing so the assignment sticks)
    df.loc[missing, col] = func(x, *col_params[col])
    plt.plot(x, func(x, *col_params[col]), 'g--')

# Display result
print('Extrapolated data:')
print(df)
plt.show()
The data does get extrapolated, but I don't think I'm doing it well, especially in how I handle the days column.
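For comparison, here is a minimal sketch of the same fit with the extrapolation grid kept separate from the observed rows. `curve_fit` only needs numeric x values, so plain day numbers are enough for the fit itself. The starting guess `p0=(1.0, 0.001)`, the names `future_days` and `extrapolated`, and the 5000-day cutoff are illustrative assumptions, and only `Col1` is shown.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

# Observed data for Col1, copied from the table in this post
days = np.array([105, 131, 155, 181, 209, 236, 258, 285, 315, 363, 387],
                dtype=float)
col1 = np.array([1.042990717, 1.032115123, 1.027434996, 1.021729519,
                 1.015965345, 1.009429161, 1.004876875, 1.000358928,
                 0.995125739, 0.990417213, 0.986946909])

def func(x, a, b):
    return a * np.exp(-b * x)

# Fit on the observed days only; p0 is an assumed starting guess
params, _ = curve_fit(func, days, col1, p0=(1.0, 0.001))

# Evenly spaced numeric day grid for the extrapolation (assumed 100-day step)
future_days = np.arange(487, 5000 + 1, 100, dtype=float)

# Evaluate the fitted curve on the new grid
extrapolated = pd.Series(func(future_days, *params),
                         index=future_days.astype(int), name="Col1")
```

Keeping the grid in its own array means the step size can be changed (for example to 1 or 30 days) without touching the observed rows. If calendar dates are wanted for display, the day numbers could be converted afterwards given a known start date.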