Question

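I have a data frame with days and values, shown below, and I want to extrapolate the data out to roughly 5000 days using an exponential function. However, I don't think I am handling the days correctly: I just append days at 100-day intervals rather than treating the data as a time series. Should I specify actual dates in the "Days" column?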

Days  Col1         Col2
105   1.042990717  0.977126509
131   1.032115123  0.965084949
155   1.027434996  0.959954413
181   1.021729519  0.955389315
209   1.015965345  0.951295069
236   1.009429161  0.943803697
258   1.004876875  0.940235463
285   1.000358928  0.931895737
315   0.995125739  0.926641403
363   0.990417213  0.920608379
387   0.986946909  0.915730933
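
For reproducibility, here is a minimal sketch of how the table above could be loaded. Setting `Days` as the index is an assumption, inferred from the code in this question reading its x values from `df.index`; the variable name `raw` is only illustrative.

```python
import io

import pandas as pd

# The sample data from the question, as whitespace-separated text
raw = """Days Col1 Col2
105 1.042990717 0.977126509
131 1.032115123 0.965084949
155 1.027434996 0.959954413
181 1.021729519 0.955389315
209 1.015965345 0.951295069
236 1.009429161 0.943803697
258 1.004876875 0.940235463
285 1.000358928 0.931895737
315 0.995125739 0.926641403
363 0.990417213 0.920608379
387 0.986946909 0.915730933
"""

# Assumption: Days becomes the index, so df.index holds the day numbers
df = pd.read_csv(io.StringIO(raw), sep=r"\s+").set_index("Days")
print(df.head())
```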

The code I have used so far:

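import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Add future days to the index (df is assumed to be indexed by Days);
# start at 487 so that day 387 is not duplicated
df = df.reindex(df.index.tolist() + list(range(487, 6000, 100)))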

# Initial parameter guess, just to kick off the optimization
init = (0.01, 0.01)

# Place to store function parameters for each column
col_params = {}

def func(x, a, b):
    return a*np.exp(-b*x)

# Curve fit each column
for col in df.columns:
    # Fit only on rows that have data (the appended days are NaN)
    known = df[col].dropna()
    x = known.index.astype(float).values
    y = known.values

    # curve_fit returns (optimal parameters, covariance matrix)
    params, _ = curve_fit(func, x, y, init)
    # Store optimized parameters
    col_params[col] = params
    print(col, col_params[col])

# Extrapolate each column
for col in df.columns:
    # Rows where this column is still missing (the appended days)
    missing = df[col].isna()
    x = df.index[missing].astype(float).values
    # Fill those rows with the fitted function
    df.loc[missing, col] = func(x, *col_params[col])
    # Plot observed plus extrapolated values against the day index
    plt.plot(df.index.astype(float).values, df[col].values, 'g--')

# Display result
print('Extrapolated data:')
print(df)

The data does get extrapolated, but I don't think I am doing it well, especially in how I handle the Days column.

How can I extrapolate time series data using an exponential function?
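
To make the question concrete, here is a minimal sketch of the kind of result being asked for, using `Col1` only. It assumes the day numbers can be used directly as the numeric x values, since `curve_fit` only needs numbers and not calendar dates. The name `exp_decay`, the starting guess `(1.0, 1e-4)`, and the 100-day grid up to day 5000 are illustrative choices, not a recommended method.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

# Observed days and Col1 values from the question
days = np.array([105, 131, 155, 181, 209, 236, 258, 285, 315, 363, 387],
                dtype=float)
col1 = np.array([1.042990717, 1.032115123, 1.027434996, 1.021729519,
                 1.015965345, 1.009429161, 1.004876875, 1.000358928,
                 0.995125739, 0.990417213, 0.986946909])


def exp_decay(x, a, b):
    """Exponential model a * exp(-b * x), the same form as func above."""
    return a * np.exp(-b * x)


# Assumption: a starting guess close to the scale of the data
params, _ = curve_fit(exp_decay, days, col1, p0=(1.0, 1e-4))

# Evaluate the fitted curve on a grid of future days. The grid spacing
# affects only where the curve is sampled, not the fit itself.
future_days = np.arange(487, 5001, 100, dtype=float)
extrapolated = pd.Series(exp_decay(future_days, *params),
                         index=future_days.astype(int), name="Col1")
print(params)
print(extrapolated.head())
```

Because the fit uses only the observed rows, the irregular spacing of the original days (24 to 48 days apart) is handled by the x values themselves.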

0 answers: