I have a dataframe with some columns that I have been adding myself. One specific column holds the maximum and minimum tide levels.
Pandas Column mostly empty but with some reference values
import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,2,3,4],'b':[np.nan,np.nan,3,4]},columns=['a','b'])
df
The problem is that the column is mostly empty, because it only contains those peak values and not the intermediate ones. I would like to fill the missing values with a function similar to the one in the image shown below.
I want to fill it with a function of this kind
Thank you in advance.
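Answer 0: (score: 0)
Since you did not specify the datetime format your pandas dataframe uses, here is an example that uses the index values as the x data. You can use them as long as the rows are evenly spaced and have no gaps.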
import pandas as pd
import numpy as np
from scipy.optimize import curve_fit
tide = np.asarray([-1.2,np.nan,np.nan,3.4,np.nan,np.nan,-1.6,np.nan,np.nan,3.7,np.nan,np.nan,-1.4,])
tide_time = np.arange(len(tide))
df = pd.DataFrame({'a':tide_time,'b':tide})
#define your fit function with amplitude, frequency, phase and offset
def fit_func(x, ampl, freq, phase, offset):
    return ampl * np.sin(freq * x + phase) + offset
#extract rows that contain your values
df_nona = df.dropna()
#perform the least square fit, get the coefficients for your fitted data
coeff, _mat = curve_fit(fit_func, df_nona["a"], df_nona["b"])
print(coeff)
#append a column with fit data
df["fitted_b"] = fit_func(df["a"], *coeff)
Output for my sample data:
#amplitude frequency phase offset
[ 2.63098177 1.12805625 -2.17037976 1.0127173 ]
     a    b  fitted_b
0    0 -1.2 -1.159344
1    1  NaN -1.259341
2    2  NaN  1.238002
3    3  3.4  3.477807
4    4  NaN  2.899605
5    5  NaN  0.164376
6    6 -1.6 -1.601058
7    7  NaN -0.378513
8    8  NaN  2.434439
9    9  3.7  3.622127
10  10  NaN  1.826826
11  11  NaN -0.899136
12  12 -1.4 -1.439532
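If your rows carry real timestamps, or you only want to fill the gaps and keep the measured peaks untouched, the same idea can be adapted roughly as follows. This is only a sketch under assumptions: the `time` column, the hourly spacing, and the `hours` and `b_filled` column names are made up for illustration, and the starting values in `p0` are a simple heuristic (consecutive extrema assumed to be half a period apart), not something your data is guaranteed to satisfy.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

# Hypothetical sample: hourly timestamps, tide level known only at the peaks
times = pd.Timestamp("2021-01-01") + pd.to_timedelta(np.arange(13), unit="h")
tide = np.array([-1.2, np.nan, np.nan, 3.4, np.nan, np.nan, -1.6,
                 np.nan, np.nan, 3.7, np.nan, np.nan, -1.4])
df = pd.DataFrame({"time": times, "b": tide})

# Convert timestamps to elapsed hours, so the x values reflect the
# actual spacing even if rows are not evenly spaced
df["hours"] = (df["time"] - df["time"].iloc[0]).dt.total_seconds() / 3600.0

def fit_func(x, ampl, freq, phase, offset):
    return ampl * np.sin(freq * x + phase) + offset

# Rows that actually contain a measured value
known = df.dropna(subset=["b"])
x_known = known["hours"].to_numpy()
y_known = known["b"].to_numpy()

# Rough starting values (assumption: neighbouring known values are
# alternating maxima and minima, i.e. half a period apart)
half_period = np.diff(x_known).mean()
p0 = [
    (y_known.max() - y_known.min()) / 2,  # amplitude
    np.pi / half_period,                  # angular frequency
    0.0,                                  # phase, neutral guess
    y_known.mean(),                       # offset
]

coeff, _ = curve_fit(fit_func, x_known, y_known, p0=p0)

# Keep measured values, use the fitted curve only where b is NaN
df["b_filled"] = df["b"].fillna(fit_func(df["hours"], *coeff))
print(df[["time", "b", "b_filled"]])
```

Passing `p0` is optional, but a sine fit can settle on a poor local solution when the default starting values are far from the real amplitude and frequency, so a data-driven guess is usually worth the extra lines. Keep in mind that a single sine is only an approximation of a real tide curve.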