How can I fill a pandas column with a function (sine, line)?

Time: 2018-02-03 09:52:32

Tags: python pandas numpy dataframe data-science

I have a dataframe with some columns that I have been adding myself. One specific column holds the maximum and minimum tide levels.

[Image: pandas column mostly empty but with some reference values]

import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,2,3,4],'b':[np.nan,np.nan,3,4]},columns=['a','b']) 
df

The problem is that the column is mostly empty, because it only contains those peak values and not the intermediate ones. I would like to fill the missing values with a function similar to the one in the image below.

[Image: I want to fill it with a function of this kind]

Thank you in advance.

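1 answer:

Answer 0 (score: 0)

Since you did not specify which datetime format your pandas dataframe uses, here is an example that uses the index values as the time axis. You can use index values this way if the rows are evenly spaced and have no gaps.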

import pandas as pd
import numpy as np
from scipy.optimize import curve_fit

tide = np.asarray([-1.2,np.nan,np.nan,3.4,np.nan,np.nan,-1.6,np.nan,np.nan,3.7,np.nan,np.nan,-1.4,])
tide_time = np.arange(len(tide))
df = pd.DataFrame({'a':tide_time,'b':tide}) 

#define your fit function with amplitude, frequency, phase and offset
def fit_func(x, ampl, freq, phase, offset):
    return ampl * np.sin(freq * x + phase) + offset

#extract rows that contain your values
df_nona = df.dropna()

#perform the least square fit, get the coefficients for your fitted data
coeff, _mat = curve_fit(fit_func, df_nona["a"], df_nona["b"])
print(coeff)

#append a column with fit data
df["fitted_b"] = fit_func(df["a"], *coeff)
print(df)

Output for my sample data:

#amplitude    frequency   phase       offset
[ 2.63098177  1.12805625 -2.17037976  1.0127173 ]

     a    b  fitted_b
0    0 -1.2 -1.159344
1    1  NaN -1.259341
2    2  NaN  1.238002
3    3  3.4  3.477807
4    4  NaN  2.899605
5    5  NaN  0.164376
6    6 -1.6 -1.601058
7    7  NaN -0.378513
8    8  NaN  2.434439
9    9  3.7  3.622127
10  10  NaN  1.826826
11  11  NaN -0.899136
12  12 -1.4 -1.439532
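
If the dataframe has a real datetime column instead of an evenly spaced index, the same idea might look roughly like the sketch below. Everything in it is an assumption for illustration: the column names `time`, `tide`, `hours`, `fitted` and `tide_filled`, the hourly sample data, and the initial guess of about 12 hours between successive low tides. Passing a rough initial guess through `p0` usually helps `curve_fit` settle on the intended frequency, and `fillna` keeps the measured peaks while filling only the gaps.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit


def fit_func(x, ampl, freq, phase, offset):
    # Same sine model as in the answer above
    return ampl * np.sin(freq * x + phase) + offset


# Hypothetical data: one row per hour, a reading only every 6 hours
hours = np.arange(37)
tide = np.full(len(hours), np.nan)
tide[::6] = [-1.2, 3.4, -1.6, 3.7, -1.4, 3.5, -1.3]
times = pd.Timestamp("2018-02-01") + pd.to_timedelta(hours, unit="h")
df = pd.DataFrame({"time": times, "tide": tide})

# Convert timestamps to elapsed hours so the fit works on plain numbers
df["hours"] = (df["time"] - df["time"].iloc[0]).dt.total_seconds() / 3600.0

# Rows that actually contain a measurement
known = df.dropna(subset=["tide"])

# Rough initial guess (assumption: about 12 hours per full cycle)
p0 = [
    (known["tide"].max() - known["tide"].min()) / 2,  # amplitude
    2 * np.pi / 12.0,                                 # angular frequency, rad/hour
    -np.pi / 2,                                       # phase: series starts at a low
    (known["tide"].max() + known["tide"].min()) / 2,  # offset
]

coeff, _cov = curve_fit(fit_func, known["hours"], known["tide"], p0=p0)

# Evaluate the fitted curve on every row
df["fitted"] = fit_func(df["hours"], *coeff)

# Keep the measured values and fill only the missing ones from the fit
df["tide_filled"] = df["tide"].fillna(df["fitted"])
```

Using elapsed hours rather than the row index means the fit still makes sense when the rows are unevenly spaced or have gaps.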