I have the following pandas DataFrame df:
L_Time U_Time Eval_Time L_Flux U_Flux
2018-05-01 04:30:00 2018-05-01 05:30:00 2018-05-01 05:23:45 100 200
2018-05-01 07:30:00 2018-05-01 08:30:00 2018-05-01 07:44:11 100 200
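L_Flux and U_Flux contain the radiation flux values at the pandas timestamps L_Time and U_Time, respectively. I want to interpolate the flux value at Eval_Time, to one-second resolution. How can I do this correctly with Python or pandas? I tried linear interpolation with both pandas and scipy, but that always gives me the midpoint value (150). I want the flux at the intermediate timestamp (Eval_Time) to be interpolated according to its distance from the two hourly timestamps.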
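Answer 0 (score: 2)
You can do the interpolation yourself, since it only involves two columns. However, your data looks wrong: in the second row as entered in the snippet below, L_Time (2018-05-03) falls after U_Time, so that row is extrapolated rather than interpolated. Either way, the following gives you the answer: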
import pandas as pd

df = pd.DataFrame(data={'L_Time': ['2018-05-01 04:30:00', '2018-05-03 07:30:00'],
                        'U_Time': ['2018-05-01 05:30:00', '2018-05-01 08:30:00'],
                        'Eval_Time': ['2018-05-01 05:23:45', '2018-05-01 07:44:11'],
                        'L_Flux': [100, 100],
                        'U_Flux': [200, 200]})
df['L_Time'] = pd.to_datetime(df['L_Time'])
df['U_Time'] = pd.to_datetime(df['U_Time'])
df['Eval_Time'] = pd.to_datetime(df['Eval_Time'])
# The actual maths part - using times between U, L and Eval
df['Eval_Flux'] = df.L_Flux + (df.U_Flux - df.L_Flux)*(df.Eval_Time - df.L_Time)/(df.U_Time - df.L_Time)
L_Time U_Time Eval_Time L_Flux U_Flux Eval_Flux
0 2018-05-01 04:30:00 2018-05-01 05:30:00 2018-05-01 05:23:45 100 200 189.583333
1 2018-05-03 07:30:00 2018-05-01 08:30:00 2018-05-01 07:44:11 100 200 201.624704
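As a minimal sketch, the same formula can be wrapped in a function and applied to the rows exactly as the question lists them, where the second row starts at 2018-05-01 07:30:00. The names `interpolate_flux`, `question_df` and `in_range` are illustrative and do not appear in the original posts.

```python
import pandas as pd


def interpolate_flux(frame):
    """Linearly interpolate flux at Eval_Time between (L_Time, L_Flux) and (U_Time, U_Flux)."""
    # Fraction of the interval elapsed at Eval_Time; Timedelta / Timedelta yields a float.
    weight = (frame["Eval_Time"] - frame["L_Time"]) / (frame["U_Time"] - frame["L_Time"])
    return frame["L_Flux"] + (frame["U_Flux"] - frame["L_Flux"]) * weight


# Rows as listed in the question (second row starts on 2018-05-01, not 2018-05-03).
question_df = pd.DataFrame({
    "L_Time": pd.to_datetime(["2018-05-01 04:30:00", "2018-05-01 07:30:00"]),
    "U_Time": pd.to_datetime(["2018-05-01 05:30:00", "2018-05-01 08:30:00"]),
    "Eval_Time": pd.to_datetime(["2018-05-01 05:23:45", "2018-05-01 07:44:11"]),
    "L_Flux": [100, 100],
    "U_Flux": [200, 200],
})

question_df["Eval_Flux"] = interpolate_flux(question_df)

# True where Eval_Time lies inside [L_Time, U_Time], i.e. a real interpolation.
in_range = question_df["Eval_Time"].between(question_df["L_Time"], question_df["U_Time"])

print(question_df[["Eval_Time", "Eval_Flux"]])
print(in_range)
```

With these inputs both rows fall inside their intervals: the first row still evaluates to about 189.58, and the second to about 123.64 (851 of 3600 seconds elapsed).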
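Answer 1 (score: 0)
I needed to resample (upsample) the data between L_Time and U_Time to one-second resolution, then interpolate the upsampled flux values (the newly created points are NaN before interpolation) and extract the interpolated flux at Eval_Time.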
INTERPOL_FLUX = []
for row in df.itertuples():
    # Two-point frame for this row, one (timestamp, flux) pair per line;
    # named `pair` so the original df is not overwritten inside the loop
    pair = pd.DataFrame([(row.L_Time, row.L_Flux), (row.U_Time, row.U_Flux)], columns=['Times', 'Flux'])
    pair = pair.set_index('Times')  # Set Timestamps as index of the new dataframe
    flux = pair['Flux']  # Squeeze dataframe to series
    # Upsample to one-second steps and interpolate linearly ('S' = seconds; newer pandas releases prefer 's')
    interpolated = flux.resample('S').interpolate(method='linear')
    interpol_flux = interpolated.loc[row.Eval_Time]  # Extract interpolated flux at Eval_Time
    INTERPOL_FLUX.append(interpol_flux)  # Add this interpolated flux to the list
df['Eval_Flux'] = INTERPOL_FLUX  # Set this list as the Eval_Flux column
In short:
INTERPOL_FLUX = []
for row in df.itertuples():
    pair = pd.DataFrame([(row.L_Time, row.L_Flux), (row.U_Time, row.U_Flux)], columns=['Times', 'Flux']).set_index('Times')
    INTERPOL_FLUX.append(pair['Flux'].resample('S').interpolate(method='linear').loc[row.Eval_Time])
df['Eval_Flux'] = INTERPOL_FLUX
I expected this to be slow, but it turned out to be fast.
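If building a per-second series for every row becomes a concern, one possible alternative (not part of the answers above) is to interpolate each row directly with `numpy.interp` on epoch seconds. This is a sketch under assumptions: the helper name `flux_at` and the frame `sample` are hypothetical, L_Time must precede U_Time, and `numpy.interp` clamps to the end values instead of extrapolating when Eval_Time lies outside the interval.

```python
import numpy as np
import pandas as pd


def flux_at(row):
    """Interpolate one row's flux at Eval_Time (hypothetical helper)."""
    # Convert the timestamps to seconds since the epoch so np.interp can work on floats.
    x = row.Eval_Time.timestamp()
    xp = [row.L_Time.timestamp(), row.U_Time.timestamp()]  # must be increasing
    fp = [row.L_Flux, row.U_Flux]
    return float(np.interp(x, xp, fp))


# Same rows as in the question.
sample = pd.DataFrame({
    "L_Time": pd.to_datetime(["2018-05-01 04:30:00", "2018-05-01 07:30:00"]),
    "U_Time": pd.to_datetime(["2018-05-01 05:30:00", "2018-05-01 08:30:00"]),
    "Eval_Time": pd.to_datetime(["2018-05-01 05:23:45", "2018-05-01 07:44:11"]),
    "L_Flux": [100, 100],
    "U_Flux": [200, 200],
})

sample["Eval_Flux"] = [flux_at(row) for row in sample.itertuples(index=False)]
print(sample[["Eval_Time", "Eval_Flux"]])
```

This evaluates only the one point needed per row instead of materialising 3601 one-second samples for each hourly interval.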