How do I interpolate hourly data at one-second resolution?

Time: 2019-05-09 22:51:05

Tags: python pandas

I have the following pandas DataFrame df:

    L_Time                U_Time                Eval_Time         L_Flux U_Flux
    2018-05-01 04:30:00   2018-05-01 05:30:00   2018-05-01 05:23:45   100   200
    2018-05-01 07:30:00   2018-05-01 08:30:00   2018-05-01 07:44:11   100   200    

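L_Flux and U_Flux contain the radiative flux values at the pandas timestamps L_Time and U_Time, respectively. I want to interpolate the flux at Eval_Time, which is given to the second. How can I do this correctly with Python or pandas? I tried linear interpolation with pandas and with scipy, but that always gives me the midpoint value (150). I want the flux at the seconds-resolution timestamp (Eval_Time) to be interpolated according to its distance from the two hourly timestamps.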

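2 answers:

Answer 0: (score: 2)

You can do the interpolation yourself, since it only involves two columns. However, your data looks incorrect, because the second row requires extrapolation rather than interpolation. Either way, the following gives you the answer: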

import pandas as pd

# Note: the second L_Time is 2018-05-03, which lies after U_Time, so row 1 is extrapolated
df = pd.DataFrame(data={'L_Time': ['2018-05-01 04:30:00', '2018-05-03 07:30:00'],
    'U_Time': ['2018-05-01 05:30:00', '2018-05-01 08:30:00'],
    'Eval_Time': ['2018-05-01 05:23:45', '2018-05-01 07:44:11'],
    'L_Flux': [100, 100],
    'U_Flux': [200, 200]})

# Convert the string columns to pandas timestamps
df['L_Time'] = pd.to_datetime(df['L_Time'])
df['U_Time'] = pd.to_datetime(df['U_Time'])
df['Eval_Time'] = pd.to_datetime(df['Eval_Time'])

# The actual maths part: weight the flux difference by the fraction of the
# L_Time..U_Time interval that has elapsed at Eval_Time
df['Eval_Flux'] = df.L_Flux + (df.U_Flux - df.L_Flux)*(df.Eval_Time - df.L_Time)/(df.U_Time - df.L_Time)



               L_Time              U_Time           Eval_Time  L_Flux  U_Flux   Eval_Flux
0 2018-05-01 04:30:00 2018-05-01 05:30:00 2018-05-01 05:23:45     100     200  189.583333
1 2018-05-03 07:30:00 2018-05-01 08:30:00 2018-05-01 07:44:11     100     200  201.624704
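
For comparison, here is a minimal sketch that applies the same formula to the timestamps exactly as they appear in the question's table, where both rows have L_Time on 2018-05-01. The helper name interpolate_flux and the frame name question_df are hypothetical and not part of the original answer. With these inputs both evaluation times fall inside their intervals, so nothing is extrapolated.

```python
import pandas as pd

def interpolate_flux(frame):
    """Linearly interpolate flux at Eval_Time between (L_Time, L_Flux) and (U_Time, U_Flux)."""
    # Fraction of the interval elapsed at Eval_Time; Timedelta / Timedelta yields a float
    weight = (frame["Eval_Time"] - frame["L_Time"]) / (frame["U_Time"] - frame["L_Time"])
    return frame["L_Flux"] + (frame["U_Flux"] - frame["L_Flux"]) * weight

# Data as shown in the question's table (hypothetical variable name)
question_df = pd.DataFrame({
    "L_Time": pd.to_datetime(["2018-05-01 04:30:00", "2018-05-01 07:30:00"]),
    "U_Time": pd.to_datetime(["2018-05-01 05:30:00", "2018-05-01 08:30:00"]),
    "Eval_Time": pd.to_datetime(["2018-05-01 05:23:45", "2018-05-01 07:44:11"]),
    "L_Flux": [100, 100],
    "U_Flux": [200, 200],
})
question_df["Eval_Flux"] = interpolate_flux(question_df)
print(question_df["Eval_Flux"])
```

Row 0 is 3225 s into a 3600 s interval and row 1 is 851 s in, so the results should be roughly 189.58 and 123.64.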

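Answer 1: (score: 0)

I needed to resample (upsample) the data between L_Time and U_Time to one-second resolution, then interpolate the upsampled flux values (the newly created samples are NaN) and extract the interpolated flux at Eval_Time.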

import pandas as pd

INTERPOL_FLUX = []
for row in df.itertuples():
    # Two-row frame holding (L_Time, L_Flux) and (U_Time, U_Flux);
    # named `pair` so the outer `df` is not overwritten inside the loop
    pair = pd.DataFrame([(row.L_Time, row.L_Flux), (row.U_Time, row.U_Flux)], columns=['Times', 'Flux'])
    pair = pair.set_index('Times')  # Set timestamps as the index of the new frame
    flux = pair['Flux'].astype(float)  # Squeeze the frame to a Series
    # Upsample to 1-second steps and interpolate (I needed linear); 's' replaces the deprecated 'S' alias
    interpolated = flux.resample('s').interpolate(method='linear')
    interpol_flux = interpolated.loc[row.Eval_Time]  # Extract interpolated flux at Eval_Time
    INTERPOL_FLUX.append(interpol_flux)  # Collect the value for this row

df['Eval_Flux'] = INTERPOL_FLUX  # Set this list as the Eval_Flux column

In short:

INTERPOL_FLUX = []
for row in df.itertuples():
    # Separate name for the two-point Series so the outer `df` survives the loop
    flux = pd.Series([row.L_Flux, row.U_Flux], index=[row.L_Time, row.U_Time], dtype=float)
    INTERPOL_FLUX.append(flux.resample('s').interpolate(method='linear').loc[row.Eval_Time])

df['Eval_Flux'] = INTERPOL_FLUX

I expected this to be slow, but it turned out to be fast.
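
A possible lighter-weight variant, sketched here as an assumption rather than something either answer proposes: instead of upsampling to every second, insert Eval_Time as a NaN point between the two known samples and let Series.interpolate(method='time') weight by the actual time gaps. The sketch also shows one plausible reason for the 150 mentioned in the question, since method='linear' treats the samples as equally spaced and ignores the index. Whether that is what happened in the asker's code is not known. The helper name flux_at is hypothetical.

```python
import numpy as np
import pandas as pd

def flux_at(l_time, u_time, eval_time, l_flux, u_flux, method="time"):
    """Interpolate flux at eval_time from two samples using Series.interpolate."""
    series = pd.Series(
        [l_flux, np.nan, u_flux],
        index=pd.to_datetime([l_time, eval_time, u_time]),
        dtype=float,
    )
    # Assumes l_time < eval_time < u_time, so the index is sorted and has no duplicates
    return series.interpolate(method=method).loc[pd.Timestamp(eval_time)]

# First row of the question's table
args = ("2018-05-01 04:30:00", "2018-05-01 05:30:00", "2018-05-01 05:23:45", 100, 200)
by_position = flux_at(*args, method="linear")  # ignores the index: midpoint of 100 and 200
by_time = flux_at(*args, method="time")        # weights by elapsed time
print(by_position, by_time)
```

With the first row of the question's data, the positional result is the midpoint 150, while the time-weighted result should agree with the roughly 189.58 from the closed-form formula in Answer 0.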