pandas - Downsample a higher-frequency DataFrame to a lower-frequency DataFrame

Time: 2019-07-15 16:19:23

Tags: python pandas dataframe

I have two DataFrames holding different data measured at different frequencies, as in these CSV samples:

df1:

i,m1,m2,t
0,0.556529,6.863255,43564.844
1,0.5565576199999884,6.86327749999999,43564.863999999994
2,0.5565559400000003,6.8632764,43564.884
3,0.5565699799999941,6.863286799999996,43564.903999999995
4,0.5565570200000007,6.863277200000001,43564.924
5,0.5565316400000097,6.863257100000007,43564.944
...

df2:

i,m3,m4,t
0,306.81162500000596,-1.2126870045404683,43564.878125
1,306.86175000000725,-1.1705838272666433,43564.928250000004
2,306.77552454544787,-1.1240195386446195,43564.97837499999
3,306.85900545454086,-1.0210345363692084,43565.0285
4,306.8354250000052,-1.0052431772666657,43565.078625
5,306.88397499999286,-0.9468344809917896,43565.12875
...

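I would like to get a single df that has all the measurements from both dfs at the timestamps of the first df (which acquires data less frequently).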

I tried using a for loop to average the df2 measurements that fall between two consecutive df1 timestamps, but it is very slow.

1 Answer:

Answer 0 (score: 1)

IIUC, i is the index column, and you want to put df2['t'] into bins and average the other columns. So you can use pd.cut:

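import numpy as np
import pandas as pd

# bin df2's timestamps into the half-open intervals [t_k, t_k+1) defined by
# df1's timestamps; the last bin runs to +inf, each bin is labelled by its left edge
groups = pd.cut(df2.t, bins=list(df1.t) + [np.inf],
                right=False,
                labels=df1['t'])

# cols to copy
cols = [col for col in df2.columns if col != 't']

# group by bin and get the average; observed=False keeps empty bins as NaN rows
new_df = (df2[cols].groupby(groups, observed=False)
                   .mean()
                   .reset_index()
         )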

Then new_df is:

           t          m3        m4
0  43564.844         NaN       NaN
1  43564.864  306.811625 -1.212687
2  43564.884         NaN       NaN
3  43564.904         NaN       NaN
4  43564.924  306.861750 -1.170584
5  43564.944  306.838482 -1.024283
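
To make the binning rule concrete, here is a minimal sketch of how pd.cut behaves with right=False and an extra np.inf edge. The variable names edges and values are only illustrative; the numbers are taken from the sample rows in the question.

```python
import numpy as np
import pandas as pd

# Left edges of the bins: the first three df1 timestamps from the question
# (rounded as displayed in new_df).
edges = [43564.844, 43564.864, 43564.884]

# Two values to classify: one strictly inside a bin, one exactly on an edge.
values = pd.Series([43564.878125, 43564.884])

# right=False makes each bin closed on the left and open on the right.
# Appending np.inf adds one open-ended bin after the last edge, so there
# are as many bins as labels.
bins = pd.cut(values, bins=edges + [np.inf], right=False, labels=edges)

print(bins.tolist())  # [43564.864, 43564.884]
```

A value that sits exactly on an edge is assigned to the bin that starts there, which is why each df2 row ends up under the most recent df1 timestamp at or before it.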

You can then merge it with df1 on t:

df1.merge(new_df, on='t', how='left')

and get:

         m1        m2        t          m3        m4
0  0.556529  6.863255  43564.8         NaN       NaN
1  0.556558  6.863277  43564.9  306.811625 -1.212687
2  0.556556  6.863276  43564.9         NaN       NaN
3  0.556570  6.863287  43564.9         NaN       NaN
4  0.556557  6.863277  43564.9  306.861750 -1.170584
5  0.556532  6.863257  43564.9  306.838482 -1.024283
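
For anyone who wants to try this end to end, below is a self-contained sketch that rebuilds the two frames from the sample rows shown in the question and runs the same cut, groupby and merge steps. It assumes i is the index column (as the answer does) and uses only the six rows visible for each frame, so the elided rows are not included. The cast of t back to float before merging is an extra precaution of this sketch, not part of the answer: after reset_index the t column is categorical, and the cast keeps both merge keys the same dtype.

```python
import io

import numpy as np
import pandas as pd

# Sample rows copied from the question; "i" is assumed to be the index.
df1 = pd.read_csv(io.StringIO("""i,m1,m2,t
0,0.556529,6.863255,43564.844
1,0.5565576199999884,6.86327749999999,43564.863999999994
2,0.5565559400000003,6.8632764,43564.884
3,0.5565699799999941,6.863286799999996,43564.903999999995
4,0.5565570200000007,6.863277200000001,43564.924
5,0.5565316400000097,6.863257100000007,43564.944
"""), index_col="i")

df2 = pd.read_csv(io.StringIO("""i,m3,m4,t
0,306.81162500000596,-1.2126870045404683,43564.878125
1,306.86175000000725,-1.1705838272666433,43564.928250000004
2,306.77552454544787,-1.1240195386446195,43564.97837499999
3,306.85900545454086,-1.0210345363692084,43565.0285
4,306.8354250000052,-1.0052431772666657,43565.078625
5,306.88397499999286,-0.9468344809917896,43565.12875
"""), index_col="i")

# Assign every df2 row to the df1 timestamp that opens its interval.
groups = pd.cut(df2["t"], bins=list(df1["t"]) + [np.inf],
                right=False, labels=df1["t"])

# Average all df2 measurement columns per interval; keep empty intervals.
cols = [col for col in df2.columns if col != "t"]
new_df = (df2[cols].groupby(groups, observed=False)
                   .mean()
                   .reset_index())

# Extra step (assumption of this sketch): turn the categorical labels
# back into plain floats so the merge keys share a dtype.
new_df["t"] = new_df["t"].astype(float)

merged = df1.merge(new_df, on="t", how="left")
print(merged)
```

With only these sample rows, every df2 row after 43564.944 lands in the last open-ended bin, so the final row of merged averages four df2 measurements.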