Pandas: group by time based on values in a column

Time: 2020-03-26 17:01:53

Tags: python-3.x pandas

I have a dataframe like this:

import pandas as pd

data = {'lat':[4.2471, 4.2646,4.2945, 4.2819,4.2635,4.2616,4.2731,4.2555,5.2555],
        'lng':[-76.7504,-76.7198,-76.7069,-76.7251,-76.726,-76.7196,-76.715,-76.7118,-77.7118],
       'x':[208.999,-894.0,-171.0,108.999,-162.0,-29.0,-143.999,-133.0,-900.0],
       'e':[0.105,0.209,0.934,0.150,0.158,0.347,0.333,0.089,0.189],
       'dep':['a','a','a','b','b','b','c','c','c']}

df = pd.DataFrame(data, index=['2020-01-01 16:32:14.105000-05:00', '2020-01-01 16:32:14.112000-05:00',
                                '2020-01-01 16:32:14.175000-05:00', '2020-01-01 16:32:14.176000-05:00',
                                '2020-01-01 16:32:14.211000-05:00','2020-01-01 16:32:14.220000-05:00',
                               '2020-01-01 16:32:14.310000-05:00','2020-01-01 16:32:14.327000-05:00',
                               '2020-01-01 16:32:15.327000-05:00'])
df.index = pd.to_datetime(df.index)

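The idea is to filter on a value in the 'dep' column, then group the rows that fall within the same second and keep the row with the largest value in another column. So far I only do this for a single 'dep' value, but I need to do it for a large dataframe.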

This is what I have so far:

df['x_ABS'] = df['x'].abs()
d = df[df['dep'] == 'a']
# True where a row holds the largest |x| within its second
idx = d.groupby(d.index.floor('s'))['x_ABS'].transform('max') == d['x_ABS']
d[idx]
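
For reference, the mask above relies on two things: `floor('s')` truncates each timestamp to its second, and `transform('max')` broadcasts each group's maximum back onto every row of that group. A minimal sketch on a toy series may make this easier to follow (the names `s`, `per_second_max`, `mask`, and `winners` are illustrative and not part of the original post):

```python
import pandas as pd

# Toy series: three readings in second :14 and one in second :15.
s = pd.Series(
    [5.0, 9.0, 2.0, 7.0],
    index=pd.to_datetime([
        '2020-01-01 16:32:14.105',
        '2020-01-01 16:32:14.112',
        '2020-01-01 16:32:14.175',
        '2020-01-01 16:32:15.327',
    ]),
)

# Group rows by the second they fall in, then broadcast each group's
# maximum back onto every row of that group (same length as s).
per_second_max = s.groupby(s.index.floor('s')).transform('max')

# True only where a row holds the maximum of its own second.
mask = s == per_second_max
winners = s[mask]
```

Because the comparison is an equality test, rows that tie for the maximum within a second would all be kept.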

1 Answer:

Answer 0: (score: 0)

It is better to use resample to get the per-second maximum of x. Here is sample code (note that I made a copy named df_):

df_ = df[df['dep'] == 'a'].copy()
df_['x'] = df_['x'].abs()
df_.index = df_.index.floor('s')
# Broadcast each second's maximum of x back onto every row in that second
df_['x'] = df_.resample('s')['x'].transform('max')

Output:

                              lat      lng      x      e dep
2020-01-01 16:32:14-05:00  4.2471 -76.7504  894.0  0.105   a
2020-01-01 16:32:14-05:00  4.2646 -76.7198  894.0  0.209   a
2020-01-01 16:32:14-05:00  4.2945 -76.7069  894.0  0.934   a
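
The question also asks how to do this for every 'dep' value rather than just 'a', and to keep the row that holds the maximum instead of overwriting x. One possible sketch, assuming a single groupby keyed on both 'dep' and the floored timestamp is acceptable, is shown below. The names `x_abs`, `second`, `group_max`, and `result` are illustrative and do not come from the original post:

```python
import pandas as pd

data = {
    'lat': [4.2471, 4.2646, 4.2945, 4.2819, 4.2635, 4.2616, 4.2731, 4.2555, 5.2555],
    'lng': [-76.7504, -76.7198, -76.7069, -76.7251, -76.726, -76.7196, -76.715, -76.7118, -77.7118],
    'x': [208.999, -894.0, -171.0, 108.999, -162.0, -29.0, -143.999, -133.0, -900.0],
    'e': [0.105, 0.209, 0.934, 0.150, 0.158, 0.347, 0.333, 0.089, 0.189],
    'dep': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
}
index = pd.to_datetime([
    '2020-01-01 16:32:14.105000-05:00', '2020-01-01 16:32:14.112000-05:00',
    '2020-01-01 16:32:14.175000-05:00', '2020-01-01 16:32:14.176000-05:00',
    '2020-01-01 16:32:14.211000-05:00', '2020-01-01 16:32:14.220000-05:00',
    '2020-01-01 16:32:14.310000-05:00', '2020-01-01 16:32:14.327000-05:00',
    '2020-01-01 16:32:15.327000-05:00',
])
df = pd.DataFrame(data, index=index)

# Absolute value of x, kept in a separate column so x itself is untouched.
df['x_abs'] = df['x'].abs()

# Each row's timestamp truncated to the second.
second = df.index.floor('s')

# Maximum |x| per (dep, second) pair, broadcast back to every row.
group_max = df.groupby(['dep', second])['x_abs'].transform('max')

# Keep only the rows that hold their group's maximum.
result = df[df['x_abs'] == group_max]
```

This avoids looping over the 'dep' values and keeps the original millisecond timestamps in the index. Rows that tie for the maximum within a group are all retained; whether that is desirable depends on the data.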