I have a DataFrame like this:
import pandas as pd

data = {'lat': [4.2471, 4.2646, 4.2945, 4.2819, 4.2635, 4.2616, 4.2731, 4.2555, 5.2555],
        'lng': [-76.7504, -76.7198, -76.7069, -76.7251, -76.726, -76.7196, -76.715, -76.7118, -77.7118],
        'x': [208.999, -894.0, -171.0, 108.999, -162.0, -29.0, -143.999, -133.0, -900.0],
        'e': [0.105, 0.209, 0.934, 0.150, 0.158, 0.347, 0.333, 0.089, 0.189],
        'dep': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']}
# Timestamps with millisecond precision and a -05:00 UTC offset
df = pd.DataFrame(data, index=['2020-01-01 16:32:14.105000-05:00', '2020-01-01 16:32:14.112000-05:00',
                               '2020-01-01 16:32:14.175000-05:00', '2020-01-01 16:32:14.176000-05:00',
                               '2020-01-01 16:32:14.211000-05:00', '2020-01-01 16:32:14.220000-05:00',
                               '2020-01-01 16:32:14.310000-05:00', '2020-01-01 16:32:14.327000-05:00',
                               '2020-01-01 16:32:15.327000-05:00'])
df.index = pd.to_datetime(df.index)
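The idea is to filter on the values in the 'dep' column, group the rows that fall within the same second, and then get the row with the maximum value in another column. So far I am only doing this for a single 'dep' value, but I need to do it for a large DataFrame.
This is what I have so far: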
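df['x_ABS'] = df['x'].abs()
# Keep only the rows for one 'dep' value
d = df[df['dep'] == 'a']
# True where a row holds the largest |x| within its one-second bucket
idx = d.groupby(d.index.floor('s'))['x_ABS'].transform('max') == d['x_ABS']
d[idx]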
Answer 0 (score: 0)
It is better to use resample to get the maximum value of x per second. Here is some sample code (note that I made a copy called df_):
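df_ = df[df['dep'] == 'a'].copy()
df_['x'] = df_['x'].abs()
# Truncate each timestamp to whole seconds
df_.index = df_.index.floor('s')
# Broadcast the per-second maximum of |x| back onto every row
df_['x'] = df_.resample('s')['x'].transform('max')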
Output:
                              lat      lng      x      e dep
2020-01-01 16:32:14-05:00  4.2471 -76.7504  894.0  0.105   a
2020-01-01 16:32:14-05:00  4.2646 -76.7198  894.0  0.209   a
2020-01-01 16:32:14-05:00  4.2945 -76.7069  894.0  0.934   a
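
The output above overwrites x with the per-second maximum on every row rather than selecting the row that holds it, and it handles only dep == 'a'. If the goal is to keep the row(s) with the largest |x| for every 'dep' value and every second in one pass, a possible sketch is to group by both keys at once. This is an illustration, not part of the answer above: rows_with_max_abs and its parameter names are hypothetical, and the sample frame is trimmed to the two columns the logic needs.

```python
import pandas as pd

# Trimmed copy of the question's data: only the columns used by the grouping logic
data = {
    'x': [208.999, -894.0, -171.0, 108.999, -162.0, -29.0, -143.999, -133.0, -900.0],
    'dep': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
}
index = pd.to_datetime([
    '2020-01-01 16:32:14.105000-05:00', '2020-01-01 16:32:14.112000-05:00',
    '2020-01-01 16:32:14.175000-05:00', '2020-01-01 16:32:14.176000-05:00',
    '2020-01-01 16:32:14.211000-05:00', '2020-01-01 16:32:14.220000-05:00',
    '2020-01-01 16:32:14.310000-05:00', '2020-01-01 16:32:14.327000-05:00',
    '2020-01-01 16:32:15.327000-05:00',
])
df = pd.DataFrame(data, index=index)


def rows_with_max_abs(frame, value_col='x', group_col='dep', freq='s'):
    """Hypothetical helper: keep the rows whose |value_col| is the largest
    within each (group_col, time bucket) pair. Ties keep every tied row."""
    abs_vals = frame[value_col].abs()
    # Group by the category and by the timestamp truncated to `freq`
    keys = [frame[group_col], frame.index.floor(freq)]
    group_max = abs_vals.groupby(keys).transform('max')
    return frame[abs_vals == group_max]


result = rows_with_max_abs(df)
print(result)
```

Because the mask is computed with a single groupby over the whole frame, there is no need to loop over 'dep' values, and the original millisecond timestamps and signed x values are preserved in the result.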