Pandas groupby and get the row with the maximum value

Time: 2020-02-13 20:14:24

Tags: python pandas pandas-groupby

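I have a pandas dataframe with a datetime index. I want to group it by second and get, for each group, the row that has the maximum value in the 'a_ABS' column, but I am only able to get the maximum value of each column.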

import pandas as pd

data = {'lat':[4.2471, 4.2646,4.2945, 4.2819,4.2635,4.2616,4.2731,4.2555],
        'lng':[-76.7504,-76.7198,-76.7069,-76.7251,-76.726,-76.7196,-76.715,-767.118],
       'a':[208.999,-894.0,-171.0,108.999,-162.0,-29.0,-143.999,-133.0],
       'e':[0.105,0.209,0.934,0.150,0.158,0.347,0.333,0.089]}

# build the frame once, using the timestamp strings as the index
df = pd.DataFrame(data, index =['2020-01-01 16:32:14.105000-05:00', '2020-01-01 16:32:14.112000-05:00',
                                '2020-01-01 16:32:14.175000-05:00', '2020-01-01 16:32:14.176000-05:00',
                                '2020-01-01 16:32:14.211000-05:00','2020-01-01 16:32:14.220000-05:00',
                               '2020-01-01 16:32:14.310000-05:00','2020-01-01 16:32:14.327000-05:00'])
df.index = pd.to_datetime(df.index)


a=df
a['a_ABS']=a['a'].abs()
aa=a.groupby([a.index.floor('s')], as_index=True).max()

[Image: complete dataframe]

[Image: dataframe with max value per column]

2 answers:

Answer 0 (score: 1)

You're almost there. After sorting, select the first row with a.iloc[:1]. Full code:

import pandas as pd

data = {'lat':[4.2471, 4.2646,4.2945, 4.2819,4.2635,4.2616,4.2731,4.2555],
        'lng':[-76.7504,-76.7198,-76.7069,-76.7251,-76.726,-76.7196,-76.715,-767.118],
       'a':[208.999,-894.0,-171.0,108.999,-162.0,-29.0,-143.999,-133.0],
       'e':[0.105,0.209,0.934,0.150,0.158,0.347,0.333,0.089]}

# build the frame once, using the timestamp strings as the index
df = pd.DataFrame(data, index =['2020-01-01 16:32:14.105000-05:00', '2020-01-01 16:32:14.112000-05:00',
                                '2020-01-01 16:32:14.175000-05:00', '2020-01-01 16:32:14.176000-05:00',
                                '2020-01-01 16:32:14.211000-05:00','2020-01-01 16:32:14.220000-05:00',
                               '2020-01-01 16:32:14.310000-05:00','2020-01-01 16:32:14.327000-05:00'])
df.index = pd.to_datetime(df.index)


a=df
a['a_ABS']=a['a'].abs()

a=a.sort_values(by="a_ABS", ascending=False)
first_df=a.iloc[:1]

print(first_df)
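
Note that sorting and taking iloc[:1] returns a single row for the whole frame. That matches the sample data, where every timestamp falls in the same second, but it does not group anything. If the data spans several seconds, the same sorting idea could be applied per group, roughly as sketched below. The shortened dataset (two different seconds) and the names ordered and top_per_second are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical, shortened data: two rows in second :14 and two rows in second :15
df = pd.DataFrame(
    {"a": [208.999, -894.0, -171.0, 108.999]},
    index=pd.to_datetime([
        "2020-01-01 16:32:14.105000-05:00",
        "2020-01-01 16:32:14.112000-05:00",
        "2020-01-01 16:32:15.175000-05:00",
        "2020-01-01 16:32:15.176000-05:00",
    ]),
)
df["a_ABS"] = df["a"].abs()

# Sort so the largest a_ABS comes first ...
ordered = df.sort_values("a_ABS", ascending=False)

# ... then keep the first (largest) row of each second and restore time order
top_per_second = ordered.groupby(ordered.index.floor("s")).head(1).sort_index()

print(top_per_second)
```

Each row of top_per_second is a complete original row (all columns come from the same record), which is the difference from calling .max() on the grouped frame.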

Answer 1 (score: 0)

Something like this should work:

import pandas as pd

# create dataframe:
df = pd.DataFrame({
    'lat':[4.2471, 4.2646,4.2945, 4.2819,4.2635,4.2616,4.2731,4.2555],
    'lng':[-76.7504,-76.7198,-76.7069,-76.7251,-76.726,-76.7196,-76.715,-767.118],
    'a':[208.999,-894.0,-171.0,108.999,-162.0,-29.0,-143.999,-133.0],
    'e':[0.105,0.209,0.934,0.150,0.158,0.347,0.333,0.089]
})

# set index:
df.index = pd.to_datetime([
    '2020-01-01 16:32:14.105000-05:00', '2020-01-01 16:32:14.112000-05:00',
    '2020-01-01 16:32:14.175000-05:00', '2020-01-01 16:32:14.176000-05:00',
    '2020-01-01 16:32:14.211000-05:00', '2020-01-01 16:32:15.220000-05:00',
    '2020-01-01 16:32:14.310000-05:00', '2020-01-01 16:32:15.327000-05:00',
])

# create absolute column:
df['a_ABS'] = df['a'].abs()

# create seconds column:
df['seconds'] = df.index.second

# group columns by seconds:
df_grouped = df.groupby(['seconds']).max()

# extract only the 'a_ABS' column:
df_grouped = df_grouped['a_ABS']

# reset index:
df_grouped = df_grouped.reset_index()
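
The code above yields only the maximum 'a_ABS' per group, not the rest of the row. Also, df.index.second is just the 0 to 59 seconds component, so rows from different minutes with the same seconds value would land in the same group. One possible way to get the full rows is to group on the index floored to the second and look the winners up with idxmax. This is a minimal sketch on a shortened, made-up dataset; it assumes the index labels are unique, and the names winners and rows_with_max are illustrative.

```python
import pandas as pd

# Hypothetical, shortened data: two rows in second :14 and two rows in second :15
df = pd.DataFrame(
    {
        "lat": [4.2471, 4.2646, 4.2945, 4.2819],
        "a": [208.999, -894.0, -171.0, 108.999],
    },
    index=pd.to_datetime([
        "2020-01-01 16:32:14.105000-05:00",
        "2020-01-01 16:32:14.112000-05:00",
        "2020-01-01 16:32:15.175000-05:00",
        "2020-01-01 16:32:15.176000-05:00",
    ]),
)
df["a_ABS"] = df["a"].abs()

# For each second, idxmax returns the index label (timestamp) of the row
# whose a_ABS is largest within that group
winners = df.groupby(df.index.floor("s"))["a_ABS"].idxmax()

# Look those labels up to get the complete rows (assumes unique index labels)
rows_with_max = df.loc[winners.tolist()]

print(rows_with_max)
```

If the index could contain duplicate timestamps, .loc would return every row sharing a winning label, so a reset_index() before grouping may be the safer route in that case.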