Question

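I have a pandas DataFrame with 6 columns, two of which are 'date' and 'time'. For each date, I want to keep only the rows that have the maximum time value. In the example below, the date is on the left and the time is on the right. I want to keep only the rows where the time is 1925.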

20200109    1925
20200109    1925
20200109    1925
20200109    1925
20200109    1925
20200109    1925
20200109    1830
20200109    1830
20200109    1830
20200109    1830
20200109    1830

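I have tried many solutions, including sorting and groupby, for example dataframe.groupby('date').apply(lambda x: x.loc[x.time == x.time.max(),['date','time']]), but this returns only the date and time columns. I want all 6 columns in the result.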

Edit: for each date, I want to keep every row that has that date's maximum time.

Answer 1

Try something like this:

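import pandas as pd

# Sample data: eleven rows for 20200109 and one row for 20200110
dates = [20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200110]
times = [1925, 1925, 1925, 1925, 1925, 1925, 1830, 1830, 1830, 1830, 1830, 1930]
df = pd.DataFrame({'dates':dates, 'times':times})

# One row per date holding that date's maximum time
filt = df.groupby(['dates'])['times'].max().to_frame().reset_index()
# Inner merge keeps only the rows of df that match a (date, max time) pair
final = pd.merge(df,filt,on=['dates','times'])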

final 
      dates  times
  0  20200109   1925
  1  20200109   1925
  2  20200109   1925
  3  20200109   1925
  4  20200109   1925
  5  20200109   1925
  6  20200110   1930

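I believe that even if you add more columns to df, final will contain those extra columns as well, since the merge keeps every column of df.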
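As a possible alternative to the merge, the same filter can be sketched with groupby(...).transform('max'), which broadcasts each date's maximum time back onto every row so that a plain boolean mask keeps all columns. The sample data and the 'value' column below are hypothetical stand-ins for the asker's six-column frame, and the column names 'date' and 'time' follow the question rather than the answer above.

```python
import pandas as pd

# Hypothetical sample data; 'value' stands in for the extra columns.
df = pd.DataFrame({
    "date": [20200109, 20200109, 20200109, 20200110, 20200110],
    "time": [1925, 1925, 1830, 1930, 1800],
    "value": ["a", "b", "c", "d", "e"],
})

# For every row, the maximum time observed on that row's date.
max_time = df.groupby("date")["time"].transform("max")

# Keep rows whose time equals their date's maximum; no column is dropped.
result = df[df["time"] == max_time]
print(result)
```

Unlike the merge, this keeps the original row index, which may or may not matter for later steps.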

(pandas) Group by one column and keep only the rows where another column is at its maximum
