I have a dataframe that looks like this:
Time val1 val2
2020-09-21 00:33:29.226000-05:00 0.115 98.5
2020-09-21 01:56:49.225000-05:00 0.557 141.9
**2020-09-21 02:46:05.659000-05:00 0.046 39.4**
**2020-09-21 02:46:05.659000-05:00 0.174 305.2**
2020-09-21 03:45:19.899000-05:00 0.118 161.1
2020-09-21 04:33:25.532000-05:00 0.145 182.6
2020-09-21 05:09:12.862000-05:00 0.343 139.5
2020-09-21 06:07:50.445000-05:00 2.036 44.7
**2020-09-20 07:59:30.475000-05:00 0.082 10.8**
**2020-09-20 07:59:30.475000-05:00 0.092 19**
2020-09-20 08:01:51.487000-05:00 0.083 18.8
2020-09-20 09:56:00.108000-05:00 1.058 9.5
2020-09-20 11:21:26.805000-05:00 0.514 9
2020-09-20 12:28:08.667000-05:00 0.242 16.2
2020-09-20 13:29:31.026000-05:00 0.115 56.8
2020-09-20 14:04:17.509000-05:00 0.067 135.9
2020-09-20 15:59:42.175000-05:00 0.153 169.3
2020-09-20 16:11:05.711000-05:00 0.128 107
2020-09-20 17:24:43.678000-05:00 0.157 122.1
2020-09-20 18:02:01.091000-05:00 0.152 103.6
2020-09-20 19:32:09.288000-05:00 0.164 118
2020-09-20 20:50:39.238000-05:00 0.106 120.5
2020-09-20 21:04:13.440000-05:00 0.125 133.4
2020-09-20 22:57:49.545000-05:00 0.206 94.1
2020-09-20 23:54:57.790000-05:00 0.16 95.5
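This dataframe was created from a larger dataframe with the line `df2 = df.loc[df.groupby(df.index.hour)['val2'].idxmax()]`.
The problem is that it keeps some duplicated timestamps: 2020-09-21 02:46:05.659000-05:00
and 2020-09-20 07:59:30.475000-05:00.
For each duplicated timestamp I want to keep only the row with the largest value in the val2 column, giving the following result: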
Time val1 val2
2020-09-21 00:33:29.226000-05:00 0.115 98.5
2020-09-21 01:56:49.225000-05:00 0.557 141.9
**2020-09-21 02:46:05.659000-05:00 0.174 305.2**
2020-09-21 03:45:19.899000-05:00 0.118 161.1
2020-09-21 04:33:25.532000-05:00 0.145 182.6
2020-09-21 05:09:12.862000-05:00 0.343 139.5
2020-09-21 06:07:50.445000-05:00 2.036 44.7
**2020-09-20 07:59:30.475000-05:00 0.092 19**
2020-09-20 08:01:51.487000-05:00 0.083 18.8
2020-09-20 09:56:00.108000-05:00 1.058 9.5
2020-09-20 11:21:26.805000-05:00 0.514 9
2020-09-20 12:28:08.667000-05:00 0.242 16.2
2020-09-20 13:29:31.026000-05:00 0.115 56.8
2020-09-20 14:04:17.509000-05:00 0.067 135.9
2020-09-20 15:59:42.175000-05:00 0.153 169.3
2020-09-20 16:11:05.711000-05:00 0.128 107
2020-09-20 17:24:43.678000-05:00 0.157 122.1
2020-09-20 18:02:01.091000-05:00 0.152 103.6
2020-09-20 19:32:09.288000-05:00 0.164 118
2020-09-20 20:50:39.238000-05:00 0.106 120.5
2020-09-20 21:04:13.440000-05:00 0.125 133.4
2020-09-20 22:57:49.545000-05:00 0.206 94.1
2020-09-20 23:54:57.790000-05:00 0.16 95.5
I tried `df2[~df2.index.duplicated()]`, but it drops either the first or the last row of each duplicated pair, not the row with the lowest val2.
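
For illustration, here is a minimal sketch of one way this could be approached. It assumes Time is the index of `df2` and rebuilds only a few of the rows above as sample data. The idea is to sort by val2 in descending order so that the largest value comes first for each timestamp, keep the first occurrence of each index label, and then restore the original row order. The name `deduped` is hypothetical.

```python
import pandas as pd

# Small sample taken from the rows shown above (Time assumed to be the index).
times = pd.to_datetime([
    "2020-09-21 01:56:49.225000-05:00",
    "2020-09-21 02:46:05.659000-05:00",
    "2020-09-21 02:46:05.659000-05:00",
    "2020-09-21 03:45:19.899000-05:00",
    "2020-09-20 07:59:30.475000-05:00",
    "2020-09-20 07:59:30.475000-05:00",
])
df2 = pd.DataFrame(
    {
        "val1": [0.557, 0.046, 0.174, 0.118, 0.082, 0.092],
        "val2": [141.9, 39.4, 305.2, 161.1, 10.8, 19.0],
    },
    index=times,
)
df2.index.name = "Time"

# Put the largest val2 first, so keep="first" retains the maximum per timestamp.
by_val2 = df2.sort_values("val2", ascending=False)
deduped = by_val2[~by_val2.index.duplicated(keep="first")]

# Restore the order in which the timestamps first appeared in df2.
deduped = deduped.loc[df2.index.unique()]

print(deduped)
```

`duplicated()` only looks at the position of a row among its duplicates, never at val2, which is why sorting first is what decides which row survives. If two rows share both the timestamp and the maximum val2, this sketch keeps just one of them.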