Pandas DatetimeIndex: remove duplicates, keeping the row with the maximum value in a specific column

Asked: 2020-09-22 16:02:57

Tags: python pandas datetime

I have a DataFrame that looks like this:

Time                                val1    val2
2020-09-21 00:33:29.226000-05:00    0.115   98.5
2020-09-21 01:56:49.225000-05:00    0.557   141.9
**2020-09-21 02:46:05.659000-05:00  0.046   39.4**
**2020-09-21 02:46:05.659000-05:00  0.174   305.2**
2020-09-21 03:45:19.899000-05:00    0.118   161.1
2020-09-21 04:33:25.532000-05:00    0.145   182.6
2020-09-21 05:09:12.862000-05:00    0.343   139.5
2020-09-21 06:07:50.445000-05:00    2.036   44.7
**2020-09-20 07:59:30.475000-05:00  0.082   10.8**
**2020-09-20 07:59:30.475000-05:00  0.092   19**
2020-09-20 08:01:51.487000-05:00    0.083   18.8
2020-09-20 09:56:00.108000-05:00    1.058   9.5
2020-09-20 11:21:26.805000-05:00    0.514   9
2020-09-20 12:28:08.667000-05:00    0.242   16.2
2020-09-20 13:29:31.026000-05:00    0.115   56.8
2020-09-20 14:04:17.509000-05:00    0.067   135.9
2020-09-20 15:59:42.175000-05:00    0.153   169.3
2020-09-20 16:11:05.711000-05:00    0.128   107
2020-09-20 17:24:43.678000-05:00    0.157   122.1
2020-09-20 18:02:01.091000-05:00    0.152   103.6
2020-09-20 19:32:09.288000-05:00    0.164   118
2020-09-20 20:50:39.238000-05:00    0.106   120.5
2020-09-20 21:04:13.440000-05:00    0.125   133.4
2020-09-20 22:57:49.545000-05:00    0.206   94.1
2020-09-20 23:54:57.790000-05:00    0.16    95.5

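This DataFrame was created from a larger DataFrame with this line of code:

df2 = df.loc[df.groupby(df.index.hour)['val2'].idxmax()]

The problem is that it keeps some duplicated timestamps, 2020-09-21 02:46:05.659000-05:00 and 2020-09-20 07:59:30.475000-05:00. For each duplicated timestamp I want to keep only the row with the maximum value in the val2 column, which would give the following result: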

Time                                val1    val2
2020-09-21 00:33:29.226000-05:00    0.115   98.5
2020-09-21 01:56:49.225000-05:00    0.557   141.9
**2020-09-21 02:46:05.659000-05:00  0.174   305.2**
2020-09-21 03:45:19.899000-05:00    0.118   161.1
2020-09-21 04:33:25.532000-05:00    0.145   182.6
2020-09-21 05:09:12.862000-05:00    0.343   139.5
2020-09-21 06:07:50.445000-05:00    2.036   44.7
**2020-09-20 07:59:30.475000-05:00  0.092   19**
2020-09-20 08:01:51.487000-05:00    0.083   18.8
2020-09-20 09:56:00.108000-05:00    1.058   9.5
2020-09-20 11:21:26.805000-05:00    0.514   9
2020-09-20 12:28:08.667000-05:00    0.242   16.2
2020-09-20 13:29:31.026000-05:00    0.115   56.8
2020-09-20 14:04:17.509000-05:00    0.067   135.9
2020-09-20 15:59:42.175000-05:00    0.153   169.3
2020-09-20 16:11:05.711000-05:00    0.128   107
2020-09-20 17:24:43.678000-05:00    0.157   122.1
2020-09-20 18:02:01.091000-05:00    0.152   103.6
2020-09-20 19:32:09.288000-05:00    0.164   118
2020-09-20 20:50:39.238000-05:00    0.106   120.5
2020-09-20 21:04:13.440000-05:00    0.125   133.4
2020-09-20 22:57:49.545000-05:00    0.206   94.1
2020-09-20 23:54:57.790000-05:00    0.16    95.5
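
A likely reason the duplicates survive the groupby: idxmax returns one index label per hour, and .loc selects by label, so a label that occurs twice in the larger DataFrame brings back both rows. The sketch below reproduces this on a small, made-up frame (the values and the timezone-naive index are assumptions for illustration, not the original data).

```python
import pandas as pd

# Hypothetical miniature of the larger frame: 02:46:05.659 occurs twice.
idx = pd.to_datetime([
    "2020-09-21 01:56:49.225",
    "2020-09-21 02:46:05.659",
    "2020-09-21 02:46:05.659",
    "2020-09-21 02:10:00.000",
])
df = pd.DataFrame(
    {"val1": [0.557, 0.046, 0.174, 0.300],
     "val2": [141.9, 39.4, 305.2, 50.0]},
    index=idx,
)

# One index label per hour: the timestamp of that hour's largest val2.
labels = df.groupby(df.index.hour)["val2"].idxmax()

# .loc looks rows up by label, so the duplicated timestamp matches two rows.
df2 = df.loc[labels]

print(len(labels))  # number of hours found
print(len(df2))     # number of rows returned by .loc
print(df2)
```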

I tried using df2[~df2.index.duplicated()], but it drops either the first or the last row of each duplicate, rather than the row with the lowest value.
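
One possible way to get the desired result, sketched under assumptions: move the timestamps into an ordinary column so every row gets a unique positional label, take idxmax of val2 per timestamp, and select those positions. The sample frame is a hypothetical subset of the rows shown above with a timezone-naive index; the names tmp, keep and result are illustrative only.

```python
import pandas as pd

# Hypothetical subset of df2 with two duplicated timestamps.
idx = pd.to_datetime([
    "2020-09-21 02:46:05.659",
    "2020-09-21 02:46:05.659",
    "2020-09-21 03:45:19.899",
    "2020-09-20 07:59:30.475",
    "2020-09-20 07:59:30.475",
])
df2 = pd.DataFrame(
    {"val1": [0.046, 0.174, 0.118, 0.082, 0.092],
     "val2": [39.4, 305.2, 161.1, 10.8, 19.0]},
    index=idx,
)
df2.index.name = "Time"

# Move the timestamps into a column so each row has a unique integer label.
tmp = df2.reset_index()

# For each timestamp, the integer label of the row with the largest val2.
keep = tmp.groupby("Time")["val2"].idxmax()

# Sorting the labels restores the original row order before selecting.
result = tmp.loc[keep.sort_values()].set_index("Time")

print(result)
```

An alternative is to sort by val2 and then apply ~index.duplicated(keep="last"), which also keeps the largest val2 per timestamp but leaves the rows in val2 order rather than the original order.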

0 answers:

No answers yet