Question

(not a duplicate question)

I have the following dataset:

GMT TIME, Value
2018-01-01 00:00:00,    1.2030   
2018-01-01 00:01:00,    1.2000 
2018-01-01 00:02:00,    1.2030   
2018-01-01 00:03:00,    1.2030   
.... , ....
2018-12-31 23:59:59,    1.2030

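I am looking for a way to do the following:

Remove the hh:mm:ss part from the datetime.
After removing the time (hh:mm:ss) part, there will be duplicate date entries, such as multiple rows for 2018-01-01, and so on. I therefore need to drop the duplicate dates and keep only the last row for each date: the last 2018-01-01 row before 2018-01-02 begins, likewise the last 2018-01-02 row before 2018-01-03 begins, and so on.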

How can I do this with Pandas?

Answer 1

Suppose you have this data:

              GMT TIME  Value
0  2018-01-01 00:00:00  1.203
1  2018-01-01 00:01:00  1.200
2  2018-01-01 00:02:00  1.203
3  2018-01-01 00:03:00  1.203
4  2018-01-02 00:03:00  1.203
5  2018-01-03 00:03:00  1.203
6  2018-01-04 00:03:00  1.203
7  2018-12-31 23:59:59  1.203

Use pandas.to_datetime with the .dt.date accessor, together with pandas.DataFrame.groupby:

import pandas as pd

df['GMT TIME'] = pd.to_datetime(df['GMT TIME']).dt.date
df.groupby(df['GMT TIME']).last()

Output:

            Value
GMT TIME         
2018-01-01  1.203
2018-01-02  1.203
2018-01-03  1.203
2018-01-04  1.203
2018-12-31  1.203
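
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the groupby approach. The DataFrame is built by hand from the sample rows above; how your data is actually loaded (CSV, database, etc.) is an assumption left out here.

```python
import pandas as pd

# Sample rows copied from the example data in this answer.
df = pd.DataFrame({
    "GMT TIME": [
        "2018-01-01 00:00:00", "2018-01-01 00:01:00",
        "2018-01-01 00:02:00", "2018-01-01 00:03:00",
        "2018-01-02 00:03:00", "2018-01-03 00:03:00",
        "2018-01-04 00:03:00", "2018-12-31 23:59:59",
    ],
    "Value": [1.203, 1.200, 1.203, 1.203, 1.203, 1.203, 1.203, 1.203],
})

# Parse the strings as datetimes, then keep only the calendar date.
df["GMT TIME"] = pd.to_datetime(df["GMT TIME"]).dt.date

# One row per date, holding the last value seen for that date.
daily_last = df.groupby("GMT TIME").last()
print(daily_last)
```

Note that GroupBy.last() picks the last non-null entry of each column, so if the data contains NaN values the result is not necessarily the literal last row of each day.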

Or use pandas.DataFrame.drop_duplicates:

df['GMT TIME'] = pd.to_datetime(df['GMT TIME']).dt.date
df.drop_duplicates(subset='GMT TIME', keep='last')

Output:

     GMT TIME  Value
3  2018-01-01  1.203
4  2018-01-02  1.203
5  2018-01-03  1.203
6  2018-01-04  1.203
7  2018-12-31  1.203
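
drop_duplicates keeps the last row in the frame's current order, so it only returns the latest reading of each day if the rows are already in chronological order. A rough sketch for unsorted input follows; the sample values and the helper column name `date` are made up for illustration.

```python
import pandas as pd

# Hypothetical unsorted input, for illustration only.
df = pd.DataFrame({
    "GMT TIME": pd.to_datetime([
        "2018-01-02 00:03:00", "2018-01-01 00:03:00",
        "2018-01-01 00:00:00", "2018-01-02 00:01:00",
    ]),
    "Value": [2.0, 1.3, 1.0, 2.5],
})

# Sort chronologically so that "last" means "latest timestamp".
df = df.sort_values("GMT TIME")

# Put the date key in a separate column so the full timestamps survive.
df["date"] = df["GMT TIME"].dt.date
latest_per_day = df.drop_duplicates(subset="date", keep="last")
print(latest_per_day)
```

Keeping the date in its own column is a design choice: it lets you see which timestamp was actually retained for each day.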

Answer 2

Use duplicated:

# 'GMT TIME' must have a datetime dtype for the .dt accessor to work:
# df['GMT TIME'] = pd.to_datetime(df['GMT TIME'])

df[~df['GMT TIME'].dt.date.iloc[::-1].duplicated()]

Or use:

df.groupby(df['GMT TIME'].dt.date).tail(1)
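
Both variants keep the original rows, including their full timestamps and index labels. The sketch below runs them on a small hand-built frame (the sample rows are an assumption for illustration). It uses duplicated(keep="last") as a possible alternative to reversing the Series with iloc[::-1]; it marks the same rows without changing the index order.

```python
import pandas as pd

# Small illustrative frame; the column must already be datetime dtype.
df = pd.DataFrame({
    "GMT TIME": pd.to_datetime([
        "2018-01-01 00:00:00", "2018-01-01 00:03:00",
        "2018-01-02 00:03:00", "2018-12-31 23:59:59",
    ]),
    "Value": [1.203, 1.200, 1.203, 1.203],
})

dates = df["GMT TIME"].dt.date

# keep="last" flags every row except the final one for each date.
via_duplicated = df[~dates.duplicated(keep="last")]

# tail(1) returns the final row of each group with its original index.
via_tail = df.groupby(dates).tail(1)

print(via_duplicated)
print(via_tail)
```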

Pandas: remove duplicate dates but keep the last one
