Removing near-duplicates in Pandas

Time: 2016-10-28 00:45:18

Tags: python pandas

I have a Pandas DataFrame that looks like this:

import pandas as pd

# Raw string so the backslashes in the Windows path are not treated as escapes
data = pd.read_csv(r'C:\Users\Frank\Desktop\10-25-16-54-7-IMPORT.csv', index_col=False)
print(data.head(10))

                      Date Symbol  Value
0  2015-03-18 01:54:35 UTC   NKTR -0.290
1  2015-03-18 02:10:49 UTC    DRQ -0.082
2  2015-03-18 03:03:10 UTC   NKTR -0.290
3  2015-03-18 03:13:17 UTC    UAM  0.414
4  2015-03-18 03:48:24 UTC   ROCK  0.000
5  2015-03-18 03:56:30 UTC   ROCK  0.000
6  2015-03-18 04:52:24 UTC    MTZ -0.290
7  2015-03-18 05:00:29 UTC   NKTR -0.290
8  2015-03-18 05:04:31 UTC   NKTR -0.290
9  2015-03-18 05:29:48 UTC   PSEC -0.046

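I want to delete every row with a repeated symbol that occurs after the first instance of that symbol on the same day (in this example, "NKTR"). Is this possible?

(Because the rows have different timestamps, simply dropping duplicates will not work.)

1 Answer:

Answer 0: (score: 1)

You can try groupby() on the date part of the Date column together with Symbol, then take the first row of each group: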

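import pandas as pd

# df is the DataFrame loaded in the question (named data there).
# Group by the calendar date of the Date column and by Symbol, then keep the first row of each group.
df.groupby([pd.to_datetime(df.Date).dt.date, 'Symbol'], as_index=False).first()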

#  Symbol                      Date  Value
#0    DRQ   2015-03-18 02:10:49 UTC -0.082
#1    MTZ   2015-03-18 04:52:24 UTC -0.290
#2   NKTR   2015-03-18 01:54:35 UTC -0.290
#3   PSEC   2015-03-18 05:29:48 UTC -0.046
#4   ROCK   2015-03-18 03:48:24 UTC  0.000
#5    UAM   2015-03-18 03:13:17 UTC  0.414
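
If the original row order and index matter, a boolean mask is one possible alternative to the groupby. The following is only a minimal sketch: the sample rows are copied from the question, the column name Value is assumed from the output above, and the names day, keys, is_repeat and deduped are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question. The column name "Value" is an
# assumption taken from the answer's output, not from the real CSV.
df = pd.DataFrame({
    "Date": [
        "2015-03-18 01:54:35 UTC", "2015-03-18 02:10:49 UTC",
        "2015-03-18 03:03:10 UTC", "2015-03-18 03:13:17 UTC",
        "2015-03-18 03:48:24 UTC", "2015-03-18 03:56:30 UTC",
        "2015-03-18 04:52:24 UTC", "2015-03-18 05:00:29 UTC",
        "2015-03-18 05:04:31 UTC", "2015-03-18 05:29:48 UTC",
    ],
    "Symbol": ["NKTR", "DRQ", "NKTR", "UAM", "ROCK",
               "ROCK", "MTZ", "NKTR", "NKTR", "PSEC"],
    "Value": [-0.290, -0.082, -0.290, 0.414, 0.000,
              0.000, -0.290, -0.290, -0.290, -0.046],
})

# Calendar day (in UTC) of each timestamp; the time of day is discarded.
day = pd.to_datetime(df["Date"], utc=True).dt.date

# Helper frame holding only the keys that define a "duplicate" here.
keys = pd.DataFrame({"day": day, "Symbol": df["Symbol"]})

# True for every row after the first one of each (day, Symbol) pair.
is_repeat = keys.duplicated(keep="first")

# Keep the first occurrence of each pair, in the original row order.
deduped = df[~is_repeat]
print(deduped)
```

Unlike groupby(...).first(), this keeps whole rows untouched and does not sort by the group keys. Note also that first() takes the first non-null value per column within a group, so the two approaches can differ when the data contains missing values.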