I have a Pandas DataFrame that looks like this:
import pandas as pd
# Raw string so the backslashes in the Windows path are not treated as escapes
data = pd.read_csv(r'C:\Users\Frank\Desktop\10-25-16-54-7-IMPORT.csv', index_col=False)
print(data.head(10))
                      Date  Symbol
0  2015-03-18 01:54:35 UTC    NKTR  -0.290
1  2015-03-18 02:10:49 UTC     DRQ  -0.082
2  2015-03-18 03:03:10 UTC    NKTR  -0.290
3  2015-03-18 03:13:17 UTC     UAM   0.414
4  2015-03-18 03:48:24 UTC    ROCK   0.000
5  2015-03-18 03:56:30 UTC    ROCK   0.000
6  2015-03-18 04:52:24 UTC     MTZ  -0.290
7  2015-03-18 05:00:29 UTC    NKTR  -0.290
8  2015-03-18 05:04:31 UTC    NKTR  -0.290
9  2015-03-18 05:29:48 UTC    PSEC  -0.046
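I want to delete every row with a repeated symbol that occurs after the first instance of that symbol on the same day (in this example, "NKTR"). Is this possible?
(Because the rows have different timestamps, simply dropping duplicates will not work.)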
Answer 0 (score: 1)
You can try groupby() on the date part of the Date column together with Symbol, and then take the first row of each group:
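import pandas as pd
# df is the DataFrame from the question (named data there).
# Group by calendar day and Symbol, then take the first entry of each group.
df.groupby([pd.to_datetime(df.Date).dt.date, 'Symbol'], as_index=False).first()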
#   Symbol                     Date   Value
# 0    DRQ  2015-03-18 02:10:49 UTC  -0.082
# 1    MTZ  2015-03-18 04:52:24 UTC  -0.290
# 2   NKTR  2015-03-18 01:54:35 UTC  -0.290
# 3   PSEC  2015-03-18 05:29:48 UTC  -0.046
# 4   ROCK  2015-03-18 03:48:24 UTC   0.000
# 5    UAM  2015-03-18 03:13:17 UTC   0.414
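Two details of first() are worth knowing: it returns the first non-null value of each column within a group rather than the literal first row, and the result is sorted by the group keys. If you would rather keep the original rows untouched and in their original order, a variant using head(1) might look like the sketch below. The sample frame is a cut-down copy of the question's data; the column name Value is assumed from the output above, and the next-day row is hypothetical, added only to show that the same symbol survives on a different day.

```python
import pandas as pd

# Cut-down sample of the question's data. "Value" is an assumed column name,
# and the last row (next day) is hypothetical, added for illustration.
df = pd.DataFrame({
    "Date": [
        "2015-03-18 01:54:35 UTC",
        "2015-03-18 02:10:49 UTC",
        "2015-03-18 03:03:10 UTC",
        "2015-03-18 03:48:24 UTC",
        "2015-03-18 03:56:30 UTC",
        "2015-03-19 01:00:00 UTC",
    ],
    "Symbol": ["NKTR", "DRQ", "NKTR", "ROCK", "ROCK", "NKTR"],
    "Value": [-0.290, -0.082, -0.290, 0.000, 0.000, -0.290],
})

# Calendar day of each timestamp, kept as a separate Series so that
# no helper column is added to the frame.
day = pd.to_datetime(df["Date"], utc=True).dt.date

# head(1) keeps the actual first row of every (day, Symbol) group and
# preserves the original row order and index.
deduped = df.groupby([day, "Symbol"]).head(1)

print(deduped)
```

Here rows 2 and 4 are dropped as same-day repeats of NKTR and ROCK, while the NKTR row on the following day is kept.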