I have a DataFrame z in which one column holds datetime values:
index time
0 2017-03-01 09:30:00.233
1 2017-03-01 09:30:00.243
2 2017-03-01 09:30:00.319
3 2017-03-01 09:30:00.981
4 2017-03-01 09:30:02.555
5 2017-03-01 09:30:02.959
6 2017-03-01 09:30:03.908
7 2017-03-01 09:30:12.659
8 2017-03-01 09:30:19.006
9 2017-03-01 09:30:22.990
10 2017-03-01 09:30:23.166
11 2017-03-01 09:30:27.879
12 2017-03-01 09:30:28.370
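Basically, I want to drop the rows that fall before a specific time. For example, suppose that in this case I want to drop every row before 09:30:22.990 (row 9). What I have is: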
first_trade_AM = z['Time'][z['Time'].dt.day == date][z['EventType'] == 'trade'].head(1)
For simplicity's sake, let's say this expression returns:
9 2017-03-01 09:30:22.990
Then I have:
z.drop(z['Time'][z['Time'].dt.day == date] < first_trade_AM)
But I get the error message:
ValueError: Can only compare identically-labeled Series objects
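Any help would be greatly appreciated. I have been stuck on this for a while. Thanks!
Edit: The dataset I posted above is part of a full month of data. Each day has its own first_trade_AM, which I find using: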
for date in z['Time'].dt.day.unique():
    first_trade_AM = z.loc[(z['Time'].dt.day == date) & (z['EventType'] == 'trade'), 'Time'].head(1).item()
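My follow-up question is: for each date, how do I drop all observations before that date's own first_trade_AM without affecting the data for other dates?
Edit:
More general data: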
index time EventType
0 2017-03-01 09:30:00.233 other
1 2017-03-01 09:30:00.243 trade
2 2017-03-01 09:30:00.319 trade
3 2017-03-01 09:30:00.981 other
4 2017-03-01 09:30:02.555 other
5 2017-03-02 09:30:02.959 other
6 2017-03-02 09:30:03.908 other
7 2017-03-02 09:30:12.659 trade
8 2017-03-02 09:30:19.006 trade
9 2017-03-02 09:30:22.990 trade
10 2017-03-02 09:30:23.166 other
11 2017-03-02 09:30:27.879 other
12 2017-03-02 09:30:28.370 other
Answer 0 (score: 0)
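I think it is best to select the datetime by position with iat, so that first_trade_AM is a single value rather than a one-row Series:
first_trade_AM = z.loc[(z['Time'].dt.day == date) & (z['EventType'] == 'trade'), 'Time'].iat[0]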
Finally, to filter out the values less than first_trade_AM, keep the rows where the condition is reversed, so < needs to be changed to >=:
z = z[z['Time'] >= first_trade_AM]
This is the same as inverting the mask with ~:
z = z[~(z['Time'] < first_trade_AM)]
If you want to compare dates only:
z = z[z['Time'].dt.date >= first_trade_AM.date()]
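The difference between head(1) and iat[0] is what triggers the error in the question, so a small sketch may help. It uses a three-row stand-in for z built from the question's sample values (the frame and the names `trades`, `as_series`, `raised` and `kept` are illustrative, not from the original post). It shows that comparing against a one-row Series raises, while comparing against the scalar returned by iat[0] broadcasts over the whole column.

```python
import pandas as pd

# Illustrative three-row stand-in for z, using values from the question
z = pd.DataFrame({
    "Time": pd.to_datetime([
        "2017-03-01 09:30:19.006",
        "2017-03-01 09:30:22.990",
        "2017-03-01 09:30:23.166",
    ]),
    "EventType": ["other", "trade", "other"],
})

trades = z.loc[z["EventType"] == "trade", "Time"]

# head(1) keeps a one-row Series that carries its own index label,
# so pandas refuses to compare it with the full column
as_series = trades.head(1)
try:
    z["Time"] < as_series
    raised = False
except ValueError:
    raised = True

# iat[0] returns a scalar Timestamp, which broadcasts against every row
first_trade_AM = trades.iat[0]
kept = z[z["Time"] >= first_trade_AM]

print(raised)
print(kept)
```

Here the rows at and after the first trade are kept, and the earlier row is dropped.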
Edit:
You can compare the per-day running count produced by groupby(...).cumsum() with 0, and then filter with boolean indexing:
m = (z['EventType'] == 'trade').groupby(z['time'].dt.day).cumsum() != 0
z = z[m]
print (z)
time EventType
index
1 2017-03-01 09:30:00.243 trade
2 2017-03-01 09:30:00.319 trade
3 2017-03-01 09:30:00.981 other
4 2017-03-01 09:30:02.555 other
7 2017-03-02 09:30:12.659 trade
8 2017-03-02 09:30:19.006 trade
9 2017-03-02 09:30:22.990 trade
10 2017-03-02 09:30:23.166 other
11 2017-03-02 09:30:27.879 other
12 2017-03-02 09:30:28.370 other
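For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample data from the question and applies the same mask. The names `is_trade`, `seen_trade` and `result` are illustrative, and the cast to int is an added precaution so the running count is numeric regardless of pandas version.

```python
import pandas as pd

# Sample data copied from the question's "more general data" table
z = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-03-01 09:30:00.233", "2017-03-01 09:30:00.243",
        "2017-03-01 09:30:00.319", "2017-03-01 09:30:00.981",
        "2017-03-01 09:30:02.555", "2017-03-02 09:30:02.959",
        "2017-03-02 09:30:03.908", "2017-03-02 09:30:12.659",
        "2017-03-02 09:30:19.006", "2017-03-02 09:30:22.990",
        "2017-03-02 09:30:23.166", "2017-03-02 09:30:27.879",
        "2017-03-02 09:30:28.370",
    ]),
    "EventType": ["other", "trade", "trade", "other", "other",
                  "other", "other", "trade", "trade", "trade",
                  "other", "other", "other"],
})

# True on trade rows; cast to int so cumsum yields a running count
is_trade = (z["EventType"] == "trade").astype(int)

# Running number of trades seen so far within each day of the month
seen_trade = is_trade.groupby(z["time"].dt.day).cumsum() != 0

# Keep each day's rows from its first trade onward
result = z[seen_trade]
print(result)
```

Note that grouping by dt.day only separates days correctly when the data covers a single month, as in the question; for longer ranges a calendar-date key such as z["time"].dt.normalize() would be the safer assumption.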
Details:
print ((z['EventType'] == 'trade').groupby(z['time'].dt.day).cumsum())
index
0 0.0
1 1.0
2 2.0
3 2.0
4 2.0
5 0.0
6 0.0
7 1.0
8 2.0
9 3.0
10 3.0
11 3.0
12 3.0
Name: EventType, dtype: float64
print ((z['EventType'] == 'trade').groupby(z['time'].dt.day).cumsum() != 0)
index
0 False
1 True
2 True
3 True
4 True
5 False
6 False
7 True
8 True
9 True
10 True
11 True
12 True
Name: EventType, dtype: bool
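As an alternative that stays closer to the question's idea of a per-day first_trade_AM, the sketch below computes each day's earliest trade time with groupby(...).transform("min") and compares every row against it. This is not the approach from the answer above, only one possible variant; the six-row frame is a subset of the question's data, and the names `trade_time`, `day`, `first_trade` and `result` are hypothetical.

```python
import pandas as pd

# Subset of the question's sample: three rows from each of two days
z = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-03-01 09:30:00.233", "2017-03-01 09:30:00.243",
        "2017-03-01 09:30:00.981", "2017-03-02 09:30:02.959",
        "2017-03-02 09:30:12.659", "2017-03-02 09:30:23.166",
    ]),
    "EventType": ["other", "trade", "other", "other", "trade", "other"],
})

# Timestamps of trade rows only; non-trade rows become NaT
trade_time = z["time"].where(z["EventType"] == "trade")

# Calendar date of each row (midnight timestamps), used as the group key
day = z["time"].dt.normalize()

# Earliest trade time of each day, broadcast back onto every row of that day
first_trade = trade_time.groupby(day).transform("min")

# Keep rows at or after their own day's first trade;
# a day without any trade compares against NaT and is dropped entirely
result = z[z["time"] >= first_trade]
print(result)
```

Because both sides of the comparison share the same index, this avoids the "identically-labeled" error while still filtering each day independently.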