Question

I have a DataFrame z in which one column is in datetime format:

index       time
0      2017-03-01 09:30:00.233
1      2017-03-01 09:30:00.243
2      2017-03-01 09:30:00.319
3      2017-03-01 09:30:00.981
4      2017-03-01 09:30:02.555
5      2017-03-01 09:30:02.959
6      2017-03-01 09:30:03.908
7      2017-03-01 09:30:12.659
8      2017-03-01 09:30:19.006
9      2017-03-01 09:30:22.990
10     2017-03-01 09:30:23.166
11     2017-03-01 09:30:27.879
12     2017-03-01 09:30:28.370

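Basically, I want to drop the rows that fall before a specific time. For example, suppose that in this case I want to drop all rows before 09:30:22.990 (row 9). What I have is: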

first_trade_AM = z['Time'][z['Time'].dt.day == date][z['EventType'] == 'trade'].head(1)  

For simplicity's sake, let's say this expression returns
    9      2017-03-01 09:30:22.990

Then I have

z.drop(z['Time'][z['Time'].dt.day == date] < first_trade_AM)

But I get the error message:

ValueError: Can only compare identically-labeled Series objects

Any help would be greatly appreciated. I have been stuck on this for a while. Thanks!

EDIT: The dataset I posted above is part of a full month of data. Each day has its own first_trade_AM, which I find using:

for date in z['Time'].dt.day.unique():
    first_trade_AM = z.loc[(z['Time'].dt.day == date) & (z['EventType'] == 'trade'), 'Time'].head(1).item()

My follow-up question is: for each date, how do I drop all observations before that date's own first_trade_AM without affecting the data for other dates?

EDIT:

More general data:

index  time                      EventType
0      2017-03-01 09:30:00.233       other
1      2017-03-01 09:30:00.243       trade
2      2017-03-01 09:30:00.319       trade       
3      2017-03-01 09:30:00.981       other
4      2017-03-01 09:30:02.555       other
5      2017-03-02 09:30:02.959       other 
6      2017-03-02 09:30:03.908       other   
7      2017-03-02 09:30:12.659       trade 
8      2017-03-02 09:30:19.006       trade
9      2017-03-02 09:30:22.990       trade
10     2017-03-02 09:30:23.166       other
11     2017-03-02 09:30:27.879       other 
12     2017-03-02 09:30:28.370       other

Answer 1

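I think it is best to select the datetime by position with iat, which returns a scalar Timestamp rather than a one-row Series: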

first_trade_AM = z.loc[(z['Time'].dt.day == date) & (z['EventType'] == 'trade'), 'Time'].iat[0]

Then filter out the values less than first_trade_AM. Boolean indexing keeps the rows where the mask is True, so change < to >=:

z = z[z['Time'] >= first_trade_AM]

The same result with an inverted mask using ~:

z = z[~(z['Time'] < first_trade_AM)]
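
To make the difference between the Series and the scalar concrete, here is a minimal sketch. The three-row frame is a made-up miniature of z, and the names first_as_series, kept and kept_inverted are only for illustration. It tries the comparison that fails in the question, then applies the two equivalent filters shown above.

```python
import pandas as pd

# Hypothetical miniature of z; the real frame holds a full month of data.
z = pd.DataFrame({
    "Time": pd.to_datetime([
        "2017-03-01 09:30:19.006",
        "2017-03-01 09:30:22.990",
        "2017-03-01 09:30:23.166",
    ]),
    "EventType": ["other", "trade", "other"],
})

# head(1) returns a one-row Series (index label 1), not a scalar.
first_as_series = z.loc[z["EventType"] == "trade", "Time"].head(1)

try:
    # Index labels [0, 1, 2] versus [1]: pandas refuses to align them.
    z["Time"] < first_as_series
    raised = False
except ValueError:
    raised = True

# .iat[0] extracts a single Timestamp, which is compared against every row.
first_trade_AM = z.loc[z["EventType"] == "trade", "Time"].iat[0]

kept = z[z["Time"] >= first_trade_AM]             # keep rows at or after the first trade
kept_inverted = z[~(z["Time"] < first_trade_AM)]  # same rows via the inverted mask

print(raised)
print(kept)
```

Both filters keep the same rows because the sample contains no missing timestamps; with NaT values in the column, the two masks would treat those rows differently.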

If you want to compare dates only:

z = z[z['Time'].dt.date >= first_trade_AM.date()]
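
The date-level comparison ignores the time of day, so it keeps every row from the first trade's calendar day onward. A small sketch with made-up rows shows how it differs from the timestamp-level filter; by_time and by_date are illustrative names, not part of the original code.

```python
import pandas as pd

# Hypothetical rows: one from the previous day, two from the trade day.
z = pd.DataFrame({"Time": pd.to_datetime([
    "2017-02-28 15:59:59.000",
    "2017-03-01 09:30:00.233",
    "2017-03-01 09:30:22.990",
])})
first_trade_AM = pd.Timestamp("2017-03-01 09:30:22.990")

# Timestamp comparison: drops everything before the exact trade time.
by_time = z[z["Time"] >= first_trade_AM]

# Date comparison: drops only rows from earlier calendar days.
by_date = z[z["Time"].dt.date >= first_trade_AM.date()]

print(by_time)
print(by_date)
```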

EDIT:

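You can compare the per-day cumulative sum produced by GroupBy.cumsum with 0 and then filter by boolean indexing. Rows before the first trade of each day have a running count of 0: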

m = (z['EventType'] == 'trade').groupby(z['time'].dt.day).cumsum() != 0

z = z[m]
print (z)
                         time EventType
index                                  
1     2017-03-01 09:30:00.243     trade
2     2017-03-01 09:30:00.319     trade
3     2017-03-01 09:30:00.981     other
4     2017-03-01 09:30:02.555     other
7     2017-03-02 09:30:12.659     trade
8     2017-03-02 09:30:19.006     trade
9     2017-03-02 09:30:22.990     trade
10    2017-03-02 09:30:23.166     other
11    2017-03-02 09:30:27.879     other
12    2017-03-02 09:30:28.370     other

Details:

print ((z['EventType'] == 'trade').groupby(z['time'].dt.day).cumsum())
index
0     0.0
1     1.0
2     2.0
3     2.0
4     2.0
5     0.0
6     0.0
7     1.0
8     2.0
9     3.0
10    3.0
11    3.0
12    3.0
Name: EventType, dtype: float64

print ((z['EventType'] == 'trade').groupby(z['time'].dt.day).cumsum() != 0)
index
0     False
1      True
2      True
3      True
4      True
5     False
6     False
7      True
8      True
9      True
10     True
11     True
12     True
Name: EventType, dtype: bool
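
For anyone who wants to run the approach end to end, here is a self-contained sketch built from the sample data in the question. It makes two changes that are assumptions on my part rather than part of the answer above: it groups by dt.normalize() (the full calendar date) instead of dt.day, since the day of month alone would merge, say, March 1 and April 1 if the data ever spans more than one month, and it casts the boolean mask to int before the cumulative sum so the result is an integer count.

```python
import pandas as pd

# Sample data copied from the question.
z = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-03-01 09:30:00.233", "2017-03-01 09:30:00.243",
        "2017-03-01 09:30:00.319", "2017-03-01 09:30:00.981",
        "2017-03-01 09:30:02.555", "2017-03-02 09:30:02.959",
        "2017-03-02 09:30:03.908", "2017-03-02 09:30:12.659",
        "2017-03-02 09:30:19.006", "2017-03-02 09:30:22.990",
        "2017-03-02 09:30:23.166", "2017-03-02 09:30:27.879",
        "2017-03-02 09:30:28.370",
    ]),
    "EventType": ["other", "trade", "trade", "other", "other",
                  "other", "other", "trade", "trade", "trade",
                  "other", "other", "other"],
})

# 1 for trade rows, 0 otherwise.
is_trade = (z["EventType"] == "trade").astype(int)

# Running number of trades seen so far within each calendar date.
trades_so_far = is_trade.groupby(z["time"].dt.normalize()).cumsum()

# Keep rows from the first trade of each day onward.
filtered = z[trades_so_far != 0]
print(filtered)
```

This assumes the rows are already sorted by time within each day, as they are in the sample; otherwise sort with z.sort_values('time') first.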
