"Can only compare identically-labeled Series objects" error when comparing a DataFrame column to a single data point

Asked: 2017-11-04 07:51:33

Tags: python pandas

I have a DataFrame z in which one column is in datetime format:

index       time
0      2017-03-01 09:30:00.233
1      2017-03-01 09:30:00.243
2      2017-03-01 09:30:00.319
3      2017-03-01 09:30:00.981
4      2017-03-01 09:30:02.555
5      2017-03-01 09:30:02.959
6      2017-03-01 09:30:03.908
7      2017-03-01 09:30:12.659
8      2017-03-01 09:30:19.006
9      2017-03-01 09:30:22.990
10     2017-03-01 09:30:23.166
11     2017-03-01 09:30:27.879
12     2017-03-01 09:30:28.370

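Basically, I want to delete the rows that come before a specific time. For example, suppose that in this case I want to delete all rows before 09:30:22.990 (row 9). What I have is: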

first_trade_AM = z['Time'][z['Time'].dt.day == date][z['EventType'] == 'trade'].head(1)  

For simplicity's sake, let's say this expression returns
    9      2017-03-01 09:30:22.990

Then I have

z.drop(z['Time'][z['Time'].dt.day == date] < first_trade_AM)

But I get the error message:

ValueError: Can only compare identically-labeled Series objects

Any help would be greatly appreciated. I have been stuck on this for a while. Thanks!

Edit: The dataset I posted above is part of a full month of data. Each day has its own first_trade_AM, which I find using:

for date in z['Time'].dt.day.unique():
    first_trade_AM = z.loc[(z['Time'].dt.day == date) & (z['EventType'] == 'trade'), 'Time'].head(1).item()

My follow-up question is: for each date, how do I delete all observations before that date's own first_trade_AM without affecting the data for other dates?

Edit:

More general data:

index  time                      EventType
0      2017-03-01 09:30:00.233       other
1      2017-03-01 09:30:00.243       trade
2      2017-03-01 09:30:00.319       trade       
3      2017-03-01 09:30:00.981       other
4      2017-03-01 09:30:02.555       other
5      2017-03-02 09:30:02.959       other 
6      2017-03-02 09:30:03.908       other   
7      2017-03-02 09:30:12.659       trade 
8      2017-03-02 09:30:19.006       trade
9      2017-03-02 09:30:22.990       trade
10     2017-03-02 09:30:23.166       other
11     2017-03-02 09:30:27.879       other 
12     2017-03-02 09:30:28.370       other

1 answer:

Answer 0 (score: 0)

I think it is better to select the datetime by position with iat, which returns a scalar rather than a one-element Series:

first_trade_AM = z.loc[(z['Time'].dt.day == date) & (z['EventType'] == 'trade'), 'Time'].iat[0]
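To make the difference concrete, here is a minimal sketch of why head(1) triggers the error while iat[0] does not. The three-row frame is made-up illustration data, not the asker's dataset, and the names trades, as_series and as_scalar are hypothetical.

```python
import pandas as pd

# Small illustrative frame (assumed data, same column names as the question).
z = pd.DataFrame({
    "Time": pd.to_datetime([
        "2017-03-01 09:30:19.006",
        "2017-03-01 09:30:22.990",
        "2017-03-01 09:30:23.166",
    ]),
    "EventType": ["other", "trade", "other"],
})

trades = z.loc[z["EventType"] == "trade", "Time"]

as_series = trades.head(1)  # one-element Series, still carrying index label 1
as_scalar = trades.iat[0]   # a single pandas Timestamp

# Series vs. Series with different index labels: pandas raises ValueError.
try:
    z["Time"] < as_series
    raised = False
except ValueError:
    raised = True

# Series vs. scalar: the scalar is compared against every row.
mask = z["Time"] >= as_scalar
print(raised)
print(mask.tolist())
```

The comparison with as_series fails because the two Series do not share the same index, whereas a scalar has no index to align.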

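Then filter out the values less than first_trade_AM. Boolean indexing keeps the rows where the mask is True, so < needs to be changed to >=: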

z = z[z['Time'] >= first_trade_AM]

The same result with the mask inverted by ~:

z = z[~(z['Time'] < first_trade_AM)]

If you want to compare only the dates:

z = z[z['Time'].dt.date >= first_trade_AM.date()]
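The three filters above can be tried side by side on a small frame. This is only a sketch: the four timestamps and the value assigned to first_trade_AM are assumptions chosen for illustration, and by_time, by_time_inverted and by_date are hypothetical names.

```python
import pandas as pd

# Illustrative data spanning three days (assumed, not the asker's data).
z = pd.DataFrame({"Time": pd.to_datetime([
    "2017-03-01 09:30:19.006",
    "2017-03-02 09:30:22.990",
    "2017-03-02 09:30:23.166",
    "2017-03-03 09:30:27.879",
])})
first_trade_AM = pd.Timestamp("2017-03-02 09:30:23.166")

# Keep rows at or after the threshold.
by_time = z[z["Time"] >= first_trade_AM]

# Same rows, written as "not before the threshold".
by_time_inverted = z[~(z["Time"] < first_trade_AM)]

# Compare calendar dates only, ignoring the time of day.
by_date = z[z["Time"].dt.date >= first_trade_AM.date()]

print(by_time.index.tolist())
print(by_date.index.tolist())
```

One caveat: the >= form and the inverted < form agree here only because the column has no missing values. A NaT row fails both comparisons, so the >= mask would drop it while the inverted mask would keep it.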

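Edit:

Within each day, the cumulative count of trade rows stays at 0 until the first trade occurs, so you can compare the output of DataFrameGroupBy.cumsum with 0 and then filter by boolean indexing: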

m = (z['EventType'] == 'trade').groupby(z['time'].dt.day).cumsum() != 0

z = z[m]
print (z)
                         time EventType
index                                  
1     2017-03-01 09:30:00.243     trade
2     2017-03-01 09:30:00.319     trade
3     2017-03-01 09:30:00.981     other
4     2017-03-01 09:30:02.555     other
7     2017-03-02 09:30:12.659     trade
8     2017-03-02 09:30:19.006     trade
9     2017-03-02 09:30:22.990     trade
10    2017-03-02 09:30:23.166     other
11    2017-03-02 09:30:27.879     other
12    2017-03-02 09:30:28.370     other
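For anyone who wants to reproduce this, the following sketch rebuilds the sample data from the question and applies the same mask. The explicit astype(int) cast and the intermediate names is_trade and trades_so_far are additions for readability, not part of the original one-liner.

```python
import pandas as pd

# Sample data copied from the question's "more general data" table.
z = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-03-01 09:30:00.233", "2017-03-01 09:30:00.243",
        "2017-03-01 09:30:00.319", "2017-03-01 09:30:00.981",
        "2017-03-01 09:30:02.555", "2017-03-02 09:30:02.959",
        "2017-03-02 09:30:03.908", "2017-03-02 09:30:12.659",
        "2017-03-02 09:30:19.006", "2017-03-02 09:30:22.990",
        "2017-03-02 09:30:23.166", "2017-03-02 09:30:27.879",
        "2017-03-02 09:30:28.370",
    ]),
    "EventType": ["other", "trade", "trade", "other", "other",
                  "other", "other", "trade", "trade", "trade",
                  "other", "other", "other"],
})

# 1 for trade rows, 0 otherwise (cast so the running sum is an integer count).
is_trade = (z["EventType"] == "trade").astype(int)

# Running number of trades seen so far within each day of the month.
trades_so_far = is_trade.groupby(z["time"].dt.day).cumsum()

# Rows before the day's first trade still have a count of 0.
m = trades_so_far != 0
result = z[m]
print(result.index.tolist())
```

Two assumptions are worth stating. The running count relies on the rows already being in chronological order within each day. Also, dt.day is the day of the month, so data spanning more than one month would put 1 March and 1 April in the same group; grouping by z["time"].dt.date avoids that.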

Details:

print ((z['EventType'] == 'trade').groupby(z['time'].dt.day).cumsum())
index
0     0.0
1     1.0
2     2.0
3     2.0
4     2.0
5     0.0
6     0.0
7     1.0
8     2.0
9     3.0
10    3.0
11    3.0
12    3.0
Name: EventType, dtype: float64

print ((z['EventType'] == 'trade').groupby(z['time'].dt.day).cumsum() != 0)
index
0     False
1      True
2      True
3      True
4      True
5     False
6     False
7      True
8      True
9      True
10     True
11     True
12     True
Name: EventType, dtype: bool
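A possible alternative, closer to the per-day first_trade_AM idea in the question, is to compute each day's first trade time with a grouped transform and compare row by row. This is not the method used in the answer above, only a sketch under the assumption that a day without any trade should be dropped entirely; the data is a shortened made-up sample and day, trade_times and first_trade are hypothetical names.

```python
import pandas as pd

# Shortened illustrative sample (assumed data, same columns as the question).
z = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-03-01 09:30:00.233", "2017-03-01 09:30:00.243",
        "2017-03-01 09:30:00.981", "2017-03-02 09:30:03.908",
        "2017-03-02 09:30:12.659", "2017-03-02 09:30:23.166",
    ]),
    "EventType": ["other", "trade", "other", "other", "trade", "other"],
})

# Calendar day of each row (midnight timestamp), used as the grouping key.
day = z["time"].dt.normalize()

# Timestamps of trade rows only; non-trade rows become NaT.
trade_times = z["time"].where(z["EventType"] == "trade")

# Earliest trade time of the row's own day, broadcast back to every row.
first_trade = trade_times.groupby(day).transform("min")

# Both sides share the same index, so the Series comparison is allowed.
result = z[z["time"] >= first_trade]
print(result.index.tolist())
```

Because the threshold is taken as a minimum rather than a running count, this version does not depend on the rows being sorted by time.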