我有以下系列数据帧
我想删除condtion上的行(每天检查):检查aaa>然后删除所有行(在下面,删除所有2015-12-01行,因为aaa列最后3行有1000个值)
....
date time aaa
2015-12-01,00:00:00,0
2015-12-01,00:15:00,0
2015-12-01,00:30:00,0
2015-12-01,00:45:00,0
2015-12-01,01:00:00,0
2015-12-01,01:15:00,0
2015-12-01,01:30:00,0
2015-12-01,01:45:00,0
2015-12-01,02:00:00,0
2015-12-01,02:15:00,0
2015-12-01,02:30:00,0
2015-12-01,02:45:00,0
2015-12-01,03:00:00,0
2015-12-01,03:15:00,0
2015-12-01,03:30:00,0
2015-12-01,03:45:00,0
2015-12-01,04:00:00,0
2015-12-01,04:15:00,0
2015-12-01,04:30:00,0
2015-12-01,04:45:00,0
2015-12-01,05:00:00,0
2015-12-01,05:15:00,0
2015-12-01,05:30:00,0
2015-12-01,05:45:00,0
2015-12-01,06:00:00,0
2015-12-01,06:15:00,0
2015-12-01,06:30:00,1000
2015-12-01,06:45:00,1000
2015-12-01,07:00:00,1000
....
我该怎么做?
答案 0 :(得分:1)
我认为您需要MultiIndex
首先按条件比较aaa
的值,然后按boolean indexing
过滤第一级的所有值,再次按isin
过滤条件~
:
print (df)
aaa
date time
2015-12-01 00:00:00 0
00:15:00 0
00:30:00 0
00:45:00 0
2015-12-02 05:00:00 0
05:15:00 200
05:30:00 0
05:45:00 0
2015-12-03 06:00:00 0
06:15:00 0
06:30:00 1000
06:45:00 1000
07:00:00 1000
lvl0 = df.index.get_level_values(0)
idx = lvl0[df['aaa'].gt(100)].unique()
print (idx)
Index(['2015-12-02', '2015-12-03'], dtype='object', name='date')
df = df[~lvl0.isin(idx)]
print (df)
aaa
date time
2015-12-01 00:00:00 0
00:15:00 0
00:30:00 0
00:45:00 0
如果第一列不是仅索引比较列date
:
print (df)
date time aaa
0 2015-12-01 00:00:00 0
1 2015-12-01 00:15:00 0
2 2015-12-01 00:30:00 0
3 2015-12-01 00:45:00 0
4 2015-12-02 05:00:00 0
5 2015-12-02 05:15:00 200
6 2015-12-02 05:30:00 0
7 2015-12-02 05:45:00 0
8 2015-12-03 06:00:00 0
9 2015-12-03 06:15:00 0
10 2015-12-03 06:30:00 1000
11 2015-12-03 06:45:00 1000
12 2015-12-03 07:00:00 1000
idx = df.loc[df['aaa'].gt(100), 'date'].unique()
print (idx)
['2015-12-02' '2015-12-03']
df = df[~df['date'].isin(idx)]
print (df)
date time aaa
0 2015-12-01 00:00:00 0
1 2015-12-01 00:15:00 0
2 2015-12-01 00:30:00 0
3 2015-12-01 00:45:00 0