How to drop all rows of a day based on a condition on a column value (pandas)

Time: 2018-02-01 06:20:32

Tags: python pandas time-series row

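I have the following time-series DataFrame.

I want to drop rows based on a condition that is checked per day: if aaa exceeds a threshold anywhere within a day, drop every row for that day. In the sample below, all 2015-12-01 rows should be dropped because the last 3 rows of the aaa column have the value 1000.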

     ....
   date       time    aaa
2015-12-01,00:00:00,0
2015-12-01,00:15:00,0
2015-12-01,00:30:00,0
2015-12-01,00:45:00,0
2015-12-01,01:00:00,0
2015-12-01,01:15:00,0
2015-12-01,01:30:00,0
2015-12-01,01:45:00,0
2015-12-01,02:00:00,0
2015-12-01,02:15:00,0
2015-12-01,02:30:00,0
2015-12-01,02:45:00,0
2015-12-01,03:00:00,0
2015-12-01,03:15:00,0
2015-12-01,03:30:00,0
2015-12-01,03:45:00,0
2015-12-01,04:00:00,0
2015-12-01,04:15:00,0
2015-12-01,04:30:00,0
2015-12-01,04:45:00,0
2015-12-01,05:00:00,0
2015-12-01,05:15:00,0
2015-12-01,05:30:00,0
2015-12-01,05:45:00,0
2015-12-01,06:00:00,0
2015-12-01,06:15:00,0
2015-12-01,06:30:00,1000
2015-12-01,06:45:00,1000
2015-12-01,07:00:00,1000
         ....

How can I do this?

1 answer:

Answer 0: (score: 1)

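I think that with a MultiIndex you first need to compare the values of aaa against the condition and use boolean indexing to collect the matching values of the first level (the dates), then filter the frame with isin, inverting the mask with ~ so that those dates are dropped: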

print (df)
                      aaa
date       time          
2015-12-01 00:00:00     0
           00:15:00     0
           00:30:00     0
           00:45:00     0
2015-12-02 05:00:00     0
           05:15:00   200
           05:30:00     0
           05:45:00     0
2015-12-03 06:00:00     0
           06:15:00     0
           06:30:00  1000
           06:45:00  1000
           07:00:00  1000

lvl0 = df.index.get_level_values(0)
idx = lvl0[df['aaa'].gt(100)].unique()
print (idx)
Index(['2015-12-02', '2015-12-03'], dtype='object', name='date')

df = df[~lvl0.isin(idx)]
print (df)
                     aaa
date       time         
2015-12-01 00:00:00    0
           00:15:00    0
           00:30:00    0
           00:45:00    0
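
For anyone who wants to run the MultiIndex version end to end, here is a minimal sketch. The sample rows are a shortened copy of the frame printed above, the threshold of 100 comes from the gt(100) call, and the names THRESHOLD, bad_days and filtered are only illustrative. The boolean mask is converted with to_numpy() before indexing the Index, which is a small defensive change and not part of the original snippet.

```python
import pandas as pd

# Shortened copy of the sample frame shown above.
rows = [
    ("2015-12-01", "00:00:00", 0),
    ("2015-12-01", "00:15:00", 0),
    ("2015-12-02", "05:00:00", 0),
    ("2015-12-02", "05:15:00", 200),
    ("2015-12-03", "06:30:00", 1000),
    ("2015-12-03", "06:45:00", 1000),
]
df = pd.DataFrame(rows, columns=["date", "time", "aaa"]).set_index(["date", "time"])

THRESHOLD = 100  # same value as in gt(100) above

# Date label of every row (first level of the MultiIndex).
lvl0 = df.index.get_level_values("date")

# Dates on which at least one row exceeds the threshold.
bad_days = lvl0[df["aaa"].gt(THRESHOLD).to_numpy()].unique()

# Keep only rows whose date is not among the offending dates.
filtered = df[~lvl0.isin(bad_days)]
print(filtered)
```

Only the 2015-12-01 rows remain, because the other two dates each contain at least one value above the threshold.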

If date is a regular column rather than an index level, compare the date column directly:

print (df)
          date      time   aaa
0   2015-12-01  00:00:00     0
1   2015-12-01  00:15:00     0
2   2015-12-01  00:30:00     0
3   2015-12-01  00:45:00     0
4   2015-12-02  05:00:00     0
5   2015-12-02  05:15:00   200
6   2015-12-02  05:30:00     0
7   2015-12-02  05:45:00     0
8   2015-12-03  06:00:00     0
9   2015-12-03  06:15:00     0
10  2015-12-03  06:30:00  1000
11  2015-12-03  06:45:00  1000
12  2015-12-03  07:00:00  1000

idx = df.loc[df['aaa'].gt(100), 'date'].unique()
print (idx)
['2015-12-02' '2015-12-03']

df = df[~df['date'].isin(idx)]
print (df)
         date      time  aaa
0  2015-12-01  00:00:00    0
1  2015-12-01  00:15:00    0
2  2015-12-01  00:30:00    0
3  2015-12-01  00:45:00    0
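
As an alternative to collecting the offending dates with isin, the same per-day check could probably be written with groupby and transform. This is a different technique from the one shown above, offered only as a sketch: it assumes date is a regular column, reuses the threshold of 100, and the names THRESHOLD, day_max and kept are illustrative.

```python
import pandas as pd

# Shortened copy of the column-based sample frame shown above.
df = pd.DataFrame({
    "date": ["2015-12-01", "2015-12-01", "2015-12-02",
             "2015-12-02", "2015-12-03", "2015-12-03"],
    "time": ["00:00:00", "00:15:00", "05:00:00",
             "05:15:00", "06:30:00", "06:45:00"],
    "aaa": [0, 0, 0, 200, 1000, 1000],
})

THRESHOLD = 100  # same value as in gt(100) above

# Broadcast each day's maximum of aaa back onto every row of that day.
day_max = df.groupby("date")["aaa"].transform("max")

# Keep a day only if its maximum never exceeds the threshold.
kept = df[day_max.le(THRESHOLD)]
print(kept)
```

Because transform returns a Series aligned with the original rows, the comparison can be used directly as a boolean mask without building a separate list of dates.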