Filtering a DataFrame on multiple conditions

Time: 2016-07-28 13:54:50

Tags: python pandas dataframe

import pandas as pd

# Sample data; note that the 'year' column actually holds time-of-day strings.
data = {'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
        'store_number': ['1944', '1945', '1946', '1948', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
        'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10]}

stores = pd.DataFrame(data, columns=['retailer_name', 'store_number', 'year', 'amount', 'id'])
# Move the three key columns into a MultiIndex, then group on all three levels.
stores.set_index(['retailer_name', 'store_number', 'year'], inplace=True)
stores_grouped = stores.groupby(level=[0, 1, 2])

The resulting frame looks like this:

                                     amount  id
retailer_name store_number year                
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
CRV           1946         11:24:19       8  11
              1948         11:25:19       6  11
                           11:25:19       1  11
Walmart       1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1948         11:23:19       6  11
              1949         11:23:19      12  11
              1947         11:23:19      11  10

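I managed to filter on the group length: stores_grouped.filter(lambda x: (len(x) == 1))

But how do I filter on two conditions at once?

I want the groups that have length 1 and whose id column equals 10. Any ideas?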

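4 Answers:

Answer 0: (score: 3)

Actually, filter expects a scalar bool, so you can simply add the second condition inside the lambda, just as you would in an ordinary if statement: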

In [180]:
stores_grouped.filter(lambda x: (len(x) == 1 and x['id'] == 10))
Out[180]:
                                     amount  id
retailer_name store_number year                
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
              1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1947         11:23:19      11  10
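
The comparison x['id'] == 10 yields a Series rather than a plain bool, so a slightly more defensive variant reduces it with .all() before combining it with the length check. The following is a minimal, self-contained sketch of that idea using the question's data; the helper name single_row_with_id_10 is hypothetical and not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question.
data = {
    'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19',
             '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
    'store_number': ['1944', '1945', '1946', '1948', '1948',
                     '1949', '1947', '1948', '1949', '1947'],
    'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV',
                      'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
    'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
    'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10],
}
stores = pd.DataFrame(
    data, columns=['retailer_name', 'store_number', 'year', 'amount', 'id'])
stores = stores.set_index(['retailer_name', 'store_number', 'year'])
stores_grouped = stores.groupby(level=[0, 1, 2])


def single_row_with_id_10(group):
    """Hypothetical helper: return one plain bool per group."""
    # `and` short-circuits, so the id check only runs for one-row groups;
    # .all() collapses the boolean Series into a single value.
    return bool(len(group) == 1 and (group['id'] == 10).all())


result = stores_grouped.filter(single_row_with_id_10)
print(result)
```

Because the callback always returns a single bool, the same pattern should extend to further conditions joined with and / or.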

Answer 1: (score: 3)

You can use:

print (stores_grouped.filter(lambda x: (len(x) == 1) & (x.id == 10).all()))
                                     amount  id
retailer_name store_number year                
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
              1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1947         11:23:19      11  10

Answer 2: (score: 1)

I would do it this way:

In [348]: stores_grouped.filter(lambda x: (len(x) == 1)).query('id == 10')
Out[348]:
                                     amount  id
retailer_name store_number year
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
              1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1947         11:23:19      11  10
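
The two-step version works because, once only one-row groups remain, a row-wise test on id is the same as a group-wise test. As a rough check under that assumption, the sketch below rebuilds the question's data and compares the two-step result with a single combined filter; the variable names only_singletons, two_step and one_step are illustrative.

```python
import pandas as pd

# Same sample data as in the question.
data = {
    'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19',
             '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
    'store_number': ['1944', '1945', '1946', '1948', '1948',
                     '1949', '1947', '1948', '1949', '1947'],
    'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV',
                      'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
    'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
    'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10],
}
stores = pd.DataFrame(
    data, columns=['retailer_name', 'store_number', 'year', 'amount', 'id'])
stores = stores.set_index(['retailer_name', 'store_number', 'year'])
stores_grouped = stores.groupby(level=[0, 1, 2])

# Step 1: keep only the groups that contain exactly one row.
only_singletons = stores_grouped.filter(lambda g: len(g) == 1)

# Step 2: ordinary row-wise filter on the surviving rows.
two_step = only_singletons.query('id == 10')

# Single combined group filter, for comparison.
one_step = stores_grouped.filter(
    lambda g: len(g) == 1 and (g['id'] == 10).all())

print(len(only_singletons), len(two_step))
print(two_step.equals(one_step))
```

Note that the order of the steps matters in general: filtering rows by id first and counting group sizes afterwards could keep groups that originally had more than one row.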

Answer 3: (score: 1)

Thinking outside the box, use drop_duplicates with keep=False (here df is the flat frame, before set_index is applied):
df.drop_duplicates(subset=['retailer_name', 'store_number', 'year'], keep=False) \
    .query('id == 10')
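
With keep=False, drop_duplicates discards every row whose key combination occurs more than once, which amounts to keeping the groups of length 1 without a groupby. A minimal runnable sketch, assuming df is the question's data before the index is set (the keys variable is illustrative); if only the indexed stores frame is at hand, stores.reset_index() should give an equivalent flat frame.

```python
import pandas as pd

# Same sample data as in the question, kept as a flat frame (no set_index).
data = {
    'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19',
             '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
    'store_number': ['1944', '1945', '1946', '1948', '1948',
                     '1949', '1947', '1948', '1949', '1947'],
    'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV',
                      'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
    'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
    'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10],
}
df = pd.DataFrame(
    data, columns=['retailer_name', 'store_number', 'year', 'amount', 'id'])

keys = ['retailer_name', 'store_number', 'year']

# keep=False drops all rows of a duplicated key, not just the later ones,
# so only keys that occur exactly once survive.
unique_keys_only = df.drop_duplicates(subset=keys, keep=False)

# Then apply the row-wise condition on id.
result = unique_keys_only.query('id == 10')

# Optional: restore the MultiIndex layout used in the question.
result_indexed = result.set_index(keys)
print(result_indexed)
```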

Timing: [image not preserved]