Suppose I have the following pandas DataFrame with product descriptions.
How can I exclude every product (id) that has any status other than 4, 5 or 6?
Input:
id | description | status
-------------------------
1 | world1 | 1
1 | world2 | 4
1 | world3 | 1
1 | world4 | 4
1 | world5 | 4
1 | world6 | 4
1 | world7 | 1
1 | world8 | 4
1 | world9 | 4
1 | world10 | 4
1 | world11 | 4
1 | world12 | 4
1 | world13 | 4
1 | world14 | 4
1 | world15 | 1
2 | world1 | 4
2 | world2 | 4
2 | world3 | 5
2 | world15 | 6
2 | world8 | 6
2 | world4 | 5
2 | world7 | 5
Output:
id | description | status
-------------------------
2 | world1 | 4
2 | world2 | 4
2 | world3 | 5
2 | world15 | 6
2 | world8 | 6
2 | world4 | 5
2 | world7 | 5
Answer 0 (score: 1)
First select the id of every row whose status is not in the list L, then keep only the rows whose id is not among those values (a):
L = [4,5,6]
a = df.loc[~df['status'].isin(L), 'id']
df = df[~df['id'].isin(a)]
print (df)
id description status
15 2 world1 4
16 2 world2 4
17 2 world3 5
18 2 world15 6
19 2 world8 6
20 2 world4 5
21 2 world7 5
Details:
print (a)
0 1
2 1
6 1
14 1
Name: id, dtype: int64
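For anyone who wants to run the two steps end to end, here is a minimal sketch that rebuilds the question's input by hand. The names allowed, bad_ids and result are illustrative and not from the original answer; the logic is the same isin-based filter.

```python
import pandas as pd

# Rebuild the question's input: id 1 mixes statuses 1 and 4, id 2 has only 4, 5 and 6.
df = pd.DataFrame({
    'id': [1] * 15 + [2] * 7,
    'description': [f'world{i}' for i in range(1, 16)]
                   + ['world1', 'world2', 'world3', 'world15',
                      'world8', 'world4', 'world7'],
    'status': [1, 4, 1, 4, 4, 4, 1, 4, 4, 4, 4, 4, 4, 4, 1]
              + [4, 4, 5, 6, 6, 5, 5],
})

allowed = [4, 5, 6]

# Step 1: ids of rows whose status is NOT allowed (one entry per offending row).
bad_ids = df.loc[~df['status'].isin(allowed), 'id']

# Step 2: keep only rows whose id never appears among the offending ids.
result = df[~df['id'].isin(bad_ids)]
print(result)
```

Note that bad_ids may contain the same id several times; isin does not care about duplicates, so there is no need to call unique() first.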
Timings:
import numpy as np
import pandas as pd

np.random.seed(123)
N = 100000
L = np.random.randint(1000,size=N)
df = pd.DataFrame({'status': np.random.choice([4,5,6,7], p = (0.3,0.3,0.39,0.01), size=N),
                   'id':np.random.choice(L, N),
                   'description':np.random.choice(L, N)})
print (df)
# L is reused here: from now on it holds the allowed status values
L = [4,5,6]
In [461]: %%timeit
...: a = df.loc[~df['status'].isin(L), 'id']
...: df[~df['id'].isin(a)]
...:
...:
100 loops, best of 3: 1.91 ms per loop
#Wen's solution
In [462]: %%timeit
...: df['status']=df['status'].mask(~df['status'].isin([4,5,6]))
...: df.groupby('id').filter(lambda x : ~x.status.isnull().any() )
...:
10 loops, best of 3: 111 ms per loop
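The %%timeit cells above only run in IPython, and the second cell overwrites df['status'] on every loop. Below is a rough sketch of the same comparison as a plain script, under a few assumptions: the helper names isin_filter and groupby_filter are made up for this example, the groupby variant works on a copy via assign instead of mutating df, and it uses notnull().all() in place of ~isnull().any() (logically the same test). Absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)
N = 100000
ids = np.random.randint(1000, size=N)
df = pd.DataFrame({
    'status': np.random.choice([4, 5, 6, 7], p=(0.3, 0.3, 0.39, 0.01), size=N),
    'id': np.random.choice(ids, N),
    'description': np.random.choice(ids, N),
})
allowed = [4, 5, 6]


def isin_filter(frame):
    # Two boolean masks, no per-group Python calls.
    bad_ids = frame.loc[~frame['status'].isin(allowed), 'id']
    return frame[~frame['id'].isin(bad_ids)]


def groupby_filter(frame):
    # Mask disallowed statuses to NaN on a copy, then drop groups containing NaN.
    masked = frame.assign(status=frame['status'].mask(~frame['status'].isin(allowed)))
    return masked.groupby('id').filter(lambda g: g['status'].notnull().all())


res_a = isin_filter(df)
res_b = groupby_filter(df)

# Both approaches should select exactly the same rows.
same_rows = res_a.index.equals(res_b.index)
print('same rows selected:', same_rows)

# Average seconds per call; few repetitions to keep the script quick.
print('isin    :', timeit.timeit(lambda: isin_filter(df), number=10) / 10)
print('groupby :', timeit.timeit(lambda: groupby_filter(df), number=3) / 3)
```

The groupby version calls a Python lambda once per id, which is the likely reason it is so much slower than the two vectorized isin calls.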
Answer 1 (score: 1)
Two steps.
First, use mask to replace every status outside 4, 5 and 6 with NaN:
df['status']=df['status'].mask(~df['status'].isin([4,5,6]))
Second, use groupby + filter to keep only the ids whose group contains no NaN:
df.groupby('id').filter(lambda x : ~x.status.isnull().any() )
Out[44]:
id description status
15 2 world1 4.0
16 2 world2 4.0
17 2 world3 5.0
18 2 world15 6.0
19 2 world8 6.0
20 2 world4 5.0
21 2 world7 5.0
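As the output shows, masking turns status into floats (4.0, 5.0, ...) and overwrites the original column. If that is unwanted, one possible variant, which is not part of this answer, keeps the groupby idea but builds a separate boolean mask with transform('all'). The names ok, keep and result and the small sample frame are illustrative only.

```python
import pandas as pd

# Small sample in the same shape as the question's data.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'description': ['world1', 'world2', 'world3', 'world1', 'world3', 'world15'],
    'status': [1, 4, 1, 4, 5, 6],
})

# True where the row's own status is allowed.
ok = df['status'].isin([4, 5, 6])

# Broadcast "all rows of this id are allowed" back onto every row of the group.
keep = ok.groupby(df['id']).transform('all')

result = df[keep]
print(result)
```

Because df['status'] is never reassigned, the column keeps its integer dtype and the original frame stays intact.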