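I want to test whether a cell in a column of lists equals [0] and Id==4, and set a new column to 1 when both conditions hold. The input and expected output are shown below. I made several attempts, but only managed it with apply and a lambda, which does not scale well to 50k+ rows. Is there a faster way?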
Input:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Id': [1,2,3,4],
'Var1': [[0,1],[0],[6,7],[0]],
})
Id Var1
1 [0, 1]
2 [0]
3 [6, 7]
4 [0]
What I tried:
df['ERR'] = 0
df.loc[(df['Id']==4) & (df['Var1']==[0]) , 'ERR'] = 1 # doesn't work
df.loc[(df['Id']==4) & (df['Var1'].isin([0])) , 'ERR'] = 1 # doesn't work
df['ERR'] = df.apply(lambda x: 1 if x['Id']==4 and x['Var1']==[0] else 0 , axis = 1)
Expected output:
Id Var1 ERR
1 [0, 1] 0
2 [0] 0
3 [6, 7] 0
4 [0] 1
Answer 0 (score: 2)
You can convert each list to a tuple or a set and compare that instead:
df['ERR1'] = ((df['Id']==4) & (df['Var1'].apply(tuple)==(0, ))).astype(int)
df['ERR2'] = ((df['Id']==4) & ([tuple(x) ==(0, ) for x in df['Var1']])).astype(int)
df['ERR3'] = ((df['Id']==4) & (df['Var1'].apply(set)==set([0]))).astype(int)
df['ERR4'] = ((df['Id']==4) & ([set(x) == set([0]) for x in df['Var1']])).astype(int)
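One thing to keep in mind when choosing between the two: the tuple and set comparisons are not equivalent for every input. A set ignores order and duplicates, so a cell such as [0, 0] would match the set test but not the tuple test. The small sketch below illustrates this on a few made-up cells (the `cells` list is hypothetical and not taken from the question's data):

```python
# Hypothetical sample cells, chosen only to show where the two tests differ.
cells = [[0], [0, 0], [0, 1], []]

# Tuple comparison: matches only a list that is exactly [0].
tuple_match = [tuple(x) == (0,) for x in cells]

# Set comparison: matches any list whose distinct values are exactly {0}.
set_match = [set(x) == {0} for x in cells]

print(tuple_match)  # [True, False, False, False]
print(set_match)    # [True, True, False, False]
```

For the data in the question both tests give the same result, so the difference only matters if the lists can contain repeated values.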
Performance (depends on the input data):
df = pd.DataFrame({'Id': [1,2,3,4],
'Var1': [[0,1],[0],[6,7],[0]],
})
df = pd.concat([df] * 10000, ignore_index=True)
In [188]: %timeit df['ERR1'] = ((df['Id']==4) & (df['Var1'].apply(tuple)==(0, ))).astype(int)
13.1 ms ± 318 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [189]: %timeit df['ERR2'] = ((df['Id']==4) & ([tuple(x) ==(0, ) for x in df['Var1']])).astype(int)
8.98 ms ± 266 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [190]: %timeit df['ERR3'] = ((df['Id']==4) & (df['Var1'].apply(set)==set([0]))).astype(int)
17 ms ± 451 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [191]: %timeit df['ERR4'] = ((df['Id']==4) & ([set(x) == set([0]) for x in df['Var1']])).astype(int)
19.4 ms ± 93.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
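The timings above come from IPython's %timeit. As a rough way to reproduce this kind of comparison in a plain script, here is a minimal sketch using the standard timeit module. It compares the row-wise apply from the question against a list comprehension that tests each cell against [0] directly, which is a variant of the approaches above rather than one of the four timed expressions. The helper names `err_rowwise` and `err_listcomp` are made up for this example, and the numbers will vary with machine, pandas version and data.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({'Id': [1, 2, 3, 4],
                     'Var1': [[0, 1], [0], [6, 7], [0]]})
big = pd.concat([base] * 10000, ignore_index=True)  # 40,000 rows


def err_rowwise(frame):
    # The question's approach: a Python lambda called once per row.
    return frame.apply(
        lambda row: 1 if row['Id'] == 4 and row['Var1'] == [0] else 0,
        axis=1)


def err_listcomp(frame):
    # Compare each cell to [0] in a list comprehension, then combine the
    # result with the vectorized Id test as NumPy boolean arrays.
    is_zero = np.array([x == [0] for x in frame['Var1']], dtype=bool)
    flags = (frame['Id'] == 4).to_numpy() & is_zero
    return pd.Series(flags.astype(int), index=frame.index)


for func in (err_rowwise, err_listcomp):
    seconds = timeit.timeit(lambda: func(big), number=3) / 3
    print(f"{func.__name__}: {seconds * 1000:.1f} ms per call")
```

Both helpers return the same 0/1 values, so either result can be assigned to df['ERR'].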