How to test for list equality in a column whose cells are lists

Time: 2018-07-18 08:27:33

Tags: pandas

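I want to test whether cells in a list-valued column are equal to [0] on rows where Id == 4, and set a new column to 1 when that is the case. The input and expected output are shown below.
I made several attempts, but the only one that worked uses apply with a lambda, which does not scale well to 50k+ rows. Is there a faster way?
Input: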

import numpy as np
import pandas as pd


df = pd.DataFrame({'Id': [1,2,3,4],
                   'Var1': [[0,1],[0],[6,7],[0]],
                  })

Id    Var1
1  [0, 1]
2     [0]
3  [6, 7]
4     [0]

What I tried:

df['ERR'] = 0
df.loc[(df['Id']==4) & (df['Var1']==[0]) , 'ERR'] = 1     # doesn't work
df.loc[(df['Id']==4) & (df['Var1'].isin([0])) , 'ERR'] = 1 # doesn't work
df['ERR'] = df.apply(lambda x: 1 if x['Id']==4 and x['Var1']==[0]   else 0 , axis = 1)

Expected output:

Id    Var1  ERR
 1  [0, 1]    0
 2     [0]    0
 3  [6, 7]    0
 4     [0]    1

1 answer:

Answer 0 (score: 2)

You can compare after converting each list to a tuple or a set:

df['ERR1'] = ((df['Id']==4) & (df['Var1'].apply(tuple)==(0, ))).astype(int)
df['ERR2'] = ((df['Id']==4) & ([tuple(x) ==(0, )  for x in df['Var1']])).astype(int)

df['ERR3'] = ((df['Id']==4) & (df['Var1'].apply(set)==set([0]))).astype(int)
df['ERR4'] = ((df['Id']==4) & ([set(x) == set([0])  for x in df['Var1']])).astype(int)
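
As a minimal self-contained sketch, here is the same idea applied to the sample frame from the question and checked against the expected output. The `np.array` wrapper around the list comprehension is my addition, not part of the answer above: it makes `&` combine two boolean arrays of equal length, which avoids relying on how a given pandas version handles a Series combined with a plain Python list (recent versions may warn about that). The last two lines show one difference between the variants: a set comparison ignores order and duplicates, a tuple comparison does not.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 3, 4],
                   'Var1': [[0, 1], [0], [6, 7], [0]]})

# Compare each cell to [0] in plain Python, then wrap the result in a
# boolean NumPy array so it can be combined with the Id mask.
is_zero_list = np.array([x == [0] for x in df['Var1']], dtype=bool)

# 1 where both conditions hold, 0 elsewhere.
df['ERR'] = ((df['Id'] == 4) & is_zero_list).astype(int)
print(df)

# Tuple comparison is exact; set comparison ignores order and duplicates.
print(tuple([0, 0]) == (0,))    # False
print(set([0, 0]) == set([0]))  # True
```

If cells such as `[0, 0]` must not count as a match, the tuple variants (or a direct `x == [0]` comparison) are the ones to use.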

Performance (depends on the input data):

df = pd.DataFrame({'Id': [1,2,3,4],
                   'Var1': [[0,1],[0],[6,7],[0]],
                  })
df = pd.concat([df] * 10000, ignore_index=True)


In [188]: %timeit df['ERR1'] = ((df['Id']==4) & (df['Var1'].apply(tuple)==(0, ))).astype(int)
13.1 ms ± 318 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [189]: %timeit df['ERR2'] = ((df['Id']==4) & ([tuple(x) ==(0, )  for x in df['Var1']])).astype(int)
8.98 ms ± 266 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [190]: %timeit df['ERR3'] = ((df['Id']==4) & (df['Var1'].apply(set)==set([0]))).astype(int)
17 ms ± 451 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [191]: %timeit df['ERR4'] = ((df['Id']==4) & ([set(x) == set([0])  for x in df['Var1']])).astype(int)
19.4 ms ± 93.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
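
To reproduce this kind of comparison outside IPython, a rough sketch with the standard `timeit` module could look like the following. The function names are hypothetical, and `with_prefilter` is an extra variant of my own (it only inspects the lists on rows where `Id == 4`), not something from the answer. Absolute timings will differ by machine, pandas version, and data, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({'Id': [1, 2, 3, 4],
                     'Var1': [[0, 1], [0], [6, 7], [0]]})
df = pd.concat([base] * 10000, ignore_index=True)  # 40,000 rows


def with_apply(frame):
    # Row-wise apply with a lambda, as in the question.
    return frame.apply(
        lambda r: 1 if r['Id'] == 4 and r['Var1'] == [0] else 0, axis=1)


def with_comprehension(frame):
    # Tuple comparison in a list comprehension, wrapped in a boolean array.
    mask = np.array([tuple(x) == (0,) for x in frame['Var1']], dtype=bool)
    return ((frame['Id'] == 4) & mask).astype(int)


def with_prefilter(frame):
    # Hypothetical variant: only look at the lists on rows where Id == 4.
    id_mask = (frame['Id'] == 4).to_numpy()
    out = np.zeros(len(frame), dtype=int)
    out[id_mask] = [int(x == [0]) for x in frame.loc[id_mask, 'Var1']]
    return pd.Series(out, index=frame.index)


for fn in (with_apply, with_comprehension, with_prefilter):
    seconds = timeit.timeit(lambda: fn(df), number=3) / 3
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```

All three functions should return the same 0/1 column, so whichever is fastest on your data can be swapped in directly.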