Compare two lists and add a new column with the values of findKBs that are missing from each row
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [['10', '20', '30', '40'], ['50', '60', '70', '80']],
                   'B': [['a', 'b'], ['c', 'd']]})
findKBs = ['10', '90']
A B
0 [10, 20, 30, 40] [a, b]
1 [50, 60, 70, 80] [c, d]
This is the desired result:
A B C
0 [10, 20, 30, 40] [a, b] [90]
1 [50, 60, 70, 80] [c, d] [10,90]
Thanks in advance.
Answer 0 (score: 4)
We can use np.isin:
df['C'] = [find_kb[~np.isin(find_kb, a)]
for a, find_kb in zip(df['A'], np.array([findKBs] * len(df)))]
print(df)
A B C
0 [10, 20, 30, 40] [a, b] [90]
1 [50, 60, 70, 80] [c, d] [10, 90]
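To see what the list comprehension does for a single row, here is a minimal sketch covering row 0 only. The names `find_kb`, `row_a`, `mask`, and `missing` are illustrative and not part of the answer. `np.isin(find_kb, row_a)` marks which elements of `find_kb` occur in `row_a`, and `~` inverts that mask.

```python
import numpy as np

find_kb = np.array(['10', '90'])      # values to look for
row_a = ['10', '20', '30', '40']      # contents of df['A'] in row 0

mask = np.isin(find_kb, row_a)        # True where the value occurs in row_a
missing = find_kb[~mask]              # keep only the values that were not found

print(mask.tolist())      # [True, False]
print(missing.tolist())   # ['90']
```

Note that each cell of C then holds a NumPy array rather than a Python list; call `.tolist()` on it if a plain list is needed.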
Or we can use filter:
df['C'] = [list(filter(lambda val: val not in a, find_kb))
for a, find_kb in zip(df['A'],[findKBs] * len(df))]
# Equivalent, using Series.map instead of zip:
# df['C'] = df['A'].map(lambda list_a: list(filter(lambda val: val not in list_a, findKBs)))
filter is harder to read, but it was faster in these timings:
%%timeit
df['C'] = [list(filter(lambda val: val not in a, find_kb))
for a, find_kb in zip(df['A'],[findKBs] * len(df))]
194 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
df['C'] = [find[~np.isin(find, a)] for a, find in zip(df['A'], np.array([findKBs] * len(df)))]
334 µs ± 38.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
df['C'] = df['A'].map(lambda x: np.setdiff1d(findKBs,x))
534 µs ± 17.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Answer 1 (score: 4)
You can try np.setdiff1d here.
df['C'] = df['A'].map(lambda x: np.setdiff1d(findKBs,x))
A B C
0 [10, 20, 30, 40] [a, b] [90]
1 [50, 60, 70, 80] [c, d] [10, 90]
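One thing to keep in mind: np.setdiff1d returns the sorted, unique values, so duplicates and the original order of findKBs are not kept. The sketch below uses a made-up input (`find_kb` with a duplicate and unsorted values) to show the difference from a plain list comprehension.

```python
import numpy as np

# Hypothetical input: contains a duplicate and is not sorted
find_kb = ['90', '10', '90']
row_a = ['50', '60', '70', '80']

result = np.setdiff1d(find_kb, row_a)
print(result.tolist())   # ['10', '90']  (deduplicated and sorted)

# Plain Python alternative that keeps order and duplicates
kept = [val for val in find_kb if val not in row_a]
print(kept)              # ['90', '10', '90']
```

For the findKBs in the question (already sorted, no duplicates) both approaches give the same values.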
To avoid the lambda, you can use functools.partial here.
from functools import partial
diff = partial(np.setdiff1d, findKBs)
df['C'] = df['A'].map(diff)
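As a quick check of what `partial` does here, the sketch below calls `diff` on a single row. `partial(np.setdiff1d, findKBs)` fixes findKBs as the first positional argument, so each value passed by `map` becomes the second one. The names `row` and `same` are illustrative only.

```python
from functools import partial

import numpy as np

findKBs = ['10', '90']
diff = partial(np.setdiff1d, findKBs)   # findKBs is bound as the first argument

row = ['10', '20', '30', '40']

# diff(row) is the same call as np.setdiff1d(findKBs, row)
print(diff(row).tolist())               # ['90']
same = np.array_equal(diff(row), np.setdiff1d(findKBs, row))
print(same)                             # True
```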
Answer 2 (score: 2)
Use set subtraction:
df['C']=(set(findKBs)-df.A.map(set)).map(list)
df
Out[253]:
A B C
0 [10, 20, 30, 40] [a, b] [90]
1 [50, 60, 70, 80] [c, d] [10, 90]
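Sets are unordered, so the lists produced by set subtraction are not guaranteed to follow the order of findKBs. If the order matters, one option is to use a set only for the membership test and iterate over findKBs itself. This is a minimal sketch, not part of the answer above; the helper name `missing_from` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [['10', '20', '30', '40'], ['50', '60', '70', '80']],
                   'B': [['a', 'b'], ['c', 'd']]})
findKBs = ['10', '90']


def missing_from(row_values):
    """Return the items of findKBs absent from row_values, in findKBs order."""
    seen = set(row_values)   # a set makes the membership test fast
    return [kb for kb in findKBs if kb not in seen]


df['C'] = df['A'].map(missing_from)
print(df['C'].tolist())   # [['90'], ['10', '90']]
```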