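OK, the title is confusing, but the task is simple. I have a pandas DataFrame like the one below, and I want to search for individual items inside the lists contained in column B (column C holds the same lists represented as strings, in case that's useful):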
>>> import pandas as pd
>>> df = pd.DataFrame({'A': range(10), 'B': [[x,x+1] for x in range(10)]})
>>> df['C'] = df['B'].astype(str)
>>> print(df)
   A        B        C
0  0   [0, 1]   [0, 1]
1  1   [1, 2]   [1, 2]
2  2   [2, 3]   [2, 3]
3  3   [3, 4]   [3, 4]
4  4   [4, 5]   [4, 5]
5  5   [5, 6]   [5, 6]
6  6   [6, 7]   [6, 7]
7  7   [7, 8]   [7, 8]
8  8   [8, 9]   [8, 9]
9  9  [9, 10]  [9, 10]
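Since column C already holds the lists as strings, one possible option is a single regex match over that column. This is a minimal sketch, assuming the lists contain only non-negative integers (with `\b` boundaries, searching for 1 would also match `-1`); `regex_search` is a hypothetical helper. The `\b` word boundaries keep a search for 1 from matching the `10` in `[9, 10]`. It scans the column once no matter how many values are searched, but each row still goes through the regex engine, so any speedup over the original should be measured rather than assumed.

```python
import re

import pandas as pd

df = pd.DataFrame({'A': range(10), 'B': [[x, x + 1] for x in range(10)]})
df['C'] = df['B'].astype(str)


def regex_search(srchs, col, df):
    # Hypothetical helper: build one pattern such as r'\b(?:1|7)\b' that matches
    # any of the search values as a whole number inside the string column.
    # Assumes non-negative integers; '-1' would also match a search for 1.
    pattern = r'\b(?:' + '|'.join(re.escape(str(s)) for s in srchs) + r')\b'
    # Non-capturing group, so str.contains does not warn about match groups.
    return df[df[col].str.contains(pattern, regex=True)]


print(regex_search([1, 7], 'C', df))
```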
For example, searching for the values 1 and 7 should select and return the following rows:
print(magic_search([1, 7], 'B', df))
   A       B       C
0  0  [0, 1]  [0, 1]
1  1  [1, 2]  [1, 2]
6  6  [6, 7]  [6, 7]
7  7  [7, 8]  [7, 8]
If I search only for the value 0, the result should be:
print(magic_search([0], 'B', df))
   A       B       C
0  0  [0, 1]  [0, 1]
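Because every list in B has exactly two elements in this example, the column could also be stacked once into a 2-D NumPy array and searched with `np.isin`, which tests every element against all search values in one vectorized call. This sketch assumes equal-length lists of numbers (ragged lists would not form a regular 2-D array); `b_array` and `fixed_length_search` are hypothetical names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(10), 'B': [[x, x + 1] for x in range(10)]})
df['C'] = df['B'].astype(str)

# Build the (n_rows x list_length) array once and reuse it for every search.
# Assumes all lists in column B have the same length.
b_array = np.array(df['B'].tolist())


def fixed_length_search(srchs, arr, df):
    # np.isin marks every element that equals one of the search values;
    # any(axis=1) keeps a row if at least one of its elements matched.
    mask = np.isin(arr, srchs).any(axis=1)
    return df[mask]


print(fixed_length_search([1, 7], b_array, df))
```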
A not-so-efficient solution could look like this:
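def not_so_magic_search(srchs, col, df):
    # One boolean column per search value: True where that row's list contains it
    bools = pd.concat([df[col].apply(lambda x: srch in x) for srch in srchs], axis=1)
    # Keep the rows where at least one search value was found
    return df[bools.sum(axis=1) > 0]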
>>> not_so_magic_search([1, 7], 'B', df)
   A       B       C
0  0  [0, 1]  [0, 1]
1  1  [1, 2]  [1, 2]
6  6  [6, 7]  [6, 7]
7  7  [7, 8]  [7, 8]
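The same result as `not_so_magic_search` can likely be obtained without a per-row Python lambda by flattening the lists with `Series.explode` (available since pandas 0.25) and testing all search values at once with `isin`. A sketch, assuming the DataFrame index is unique; `explode_search` is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({'A': range(10), 'B': [[x, x + 1] for x in range(10)]})
df['C'] = df['B'].astype(str)


def explode_search(srchs, col, df):
    # explode() gives each list element its own row and repeats the original
    # index label, e.g. index [0, 0, 1, 1, ...] for two-element lists.
    exploded = df[col].explode()
    # isin() is vectorized; groupby(level=0).any() collapses the per-element
    # flags back to one flag per original row (assumes a unique index).
    mask = exploded.isin(srchs).groupby(level=0).any()
    # Align the mask to the DataFrame's row order before boolean indexing.
    return df[mask.reindex(df.index)]


print(explode_search([1, 7], 'B', df))
```

For repeated searches, the exploded Series could be computed once and kept, since only the `isin` step depends on the search values.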
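The problem is that this performs badly: the lambda has to be applied to every row, once per search value, and it simply doesn't scale. I need this to be efficient because the original DataFrame has 5 million rows and I need to run the search many times. Any suggestions or ideas on how to do this?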
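Because the same DataFrame is searched many times, another option is to pay a one-time cost to build an inverted index (value → row positions) and then answer each search with dictionary lookups. This is a sketch assuming the data does not change between searches; `build_inverted_index` and `indexed_search` are hypothetical helpers. The build step is a plain Python loop, so on 5 million rows it would take noticeable time and memory, but only once.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'A': range(10), 'B': [[x, x + 1] for x in range(10)]})
df['C'] = df['B'].astype(str)


def build_inverted_index(col, df):
    # One pass over the column: map each value to the positions (not labels)
    # of the rows whose list contains it, so results can be used with iloc.
    inv_index = defaultdict(list)
    for pos, items in enumerate(df[col]):
        for item in set(items):  # set() avoids recording a row twice for duplicates
            inv_index[item].append(pos)
    return inv_index


def indexed_search(srchs, inv_index, df):
    # .get() does not insert missing keys into the defaultdict.
    # Union of the stored positions, sorted to preserve the original row order.
    positions = sorted(set().union(*(inv_index.get(s, []) for s in srchs)))
    return df.iloc[positions]


inv = build_inverted_index('B', df)  # built once, reused for every search
print(indexed_search([1, 7], inv, df))
```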