Fastest way to compare string elements in Python

Time: 2014-03-27 15:47:53

Tags: python numpy

I am looking for the fastest way to compare string elements in Python.

import os, glob, numpy as np

with open('fname.txt', 'r') as fi:   ## This input file contains about 9 thousand strings, one per line
    all_list = fi.read().splitlines()

existing_list = glob.glob('*jpg')   ## This contains about 5 thousand elements
existing_list = [os.path.basename(f) for f in existing_list]

## Keep only the names that are listed in the file but not found on disk
remaining_list = [f for f in all_list if f not in existing_list]
for i in remaining_list:
    print(i)

How can I do this in NumPy?

all_list = np.array(all_list)
existing_list = np.array(existing_list)
remaining_list = ???

1 Answer:

Answer 0: (score: 1)

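You can optimize this with a set instead of numpy. A membership test on a set takes constant time on average, whereas `f not in existing_list` scans the whole list for every element of all_list: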

existing_set = {os.path.basename(f) for f in existing_list}  # set comprehension, python2.7+
# alternatively:  set(os.path.basename(f) for f in existing_list)

remaining_list = [f for f in all_list if f not in existing_set]
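
To make the set-based filter easy to try in isolation, here is a minimal sketch. The file names are hypothetical stand-ins for the contents of fname.txt and the glob result; they are not taken from the question.

```python
# Hypothetical sample data standing in for fname.txt and glob.glob('*jpg').
all_list = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
existing_list = ["b.jpg", "d.jpg"]

# Build the set once; each "not in" check is then a hash lookup
# instead of a scan over the whole list.
existing_set = set(existing_list)

# The list comprehension preserves the original order of all_list.
remaining_list = [f for f in all_list if f not in existing_set]
print(remaining_list)
```

If the order of the result does not matter and duplicates are not needed, `set(all_list) - existing_set` should give the same names as an unordered set.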

I doubt you would gain much performance from numpy here, even if you found a way to do it...
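
For completeness, one possible way to express the same filter in NumPy is sketched below. It is not part of the original answer, and it assumes NumPy 1.13 or later (where `np.isin` is available) plus the same hypothetical sample names as stand-ins for the real data. Whether it beats the set version on a few thousand strings would have to be measured.

```python
import numpy as np

# Hypothetical sample data standing in for the two lists in the question.
all_list = np.array(["a.jpg", "b.jpg", "c.jpg", "d.jpg"])
existing_list = np.array(["b.jpg", "d.jpg"])

# Boolean mask: True where an element of all_list is NOT in existing_list.
mask = np.isin(all_list, existing_list, invert=True)
remaining_list = all_list[mask]          # keeps the order of all_list

# Alternative: setdiff1d returns the sorted, de-duplicated difference.
remaining_sorted = np.setdiff1d(all_list, existing_list)

print(remaining_list)
print(remaining_sorted)
```

The mask version keeps duplicates and input order, while `np.setdiff1d` sorts and removes duplicates, so the two only agree when all_list is already sorted and unique.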