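I have 2 files. One is my "key file" and the other is the "lookup file". I am trying to check whether the lines in the key file exist in the lookup file. Here is my code snippet: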
lookupfile = open("riskeng_recon_e_mso_transact_db_msoinputapplication_t2.txt","r")
with open("1.txt","r") as my_file:
    for line in my_file:
        print "-------------checking for "+line+"-----------"
        for x in lookupfile:
            #print x
            if str(line) in str(x):
                print "Line present"+line
My 2 files have records in this format.
Lookupfile:
1234asfd
32453sdfvs
sfgagss234
keyfile:
123
3245
124
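My problem is that after taking the first record from the key file and comparing it against lookupfile, the comparison does not continue: the remaining key file records are never checked against the records in lookupfile.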
Answer 0: (score: 1)
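The way you are doing it now, you exhaust the lookup iterator during the first iteration of the outer loop, so the inner loop has nothing left to read for the later keys. Beyond that, the nested loop has time complexity O(M*N*L), where M is the number of keys, N is the number of lookup lines, and L is the average length of a lookup line, which may be too much for two long files. You can build a sorted suffix array of the lookup strings and use binary search for each key: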
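from bisect import bisect_left

with open("1.txt") as myfile, open('...') as lookup:
    # sorted array of every suffix of every lookup line (newlines stripped);
    # range starts at 0 so that the full line is included as a suffix
    l_u = sorted(l[i:] for l in map(str.strip, lookup) for i in range(len(l)))
    for line in myfile:
        key = line.strip()
        pos = bisect_left(l_u, key)
        # the first suffix >= key starts with key if key occurs in any lookup line
        if key and pos < len(l_u) and l_u[pos].startswith(key):
            print('Line "{}" exists'.format(key))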
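The time complexity is now O(N*L*log(N*L) + M*log(N*L)), counting each string comparison as a single step. For large files with relatively short lines (L*log(N*L) and log(N*L) much smaller than M and N), this should be significantly better than O(M*N).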
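To try the suffix-array idea on the sample records from the question without touching any files, here is a minimal self-contained sketch. The function name `keys_in_lookup` and the in-memory lists are assumptions made for illustration; they stand in for the key file and the lookup file.

```python
from bisect import bisect_left


def keys_in_lookup(keys, lookup_lines):
    """Return the keys that occur as a substring of at least one lookup line."""
    # Every suffix of every lookup line, sorted once up front.
    suffixes = sorted(
        line[i:]
        for line in (raw.strip() for raw in lookup_lines)
        for i in range(len(line))
    )
    found = []
    for raw_key in keys:
        key = raw_key.strip()
        if not key:
            continue  # skip blank lines
        pos = bisect_left(suffixes, key)
        # A key is a substring of some line exactly when it is a prefix of
        # some suffix; the first suffix >= key is then one that starts with it.
        if pos < len(suffixes) and suffixes[pos].startswith(key):
            found.append(key)
    return found


# Sample records from the question (hypothetical in-memory stand-ins).
lookup_lines = ["1234asfd\n", "32453sdfvs\n", "sfgagss234\n"]
keys = ["123\n", "3245\n", "124\n"]
print(keys_in_lookup(keys, lookup_lines))  # prints ['123', '3245']
```

Building the suffix list costs memory proportional to the square of the line length per line, so this trade-off only pays off when the lookup lines are short and there are many keys.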
Answer 1: (score: 0)
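You can also do the matching while reading. Since a pairwise (line 1 against line 1, line 2 against line 2) comparison is not what you want, we need to store the lines from lookupfile in a list so that they can be reused for every key.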
with open("file1.txt", "w") as f:
    f.write("""\
1234asfd
32453sdfvs
sfgagss234""")
with open("file2.txt", "w") as f:
    f.write("""\
123
3245
124
324""")
with open("file1.txt") as f1, open("file2.txt") as f2:
    # Store lookupfile in list
    lookup = f1.read().split("\n")
    # Loop lookupfile for every line in keyfile
    for idx, line in enumerate(f2,1):
        for idy, row in enumerate(lookup,1):
            # Look for match
            if line.strip() in row:
                print("line {} present on line {}".format(idx,idy))
Prints:
line 1 present on line 1
line 2 present on line 2
line 4 present on line 2
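The reason the lookup lines have to be stored in a list is that a file object is a one-shot iterator: once it has been read to the end, a second loop over it yields nothing. The following sketch illustrates this, using `io.StringIO` as a stand-in for an open text file (an assumption made so the example needs no files on disk).

```python
import io

# io.StringIO behaves like an open text file for iteration purposes.
lookup_file = io.StringIO("1234asfd\n32453sdfvs\nsfgagss234\n")

first_pass = [row.strip() for row in lookup_file]
second_pass = [row.strip() for row in lookup_file]  # already at end of file

print(len(first_pass))   # 3
print(len(second_pass))  # 0

# Rewinding with seek(0) makes the lines readable again...
lookup_file.seek(0)
third_pass = [row.strip() for row in lookup_file]

# ...but a list can simply be looped over as many times as needed.
lookup = first_pass
matches = [(key, row) for key in ["123", "3245", "124"]
           for row in lookup if key in row]
print(matches)
```

Calling `seek(0)` before each inner loop would also fix the original code, at the cost of re-reading the file once per key.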
Answer 2: (score: -1)
Create two lists, listkf and listlookup, by reading the files:
listkf=[]
with open("keyfile", 'r') as kf:
    for line in kf:
        listkf.append(line.strip()) # adding key to list after stripping
listlookup=[]
with open("Lookupfile", 'r') as lf:
    for line in lf:
        listlookup.append(line)
Assuming you need to match line by line:
# zip stops at the shorter list, so files of unequal length cannot raise IndexError
for key, row in zip(listkf, listlookup):
    if key in row:
        print("key exists")
    else:
        print("key does not exist")
If you want to look for each key from keyfile anywhere in the whole Lookupfile:
for x in listkf:
    for y in listlookup:
        if x in y:
            print("key ", x, " exists")
            break
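The whole-file lookup above can also be expressed with `any()`, which stops at the first matching lookup line just like the `break` does. This is only a sketch: `find_present_keys` is a hypothetical helper name, and the two lists are filled with the sample records from the question instead of being read from files.

```python
def find_present_keys(keys, lookup_lines):
    """Return each key that appears as a substring of any lookup line."""
    return [key for key in keys if any(key in row for row in lookup_lines)]


# Sample records from the question, already stripped of newlines.
listkf = ["123", "3245", "124"]
listlookup = ["1234asfd", "32453sdfvs", "sfgagss234"]

for key in find_present_keys(listkf, listlookup):
    print("key", key, "exists")
```

Returning the matches as a list, instead of printing inside the loop, keeps the search reusable when the result is needed for further processing.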