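I have a long list of generated names and a 5000-word file containing the acceptable names. I want to find the names in the list that also appear in the file. How can I do this?
I tried using a loop, but it takes too long, because my names file is too long to search in full for every generated name. When n is 12 digits long, my generated list has 531441 names.
Here is some code: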
from time import process_time
from itertools import product
start = process_time()
n = "5747867437"
# keypad mapping: digit -> candidate letters
phone = {2: ["A", "B", "C"], 3: ["D", "E", "F"], 4: ["G", "H", "I"], 5: ["J", "K", "L"], 6: ["M", "N", "O"], 7: ["P", "R", "S"], 8: ["T", "U", "V"], 9: ["W", "X", "Y"]}
li = set(open("dict.txt", "r").read().strip().split("\n"))
num = []
names = []
# collect the letter choices for each digit of n
for x in n:
    num.append(phone[int(x)])
# build every possible letter combination
for y in product(*num):
    names.append(''.join(y))
acceptable = []
# keep only the generated names that appear in the file
for z in names:
    if z in li:
        acceptable.append(z)
acceptable.sort()
print(acceptable)
if acceptable:
    for a in acceptable:
        print(a + "\n")
else:
    print("NONE\n")
print(process_time() - start)
The file "acceptable_names.txt" is the file that contains the acceptable names. It currently takes 3 seconds. Is there a way to make it faster?
Thanks!
Answer 0: (score: 1)
Use a set: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset
A lookup in a list is O(n); a lookup in a set is O(1) on average.
# converting list to set
names = set(names)
acceptable = []
for z in li:
    if z in names:
        acceptable.append(z)
acceptable.sort()
print(acceptable)
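To make the O(n) versus O(1) difference concrete, here is a minimal timing sketch. The data is synthetic (5000 made-up strings standing in for the name file) and the variable names are illustrative only; absolute timings will differ from machine to machine.

```python
from timeit import timeit

# Synthetic stand-in for the 5000-entry name file (not real data)
words_list = [f"NAME{i}" for i in range(5000)]
words_set = set(words_list)

probe = "MISSING"  # worst case for the list: every element gets compared

# Both containers give the same answer...
in_list = probe in words_list
in_set = probe in words_set

# ...but the list scans all 5000 entries, while the set does a hash lookup
list_time = timeit(lambda: probe in words_list, number=1000)
set_time = timeit(lambda: probe in words_set, number=1000)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

On a typical machine the set lookup should come out far faster, and the gap widens as the container grows.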
Answer 1: (score: 1)
As mentioned above, use the intersection of the two sets. Like this:
set_names = set(names)
set_li = set(li)
acceptable = set_names.intersection(set_li)
# sorted() returns a new sorted list; list.sort() returns None
print(sorted(acceptable))
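As a small self-contained illustration of the same idea, the sketch below uses made-up sample sets (`generated` and `allowed` are hypothetical stand-ins for the generated names and the file contents) and shows that the method, the `&` operator, and an explicit membership filter all give the same result.

```python
# Illustrative data only; the real sets would come from product() and the file
generated = {"JON", "KOM", "LMN", "JMM"}
allowed = {"JIM", "JON", "TIM", "IKE"}

# Three equivalent ways to compute the intersection
by_method = generated.intersection(allowed)
by_operator = generated & allowed
by_loop = {name for name in generated if name in allowed}

# sorted() accepts any iterable and returns a new list;
# list.sort() sorts in place and returns None, so its result cannot be printed
result = sorted(by_method)
print(result)  # ['JON']
```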
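Answer 2: (score: 1)
As suggested, use sets. Trim whatever is not needed from your code. As an MRE (minimal reproducible example), your code could look like this: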
from itertools import product

def writeAcceptFile(filename):
    # create a small sample file of acceptable names
    with open(filename, "w") as f:
        f.write("JIM\nJON\nTIM\nIKE")

def getNamesFromFile(filename):
    # read the file into a set for fast membership checks
    with open(filename) as f:
        return set(name.strip() for name in f.readlines())

fn = "acceptable_names.txt"
writeAcceptFile(fn)
accept = getNamesFromFile(fn)
phone = {2: ["A", "B", "C"], 3: ["D", "E", "F"], 4: ["G", "H", "I"],
         5: ["J", "K", "L"], 6: ["M", "N", "O"], 7: ["P", "R", "S"],
         8: ["T", "U", "V"], 9: ["W", "X", "Y"]}
n = 566
ok = [k for k in (''.join(l)
                  for l in product(*(phone[int(x)]
                                     for x in str(n))))
      if k in accept]
print(ok)  # ['JON']
Instead of the "dumb" one-liner, you can use lists and loops:
# or by hand:
names = []
num = []
for x in str(n):
    num.append(phone[int(x)])
for y in product(*num):
    name = ''.join(y)  # separate variable so the number n is not overwritten
    # only add name if in accepted list
    if name in accept:
        names.append(name)
print(names)  # ['JON']
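The reason to use sets is that they are extremely fast for membership checks (constant time on average, no matter how many items they hold).
Your code loops over the entire list of allowed words (5k) for each word you generate (531441), which is what makes it slow.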
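A different approach, not proposed in the answers above: the number of generated candidates grows as 3 to the power of the number of digits, while the name file stays at about 5000 entries, so one could go the other way and translate each acceptable name into its digit string. The sketch below assumes uppercase names and reuses the keypad mapping from the question (with digit 4 written as a list); `word_to_digits` and `matching_names` are hypothetical helper names.

```python
# Same keypad mapping as in the question (digit 4 written as a list)
phone = {2: ["A", "B", "C"], 3: ["D", "E", "F"], 4: ["G", "H", "I"],
         5: ["J", "K", "L"], 6: ["M", "N", "O"], 7: ["P", "R", "S"],
         8: ["T", "U", "V"], 9: ["W", "X", "Y"]}

# Invert the mapping: letter -> digit character
letter_to_digit = {letter: str(digit)
                   for digit, letters in phone.items()
                   for letter in letters}

def word_to_digits(word):
    """Return the digit string for word, or None if a letter is not on the keypad."""
    digits = []
    for ch in word:
        d = letter_to_digit.get(ch)
        if d is None:
            return None
        digits.append(d)
    return "".join(digits)

def matching_names(n, accept):
    """Return the sorted names in accept that spell the digit string n."""
    return sorted(w for w in accept if word_to_digits(w) == n)

accept = {"JIM", "JON", "TIM", "IKE"}  # sample data from the answer above
print(matching_names("566", accept))  # ['JON']
```

This does one pass over the name file regardless of how long n is, so no list of 531441 candidates is ever built.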