Question

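I'm looking for a way to run a loop faster. With my current code the calculation takes forever, so I'm looking for a way to make the code more efficient.

EDIT: I don't think I explained this well. I need to write a program that goes through every possible 8-character combination, including uppercase letters, lowercase letters, and digits, then hashes each combination with MD5 and saves the results to a file. But now I have new questions: will this process really take 63 years, and how much will the file weigh when the script finishes? I recently bought a VPS server for this task, but if it is going to take 63 years it's better not to even try, haha.

I'm new to coding; all help is appreciated.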

import hashlib
from random import choice

longitud = 8
valores = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def enc(string):
    m = hashlib.md5()
    m.update(string.encode('utf-8'))
    return m.hexdigest()

def code():
    p = ""
    p = p.join([choice(valores) for i in xrange(longitud)])
    text = p
    return text

i = 1
for i in xrange(2000000000000000000):
    cod = code()
    md = enc(cod)
    print cod
    print md
    i += 1
    print i
    f=open('datos.txt','a')
    f.write("%s " % cod)
    f.write("%s" % md)
    f.write('\n')
    f.close()

Answer 1

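You are not taking full advantage of modern computers, which have multiple CPUs! Using them is by far the best optimization here, because this task is CPU-bound. Note: for I/O-bound operations, multithreading (using the threading module) is the appropriate tool.

Let's see how Python makes this easy with the multiprocessing module (read the comments):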

import hashlib
# you're sampling a string so you need sample, not 'choice'
from random import sample
import multiprocessing
# use a thread to synchronize writing to file
import threading

# open up to 4 processes per cpu
processes_per_cpu = 4
processes = processes_per_cpu * multiprocessing.cpu_count()
print "will use %d processes" % processes
longitud = 8
valores = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
# check on smaller ranges to compare before trying your range... :-)
RANGE = 200000
def enc(string):
    m = hashlib.md5()
    m.update(string.encode('utf-8'))
    return m.hexdigest()

# we synchronize the results to be written using a queue shared by processes
q = multiprocessing.Manager().Queue()

# this is the single point where results are written to the file
# the file is opened ONCE (you open it on every iteration, that's bad)
def write_results():
    with open('datos.txt', 'w') as f:
        while True:
            msg = q.get()
            if msg == 'close':
                break
            else:
                f.write(msg)

# this is the function each process uses to calculate a single result
def calc_one(i):
    s = ''.join(sample(valores, longitud))
    md = enc(s)
    q.put("%s %s\n" % (s, md))

# we start a process pool of workers to spread work and not rely on
# a single cpu
pool = multiprocessing.Pool(processes=processes)

# this is the thread that will write the results coming from
# other processes using the queue, so it's execution target is write_results
t = threading.Thread(target=write_results)
t.start()
# we use 'map_async' to not block ourselves, this is redundant here,
# but it's best practice to use this when you don't HAVE to block ('pool.map')
pool.map_async(calc_one, xrange(RANGE))
# wait for completion
pool.close()
pool.join()
# tell result-writing thread to stop
q.put('close')
t.join()

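There are probably more optimizations to be made in this code, but the main optimization for any CPU-bound task like the one you describe is to use multiprocessing.

Note: a simple optimization for the file writes is to aggregate several results from the queue and write them together (useful if you have so many CPUs that they outpace the single writer thread).

Note 2: since the OP is looking at going through combinations/permutations of things, it should be noted that there is a module for doing exactly that, and it is called {{3 }}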
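
The batching idea in the note above could look roughly like the sketch below. It is written in Python 3 syntax (the answer's code is Python 2), and `write_batched`, `BATCH_SIZE`, and the in-memory `queue.Queue` are hypothetical names and choices made for illustration. In the answer's program the queue would be the `multiprocessing.Manager().Queue()` and the file would be `datos.txt`.

```python
import io
import queue

BATCH_SIZE = 1000   # assumed value; the right size has to be measured
SENTINEL = 'close'  # same stop marker the answer's code puts on the queue

def write_batched(q, f, batch_size=BATCH_SIZE):
    """Drain q into f, issuing one write per batch_size messages."""
    batch = []
    writes = 0
    while True:
        msg = q.get()
        if msg == SENTINEL:
            break
        batch.append(msg)
        if len(batch) >= batch_size:
            f.write(''.join(batch))
            writes += 1
            batch = []
    if batch:  # flush the final, partially filled batch
        f.write(''.join(batch))
        writes += 1
    return writes

# Small demonstration with an in-memory queue and file
demo_q = queue.Queue()
for n in range(2500):
    demo_q.put("line%d\n" % n)
demo_q.put(SENTINEL)
demo_out = io.StringIO()
demo_writes = write_batched(demo_q, demo_out)
```

Here 2500 queued lines reach the file in three write calls instead of 2500. Whether this helps in practice depends on how fast the workers fill the queue.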

Answer 2

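Although it helps with debugging, I have found that printing makes a program run slower, so you may not want to print that much. I would also move "f = open('datos.txt', 'a')" out of the loop, since I can imagine that opening the same file over and over costs time, and then move "f.close()" out of the loop as well, to the end of the program.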


Answer 3

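Note that you should use

for cod in itertools.product(valores, repeat=longitud):

rather than picking strings with random.sample, because this visits each string exactly once. Each cod is a tuple of characters, so build the string with ''.join(cod).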
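
As a small illustration of that enumeration, here is a sketch in Python 3 syntax using a three-letter alphabet and length 2 so the whole space fits in memory. The helper names `md5_hex` and `all_codes` are hypothetical, not from the original posts.

```python
import hashlib
import itertools

def md5_hex(text):
    """Return the MD5 hex digest of a text string."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()

def all_codes(alphabet, length):
    """Yield every string of the given length over alphabet, each exactly once."""
    for chars in itertools.product(alphabet, repeat=length):
        yield ''.join(chars)  # product yields tuples of characters

# 3 ** 2 = 9 strings, in order from "aa" to "cc", with no duplicates
small = list(all_codes("abc", 2))
pairs = [(s, md5_hex(s)) for s in small]
```

Because `all_codes` is a generator, the full 8-character space would never need to be held in memory at once; only the running time and the output size remain the problem.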

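Also note that for your given values this loop has 218340105584896 iterations (62^8). The output file would take up 9170284434565632 bytes, which is roughly 8 PiB (about 9.2 PB in decimal units).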
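
Those figures can be checked with a few lines of arithmetic. The sketch below (Python 3 syntax) assumes the output format used in the question: one line per code, made of the 8-character code, a space, the 32-character hex digest, and a newline.

```python
alphabet_size = 62   # 10 digits + 26 lowercase + 26 uppercase letters
length = 8
combinations = alphabet_size ** length

# One output line: code + space + 32 hex digits + newline
bytes_per_line = length + 1 + 32 + 1
total_bytes = combinations * bytes_per_line

# Size in pebibytes (2 ** 50 bytes)
pib = total_bytes / 2.0 ** 50
```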

Answer 4

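First, profile your program (using the cProfile module: https://docs.python.org/2/library/profile.html and http://ymichael.com/2014/03/08/profiling-python-with-cprofile.html), but I am willing to bet that your program is I/O-bound (if CPU usage on a single core never reaches 100%, your hard disk is too slow to keep up with the rest of the program).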
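
A minimal profiling sketch with cProfile might look like this (Python 3 syntax). The `hash_many` function is a hypothetical stand-in for the hashing loop in the question, and the report is captured in a string only so it can be inspected.

```python
import cProfile
import hashlib
import io
import pstats

def hash_many(n):
    """Hypothetical stand-in for the hashing part of the question's loop."""
    for i in range(n):
        hashlib.md5(str(i).encode('utf-8')).hexdigest()

profiler = cProfile.Profile()
profiler.enable()
hash_many(10000)
profiler.disable()

# Collect the five most expensive entries by cumulative time
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats('cumulative').print_stats(5)
report_text = report.getvalue()
```

For a whole script, running `python -m cProfile -s cumulative script.py` from the command line gives a similar report without editing the code.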

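With that in mind, start by changing your program so that:

It opens and closes the file outside the loop (opening and closing a file is very slow).
It makes only one write call per iteration (each call adds overhead and can turn into a system call, which is expensive), like this: f.write("%s %s\n" % (cod, md))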
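
The two changes above can be sketched as follows, in Python 3 syntax. The function name `write_hashes` and the file name `datos_demo.txt` are hypothetical; the demonstration writes to a temporary directory so it does not touch a real `datos.txt`.

```python
import hashlib
import os
import tempfile

def write_hashes(path, codes):
    """Open the file once and issue a single write call per code."""
    count = 0
    with open(path, 'w') as f:  # opened once, closed automatically at the end
        for cod in codes:
            md = hashlib.md5(cod.encode('utf-8')).hexdigest()
            f.write("%s %s\n" % (cod, md))  # one write per iteration
            count += 1
    return count

demo_path = os.path.join(tempfile.mkdtemp(), 'datos_demo.txt')
written = write_hashes(demo_path, ["aaaaaaaa", "aaaaaaab"])
with open(demo_path) as f:
    demo_lines = f.read().splitlines()
```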

Loop efficiency in Python

4 answers:
