Slow execution with nested for loops

Time: 2017-05-19 13:59:35

Tags: python python-3.x

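I have a fairly large text file (~16k lines) that I loop over. For each line, I check whether a client IP:port, a server IP:port and a keyword are present in the line, using two for loops and nested if x in line statements to decide whether the line contains the information I'm looking for.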

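Once I've identified a line containing the values I'm looking for, I update an SQLite database. Initially this took quite a long time to run, because I hadn't wrapped the SQL UPDATE statements in a manual transaction. After making that change the execution time improved significantly, but the code below still takes several minutes to finish, and I suspect my terrible loop structure is the cause.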
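The effect of grouping the UPDATEs into one transaction can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the table and column names (connections, client_tuple, origin_tuple, bar) come from the question, but the in-memory database and all rows are made up, and executemany is swapped in here in place of the question's one-execute-per-row loop.

```python
import sqlite3

# In-memory stand-in for the question's database; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE connections (client_tuple TEXT, origin_tuple TEXT, bar TEXT)")
conn.executemany(
    "INSERT INTO connections (client_tuple, origin_tuple) VALUES (?, ?)",
    [(f"10.0.0.{i}:5000", "192.168.1.1:80") for i in range(1, 101)],
)
conn.commit()

# Hypothetical (bar, client, origin) triples, as the file scan might produce them.
updates = [(str(i), f"10.0.0.{i}:5000", "192.168.1.1:80") for i in range(1, 101)]

# With the default isolation_level, sqlite3 implicitly opens a transaction before
# the first UPDATE; leaving the `with conn:` block commits it once on success,
# or rolls it back if an exception escapes. All 100 UPDATEs share one commit.
with conn:
    conn.executemany(
        "UPDATE connections SET bar = ? WHERE client_tuple = ? AND origin_tuple = ?",
        updates,
    )

print(conn.execute("SELECT COUNT(*) FROM connections WHERE bar IS NOT NULL").fetchone()[0])
```

Note that the connection's context manager only manages the transaction; it does not close the connection.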

If anyone has any performance tips to help speed up the code below, I'd really appreciate it:

import re

# assumed: c is an sqlite3 cursor on the connection conn, created earlier
c.execute("SELECT client_tuple, origin_tuple FROM connections")
# returns ~ 8k rows each with two items, clientIP:port and serverIP:port
tuples = c.fetchall()

with open('connection_details.txt', 'r') as f:
    c.execute('BEGIN TRANSACTION')
    # for each line in ~16k lines
    for line in f:
        # for each row returned from sql query
        for tuple in tuples:
            # if the client tuple (IP:Port) is in the line
            if tuple[0] in line:
                # if the origin tuple (IP:Port) is in the line
                if tuple[1] in line:
                    # if 'foo' is in the line
                    if 'foo' in line:
                        # lookup some value and update SQL with the value found
                        bar_value = re.findall(r'(?<=bar\s).+?(?=\,)', line)
                        c.execute("UPDATE connections "
                                  "SET bar = ? "
                                  "WHERE client_tuple = ? AND origin_tuple = ?",
                                  (bar_value[0], tuple[0], tuple[1]))

    conn.commit()
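To see where the time goes, the scanning part of the loop above can be separated from the database writes. This is a minimal sketch that mirrors the nested structure as a pure function and counts inner-loop iterations; the function name find_updates_nested and the sample lines are hypothetical, since the real file format is not shown.

```python
import re

BAR_PATTERN = r'(?<=bar\s).+?(?=\,)'

def find_updates_nested(lines, tuples):
    """Mirror the question's loop: test every line against every (client, origin) pair.

    Returns the (bar, client, origin) triples to UPDATE and the number of
    inner-loop iterations performed.
    """
    updates = []
    iterations = 0
    for line in lines:
        for client, origin in tuples:
            iterations += 1
            if client in line and origin in line and 'foo' in line:
                bar_value = re.findall(BAR_PATTERN, line)
                if bar_value:  # guard against lines without a "bar ...," value
                    updates.append((bar_value[0], client, origin))
    return updates, iterations

# Hypothetical sample data standing in for the query rows and the file.
tuples = [("10.0.0.1:5000", "192.168.1.1:80"), ("10.0.0.2:5001", "192.168.1.1:80")]
lines = [
    "10.0.0.1:5000 -> 192.168.1.1:80 foo bar 42, ok",
    "10.0.0.2:5001 -> 192.168.1.1:80 baz",
    "unrelated line",
]
updates, iterations = find_updates_nested(lines, tuples)
print(updates)     # [('42', '10.0.0.1:5000', '192.168.1.1:80')]
print(iterations)  # 6, i.e. len(lines) * len(tuples)
```

With ~16k lines and ~8k tuples, the same structure runs about 16,000 × 8,000 = 128 million inner iterations, each doing up to three substring searches, which is likely where most of the minutes go.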

3 answers:

Answer 0 (score: 7)

The if 'foo' in line: check should come before the for tuple in tuples: loop, so that lines which don't need processing are skipped automatically.

A second, smaller improvement: compile the regexp once, outside the loop, and use the compiled pattern.
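Both suggestions can be sketched as a pure function that only collects the updates; the name find_updates_prefiltered and the sample data are hypothetical. It uses re.compile plus search(), whose match for this pattern is the same as the first element findall() would return. Python's re module already caches recently compiled patterns, so precompiling mostly saves a cache lookup; the 'foo' prefilter is probably the larger win, depending on how many lines contain 'foo'.

```python
import re

# Compiled once, outside the loops.
BAR_RE = re.compile(r'(?<=bar\s).+?(?=\,)')

def find_updates_prefiltered(lines, tuples):
    """Check 'foo' before touching the tuples, so most lines skip the inner loop."""
    updates = []
    iterations = 0
    for line in lines:
        if 'foo' not in line:
            continue  # this line can never match, so no tuple is examined
        for client, origin in tuples:
            iterations += 1
            if client in line and origin in line:
                match = BAR_RE.search(line)
                if match:
                    updates.append((match.group(), client, origin))
    return updates, iterations

# Hypothetical sample data.
tuples = [("10.0.0.1:5000", "192.168.1.1:80"), ("10.0.0.2:5001", "192.168.1.1:80")]
lines = [
    "10.0.0.1:5000 -> 192.168.1.1:80 foo bar 42, ok",
    "10.0.0.2:5001 -> 192.168.1.1:80 baz",
    "unrelated line",
]
updates, iterations = find_updates_prefiltered(lines, tuples)
print(updates)     # [('42', '10.0.0.1:5000', '192.168.1.1:80')]
print(iterations)  # 2: only the one 'foo' line is compared against the tuples
```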

Answer 1 (score: 5)

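Unfortunately, you can't tighten the for loops, because you need to iterate over all the tuples for every line in the file. However, you can tighten the code slightly by merging the if statements. You should also check for 'foo' before iterating over all the tuples: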

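import re

with open('connection_details.txt', 'r') as f:
    c.execute('BEGIN TRANSACTION')
    # for each line in ~16k lines
    for line in f:
        # skip lines without 'foo' before scanning the tuples
        if 'foo' in line:
            # for each row returned from sql query
            for tup in tuples:
                if tup[0] in line and tup[1] in line:
                    bar_value = re.findall(r'(?<=bar\s).+?(?=\,)', line)
                    c.execute("UPDATE connections SET bar = ? "
                              "WHERE client_tuple = ? AND origin_tuple = ?",
                              (bar_value[0], tup[0], tup[1]))
    conn.commit()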
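Visiting every tuple for every line is unavoidable with the substring test as written. If, however, the endpoints can be pulled out of each line as whole IP:port tokens (an assumption about the file format, which the question does not show), the inner loop can be replaced by set lookups. This is a minimal sketch under that assumption: ENDPOINT_RE matches IPv4 "a.b.c.d:port" tokens only, and find_updates_indexed is a hypothetical name. Unlike the substring test, it matches whole tokens, so for example 10.0.0.1:80 no longer matches inside 10.0.0.1:8080.

```python
import re
from itertools import permutations

# ASSUMPTION: endpoints appear in each line as IPv4 "a.b.c.d:port" tokens.
ENDPOINT_RE = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}:\d+\b')
BAR_RE = re.compile(r'(?<=bar\s).+?(?=\,)')

def find_updates_indexed(lines, tuples):
    """Replace the loop over all tuples with set lookups on the line's own endpoints."""
    known_pairs = set(tuples)  # {(client, origin), ...} built once
    updates = []
    for line in lines:
        if 'foo' not in line:
            continue
        match = BAR_RE.search(line)
        if not match:
            continue
        # De-duplicate while keeping order, then try every ordered pair.
        endpoints = list(dict.fromkeys(ENDPOINT_RE.findall(line)))
        for client, origin in permutations(endpoints, 2):
            if (client, origin) in known_pairs:
                updates.append((match.group(), client, origin))
    return updates

# Hypothetical sample data.
tuples = [("10.0.0.1:5000", "192.168.1.1:80"), ("10.0.0.2:5001", "192.168.1.1:80")]
lines = [
    "10.0.0.1:5000 -> 192.168.1.1:80 foo bar 42, ok",
    "10.0.0.2:5001 -> 192.168.1.1:80 baz",
    "unrelated line",
]
print(find_updates_indexed(lines, tuples))  # [('42', '10.0.0.1:5000', '192.168.1.1:80')]
```

The work per line now depends on how many endpoints the line contains (typically two) rather than on the ~8k rows, at the cost of relying on the token format.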

Answer 2 (score: 1)

For the for loops you can use itertools, and then you can turn the if statements into a single statement, like this:

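import itertools
import re

for line, tuple in itertools.product(f, tuples):
    if tuple[0] in line and tuple[1] in line and 'foo' in line:
        bar_value = re.findall(r'(?<=bar\s).+?(?=\,)', line)
        c.execute("UPDATE connections SET bar = ? WHERE client_tuple = ? AND origin_tuple = ?",
                  (bar_value[0], tuple[0], tuple[1]))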
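itertools.product flattens the nesting but not the work: it yields exactly the (line, tuple) pairs the two nested loops would visit, in the same order. A small check with hypothetical stand-in data:

```python
import itertools

lines = ["line a", "line b", "line c"]  # hypothetical stand-ins for the file's lines
tuples = [("c1", "o1"), ("c2", "o2")]   # hypothetical stand-ins for the query rows

nested = [(line, tup) for line in lines for tup in tuples]
flat = list(itertools.product(lines, tuples))

# Same pairs, same order, so the number of substring checks is unchanged:
# len(lines) * len(tuples).
print(flat == nested)  # True
print(len(flat))       # 6
```

Two side effects are worth knowing. product() consumes its input iterables completely before yielding anything, so product(f, tuples) reads the whole file into memory first. And because 'foo' is tested last in the combined condition, it no longer skips lines early; filtering first, e.g. itertools.product((l for l in f if 'foo' in l), tuples), keeps the saving from answer 0.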