Question

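I have a fairly large text file (~16k lines). I loop over it and, for each line, check whether a client IP:port, a server IP:port, and a keyword are all present, using two for loops and nested if x in line statements to check whether the line contains the information I'm looking for.

Once I have identified the lines containing the values I'm looking for, I update a sqlite database. Initially this took a very long time to run because I hadn't wrapped the SQL UPDATE statements in a manual transaction. After making that change the execution time improved significantly, but the code below still takes a few minutes to complete, and I suspect my horrible loop structure is the cause.

If anyone has performance tips to help speed up the code below, I'd be very grateful: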

c.execute("SELECT client_tuple, origin_tuple FROM connections")
# returns ~ 8k rows each with two items, clientIP:port and serverIP:port
tuples = c.fetchall()

with open('connection_details.txt', 'r') as f:
    c.execute('BEGIN TRANSACTION')
    # for each line in ~16k lines
    for line in f:
        # for each row returned from sql query
        for tuple in tuples:
            # if the client tuple (IP:Port) is in the line
            if tuple[0] in line:
                # if the origin tuple (IP:Port) is in the line
                if tuple[1] in line:
                    # if 'foo' is in the line
                    if 'foo' in line:
                        # lookup some value and update SQL with the value found
                        bar_value = re.findall(r'(?<=bar\s).+?(?=\,)', line)
                        c.execute("UPDATE connections "
                                    " SET bar = ? "
                                   "WHERE client_tuple = ? AND origin_tuple = ?",
                                    (bar_value[0], tuple[0], tuple[1]))

    conn.commit()

Answer 1

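The if 'foo' in line: check should come before the for tuple in tuples: loop, so that lines which don't need processing are skipped without scanning any tuples.

A second, minor improvement: compile the regexp once, outside the loops, and use the compiled matcher inside them.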
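A minimal sketch of the compiled-regexp suggestion, reusing the pattern from the question. The names BAR_RE and extract_bar and the sample line are made up for illustration. Since Python's re module already caches recently used patterns, the gain from this alone is likely small compared with moving the 'foo' check.

```python
import re

# Compiled once, outside any loop; same pattern as in the question.
BAR_RE = re.compile(r'(?<=bar\s).+?(?=\,)')


def extract_bar(line):
    """Return the text between 'bar ' and the next comma, or None."""
    match = BAR_RE.search(line)
    return match.group(0) if match else None


# Hypothetical log line; the real file format is not shown in the question.
sample = "10.0.0.1:5000 -> 10.0.0.2:80 foo bar 42, done"
print(extract_bar(sample))
```

Returning None when nothing matches also avoids the IndexError that bar_value[0] would raise on a line with 'foo' but no 'bar ... ,' part.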

Answer 2

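Unfortunately, you can't tighten the for loops themselves, because you need to iterate over all of the tuples for every line in the file. You can, however, tighten the code slightly by merging the if statements. You should also check for 'foo' before iterating over all the tuples.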

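with open('connection_details.txt', 'r') as f:
    c.execute('BEGIN TRANSACTION')
    # for each line in ~16k lines
    for line in f:
        # skip lines without 'foo' before scanning any rows
        if 'foo' in line:
            # for each row returned from sql query
            for tup in tuples:
                if tup[0] in line and tup[1] in line:
                    # same lookup and UPDATE as in the question
                    bar_value = re.findall(r'(?<=bar\s).+?(?=\,)', line)
                    c.execute("UPDATE connections SET bar = ? "
                              "WHERE client_tuple = ? AND origin_tuple = ?",
                              (bar_value[0], tup[0], tup[1]))

    conn.commit()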
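If the endpoints appear in each line as whole IPv4 a.b.c.d:port tokens (an assumption, since the file format is not shown), the inner loop over ~8k rows might be avoided altogether by pulling the endpoints out of the line and looking the pair up in a set. This is a different technique from the substring test above, and it matches whole tokens only, so results can differ where one endpoint is a substring of another. The names ENDPOINT_RE, BAR_RE, and find_updates are hypothetical.

```python
import re
from itertools import product

# Assumption: endpoints are written as IPv4 "a.b.c.d:port" tokens.
ENDPOINT_RE = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}:\d+\b')
# Same pattern as in the question, compiled once.
BAR_RE = re.compile(r'(?<=bar\s).+?(?=\,)')


def find_updates(lines, rows):
    """Yield (bar, client, origin) for each 'foo' line matching a known row."""
    known = set(rows)  # hash lookup instead of scanning every row
    for line in lines:
        if 'foo' not in line:
            continue
        match = BAR_RE.search(line)
        if match is None:
            continue
        endpoints = set(ENDPOINT_RE.findall(line))
        # Try every ordered pair of endpoints found on this line.
        for pair in product(endpoints, repeat=2):
            if pair in known:
                yield (match.group(0), pair[0], pair[1])


# Made-up sample data standing in for fetchall() output and the text file.
rows = [("10.0.0.1:5000", "10.0.0.2:80")]
lines = ["10.0.0.1:5000 -> 10.0.0.2:80 foo bar 42, done\n"]
updates = list(find_updates(lines, rows))
print(updates)
```

The resulting list has the parameter order of the UPDATE in the question, so it could be passed to c.executemany() with that statement inside the same transaction.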

Answer 3

For the for loops, you can use itertools, which then lets you fold the if statements into a single one, like this:

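import itertools

# product() reads every line of f into memory before pairing it with each row
for line, tup in itertools.product(f, tuples):
    if 'foo' in line and tup[0] in line and tup[1] in line:
        ...  # same regex lookup and UPDATE as in the question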
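Flattening the loops with itertools.product does not reduce the number of line/row pairs that get tested. One possible refinement, sketched here with a hypothetical matching_pairs helper and made-up sample data, is to filter out lines without the keyword before they reach product(), so the early 'foo' check is kept.

```python
import itertools


def matching_pairs(lines, rows, keyword='foo'):
    """Yield (line, row) where the line has the keyword and both endpoints."""
    # Only lines containing the keyword are paired with the rows.
    candidates = (line for line in lines if keyword in line)
    for line, row in itertools.product(candidates, rows):
        if row[0] in line and row[1] in line:
            yield line, row


# Made-up sample data standing in for fetchall() output and the text file.
rows = [("10.0.0.1:5000", "10.0.0.2:80"), ("10.0.0.3:6000", "10.0.0.2:80")]
lines = [
    "10.0.0.1:5000 -> 10.0.0.2:80 foo bar 42, done",
    "10.0.0.3:6000 -> 10.0.0.2:80 baz bar 7, done",
]
pairs = list(matching_pairs(lines, rows))
print(pairs)
```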

Slow execution using nested for loops
