Question

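I have created 2 lists containing log files from different machines.

So I have a machine1 folder containing perf_log.txt, stress_log.txt, and so on, and a machine2 folder containing the same file names. In some cases a log may exist on one machine but not on the other.

So far, my approach to comparing their contents is to walk all the files in one folder and add the full path of each to a list, then do the same for the second folder. Then I want to compare the corresponding logs between the two machines (for example perf_log.txt).

But I end up iterating only over the first list, and each time I have to check whether the second list contains the entry; if it does, I have to retrieve its index before I can compare the files. This seems expensive if a folder contains many files.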

import glob
import os

list1 = []
list2 = []

# os.chdir does not expand "~", so expand it explicitly
path1 = os.path.expanduser("~/Desktop/machine1/")
path2 = os.path.expanduser("~/Desktop/machine2/")

os.chdir(path1)
for entry in glob.glob("*.txt"):
    list1.append(entry)

os.chdir(path2)
for entry in glob.glob("*.txt"):
    list2.append(entry)

for logfile in list1:
    if logfile in list2:
        # Retrieve the index of the common file
        item_index = list2.index(logfile)
        # parse files and compare them
        comparefiles(path1 + logfile, path2 + list2[item_index])

How can I simplify this process and try to reach O(n) complexity?

Answer 1

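Use a dictionary instead of a list. So instead of list1.append(entry) you can do dict[entry] = entry. This helps later at if logfile in list2:, where instead of traversing the whole list to find the entry you can do if dict.get(logfile, -1) != -1 to check in O(1) whether the file exists in the second path. Then you just pass the paths to your comparefiles() method.

I hope this makes sense.

Here is code that should work. (I haven't tested it.)

import glob
import os

dict1 = {}
dict2 = {}

# os.chdir does not expand "~", so expand it explicitly
path1 = os.path.expanduser("~/Desktop/machine1/")
path2 = os.path.expanduser("~/Desktop/machine2/")

os.chdir(path1)
for entry in glob.glob("*.txt"):
    dict1[entry] = entry

os.chdir(path2)
for entry in glob.glob("*.txt"):
    dict2[entry] = entry

for key in dict1:
    # A dict lookup is O(1) on average, unlike scanning a list
    if dict2.get(key, -1) != -1:
        comparefiles(path1 + key, path2 + key)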
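As a variation on the dictionary idea, here is a minimal sketch that maps each file name to its full path, so the lookup and the path building happen in one step. The helper names index_logs and matching_logs are made up for illustration, and it assumes the logs are *.txt files sitting directly in each folder.

```python
from pathlib import Path


def index_logs(folder):
    """Map each *.txt file name in `folder` to its full path."""
    return {p.name: p for p in Path(folder).expanduser().glob("*.txt")}


def matching_logs(folder1, folder2):
    """Return (path1, path2) pairs for logs that exist in both folders."""
    logs1 = index_logs(folder1)
    logs2 = index_logs(folder2)
    # `name in logs2` is a hash lookup (O(1) on average),
    # so the whole pass is O(n) on average.
    return [(path1, logs2[name]) for name, path1 in logs1.items() if name in logs2]
```

Each returned pair could then be handed to the question's comparefiles(), for example `for p1, p2 in matching_logs(path1, path2): comparefiles(p1, p2)`.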

Answer 2

Since the order doesn't matter:

list(set(list1) & set(list2))

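This performs an intersection between the two lists, which are first converted to sets. You now have the list of entries that are in list1 AND in list2. Once you have that list of common entries, you can compare each pair of files.

Strictly speaking this is not guaranteed O(n): with hash-based sets, building and intersecting them is O(n) on average, though heavy hash collisions can make it slower. Either way, it is better than the code given in the question.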
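To make the intersection idea concrete, here is a rough sketch that puts it together with the actual comparison. The function name compare_common_logs is invented for this example, and filecmp.cmp from the standard library is only a stand-in for the question's comparefiles(), whose behaviour isn't shown.

```python
import filecmp
import os


def compare_common_logs(dir1, dir2):
    """Return {file name: True/False} for *.txt logs found in both folders."""
    names1 = {n for n in os.listdir(dir1) if n.endswith(".txt")}
    names2 = {n for n in os.listdir(dir2) if n.endswith(".txt")}
    results = {}
    # Set intersection keeps only the names present on both machines.
    for name in names1 & names2:
        # Stand-in for comparefiles(): byte-for-byte comparison of the two files.
        results[name] = filecmp.cmp(
            os.path.join(dir1, name), os.path.join(dir2, name), shallow=False
        )
    return results
```

Passing shallow=False makes filecmp.cmp read the file contents rather than trusting the os.stat() signature alone.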

Answer 3

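I'm not sure whether sets give you O(n) complexity, but they definitely make it easier to find the differences between the two lists, because you can subtract one from the other: https://docs.python.org/3/tutorial/datastructures.html#sets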

# {} creates an empty dict, which does not support "-"; build sets instead
logfiles1 = set(list1)
logfiles2 = set(list2)
in_1_but_not_in_2 = logfiles1 - logfiles2

I'm assuming the file names/paths are unique.
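Since the question mentions that a log may exist on one machine but not the other, subtraction in both directions could be used to report those cases. A small sketch, assuming plain file names as input; the function name split_logs and the result keys are made up for this example.

```python
def split_logs(names1, names2):
    """Group log names by where they appear (names assumed unique per folder)."""
    set1, set2 = set(names1), set(names2)
    return {
        "only_in_1": sorted(set1 - set2),  # logs missing on machine 2
        "only_in_2": sorted(set2 - set1),  # logs missing on machine 1
        "in_both": sorted(set1 & set2),    # logs that can be compared
    }
```

The sorted() calls are only there to give a stable, readable order; drop them if the order does not matter.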

Answer 4

If you use numpy and a pandas Series, you may get some nice improvements.

Before you start, import numpy and pandas.

import numpy as np
import pandas as pd

Now convert your lists to numpy arrays.

list1 = np.array(list1)
list2 = np.array(list2)

Now you can use boolean indexing to find which files in one set are also in the other.

items_in_both = list1[pd.Series(list1).isin(list2)]

Now items_in_both contains every item that appears in both lists. That way, you can call the comparefiles function once for each element in items_in_both.

Python: check whether a list contains elements from another list, and compare them efficiently
