Question

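I'm having some trouble finding an efficient way to compare two files in order to create a third file.

I'm using Python 3.6.

The first file is a list of IP addresses I want to delete. The second file contains all the DNS records associated with the IP addresses to be deleted.

If I find a matching DNS record in the second file, I want to add the whole line to a third file.

Here is a sample of file 1: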

IP
10.10.10.234
10.34.76.4

Here is a sample of file 2:

DNS Record Type,DNS Record,DNS Response,View
PTR,10.10.10.234,testing.example.com,internal
A,testing.example.com,10.10.10.234,internal
A,dns.google.com,8.8.8.8,external

Here is what I'm trying to do. It's accurate, but it takes forever. There are about 2 million lines in file 2 and 150K lines in file 1.

def create_final_stale_ip_file():
    PD = set()
    with open(stale_file) as f1:
        reader1 = csv.DictReader(f1)
        for row1 in reader1:
            with open(prod_dns) as f2:
                reader2 = csv.DictReader(f2)
                for row2 in reader2:
                    if row2['DNS Record Type'] == 'A':
                        if row1['IP'] == row2['DNS Response']:
                            PD.update([row2['View']+'del,'+row2['DNS Record Type']+','+row2['DNS Record']+','+row2['DNS Response']])
                    if row2['DNS Record Type'] == 'PTR':
                        if row1['IP'] == row2['DNS Record']:
                            PD.update([row2['View']+'del,'+row2['DNS Record Type']+','+row2['DNS Response']+','+row2['DNS Record']])


    o1 = open(delete_file,'a')
    for i in PD:
        o1.write(i+'\n')
    o1.close()

Thanks in advance!

Answer 1

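You should first read the whole IP file into a set, and then check whether the IPs in the second file are found in that set, because checking whether an element exists in a set is very fast: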

import csv

def create_final_stale_ip_file():
    PD = set()

    # It's much prettier and easier to manage the strings in one place
    # and without using the + operator. Read about `str.format()`
    # to understand how these work. They will be used later in the code.
    # writelines() does not add line endings, so each pattern ends with '\n'.
    A_string = '{View}del,{DNS Record Type},{DNS Record},{DNS Response}\n'
    PTR_string = '{View}del,{DNS Record Type},{DNS Response},{DNS Record}\n'

    # We can open and create readers for both files at once
    with open(stale_file) as f1, open(prod_dns) as f2:
        reader1, reader2 = csv.DictReader(f1), csv.DictReader(f2)

        # Read all IPs into a python set, they're fast!
        ips = {row['IP'] for row in reader1}

        # Now go through every line and simply check if the IP
        # exists in the `ips` set we created above
        for row in reader2:
            if (row['DNS Record Type'] == 'A'
                    and row['DNS Response'] in ips):
                PD.add(A_string.format(**row))
            elif (row['DNS Record Type'] == 'PTR'
                    and row['DNS Record'] in ips):
                PD.add(PTR_string.format(**row))

    # Finally, write all the lines to the file using `writelines()`.
    # Also, it's always better to use `with open()`
    with open(delete_file, 'a') as f:
        f.writelines(PD)

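As you can see, I also changed a few small things, such as:

Using writelines() to write to the file
Using with open() to safely open the last file
We only add one element at a time to our set, so use PD.add() instead of PD.update()
Using Python's awesome str.format() for cleaner string formatting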
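
To make the last two points concrete, here is a small sketch. The sample row is hypothetical, shaped like what csv.DictReader would yield for file 2. It shows how str.format(**row) fills the pattern from the dictionary keys, and why add() is the natural call for a single string: update() expects an iterable, which is why the original code had to wrap the string in a list.

```python
# Hypothetical sample row, shaped like what csv.DictReader yields for file 2
row = {
    'DNS Record Type': 'A',
    'DNS Record': 'testing.example.com',
    'DNS Response': '10.10.10.234',
    'View': 'internal',
}

# Named fields are looked up as keys in the unpacked dictionary
A_string = '{View}del,{DNS Record Type},{DNS Record},{DNS Response}'
line = A_string.format(**row)

added = set()
added.add(line)        # adds one element: the whole string

updated = set()
updated.update(line)   # iterates over the string, adding each character

print(line)
print(len(added))
print(len(updated) > 1)
```

Passing a bare string to update() silently fills the set with single characters, so add() is both shorter and harder to get wrong.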

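Last but not least, I would actually split this into multiple functions, one for reading the files, one for going through the read dictionaries, and so on, each taking proper arguments instead of relying on global variable names such as stale_file, as you seem to be doing. But that's up to you.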
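
A minimal sketch of what that split might look like. The function names read_stale_ips and find_stale_records and the parameter names are my own, not from the question, and sorting the output is an extra choice made here only so the result is deterministic.

```python
import csv


def read_stale_ips(stale_path):
    """Return the set of IPs listed in the 'IP' column of the stale file."""
    with open(stale_path, newline='') as f:
        return {row['IP'] for row in csv.DictReader(f)}


def find_stale_records(dns_path, ips):
    """Yield one output line per A/PTR record whose IP is in `ips`."""
    a_string = '{View}del,{DNS Record Type},{DNS Record},{DNS Response}\n'
    ptr_string = '{View}del,{DNS Record Type},{DNS Response},{DNS Record}\n'
    with open(dns_path, newline='') as f:
        for row in csv.DictReader(f):
            record_type = row['DNS Record Type']
            if record_type == 'A' and row['DNS Response'] in ips:
                yield a_string.format(**row)
            elif record_type == 'PTR' and row['DNS Record'] in ips:
                yield ptr_string.format(**row)


def create_final_stale_ip_file(stale_path, dns_path, delete_path):
    """Append the de-duplicated matching lines to the delete file."""
    ips = read_stale_ips(stale_path)
    lines = set(find_stale_records(dns_path, ips))
    with open(delete_path, 'a') as out:
        # sorted() is an assumption of this sketch, for stable output order
        out.writelines(sorted(lines))
```

Each function can now be tested on its own, and the file paths are passed in explicitly, for example create_final_stale_ip_file('stale.csv', 'dns.csv', 'delete.txt') with whatever paths you actually use.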

Answer 2

You can do this very easily with grep:

grep -wFf file1 file2

This gives you the lines of file2 that match entries in file1. From there, manipulating the text into the final form you need should be much easier.

Comparing two files containing IPs with Python
