Question

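Using Python, how can I process two text files? For example, a.txt has 5 groups and b.txt has 4 groups. Each group in b.txt should be looked up among the groups in a.txt. If it is found, write it to output.txt; if it is not found, do not write it. The numbers in a group must match, but their order does not matter.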

a.txt

GROUP :[11111, 22222, 33333]
GROUP :[22222, 11111]
GROUP :[46098]
GROUP :[66666, 55555, 44444]
GROUP :[55555, 44444]

b.txt

GROUP :[11111, 33333]
GROUP :[46098]
GROUP :[22222, 11111]
GROUP :[44444, 55555, 66666]

output.txt

GROUP :[22222, 11111]
GROUP :[46098]
GROUP :[44444, 55555, 66666]

Answer 1

This isn't the prettiest thing in the world, but it should get the job done:

from collections import Counter

with open('a.txt', 'r') as a:
    a_list = []
    for line in a:
        groups = line.split(':')[1]
        groups = groups.split('[')[1].split(']')[0]
        groups = groups.split(', ')
        a_list.append(groups)

with open('b.txt', 'r') as b:
    b_list = []
    for line in b:
        groups = line.split(':')[1]
        groups = groups.split('[')[1].split(']')[0]
        groups = groups.split(', ')
        b_list.append(groups)

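with open('output.txt', 'w') as output:
    # Counter compares groups as multisets, so element order is ignored
    a_counter = [Counter(i) for i in a_list]
    for group in b_list:
        if Counter(group) in a_counter:
            # Join the numbers by hand; formatting the list directly
            # would write quoted strings such as ['22222', '11111']
            output.write(f"GROUP :[{', '.join(group)}]\n")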

Answer 2

Using regular expressions and the re module:

import re

grp_tmpl = []

# Register all groups from a.txt (sorted, so that order does not matter)
with open('a.txt', 'r') as f:
    for line in f:
        grp_tmpl.append(sorted(re.findall(r'\d+', line)))

# Write every line of b.txt whose group was registered above
with open('b.txt', 'r') as f, open('output.txt', 'w') as out:
    for line in f:
        if sorted(re.findall(r'\d+', line)) in grp_tmpl:
            out.write(line)
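
Both answers compare each group of b.txt against a list, which is fine for small files. As a rough sketch of a variation (not taken from either answer), the sorted numbers can be stored as tuples in a set so that each lookup is a hash check. The function names `parse_group`, `matching_lines` and `write_matches` are made up for illustration, and the sketch assumes every group sits on its own line.

```python
import re


def parse_group(line):
    """Return the numbers on a line as a sorted tuple, or None if there are none."""
    numbers = re.findall(r'\d+', line)
    return tuple(sorted(numbers)) if numbers else None


def matching_lines(a_lines, b_lines):
    """Yield each line of b_lines whose group also occurs in a_lines."""
    # Sorted tuples are hashable, so they can be stored in a set
    known = {parse_group(line) for line in a_lines}
    known.discard(None)  # ignore lines without any numbers
    for line in b_lines:
        if parse_group(line) in known:
            yield line


def write_matches(a_path, b_path, out_path):
    """Hypothetical helper: copy the matching lines of b_path to out_path."""
    with open(a_path) as a, open(b_path) as b, open(out_path, 'w') as out:
        out.writelines(matching_lines(a, b))
```

Calling `write_matches('a.txt', 'b.txt', 'output.txt')` would write the matching lines of b.txt unchanged, in the order they appear in b.txt.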
