Using Python

Date: 2018-12-05 20:48:43

Tags: python string search

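Using Python, how can I process two text files? For example, a.txt has 5 groups and b.txt has 4 groups. Each group in b.txt should be looked up among the groups in a.txt. If it is found, write it to output.txt; if it is not found, do not write it. The numbers in a group must match, but their order does not matter.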

a.txt

GROUP :[11111, 22222, 33333]
GROUP :[22222, 11111]
GROUP :[46098]
GROUP :[66666, 55555, 44444]
GROUP :[55555, 44444]

b.txt

GROUP :[11111, 33333]
GROUP :[46098]
GROUP :[22222, 11111]
GROUP :[44444, 55555, 66666]

output.txt

GROUP :[22222, 11111]
GROUP :[46098]
GROUP :[44444, 55555, 66666]

2 answers:

Answer 0 (score: 1)

This isn't the prettiest thing in the world, but it should get the job done:

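from collections import Counter


def read_groups(path):
    """Return one list of number strings per group line in the file."""
    groups = []
    with open(path, 'r') as f:
        for line in f:
            if '[' not in line or ']' not in line:
                continue  # skip blank or malformed lines
            # Keep only the text between '[' and ']', then split on commas.
            body = line.split('[')[1].split(']')[0]
            groups.append([n.strip() for n in body.split(',')])
    return groups


a_list = read_groups('a.txt')
b_list = read_groups('b.txt')

# Counter equality ignores order, so [22222, 11111] matches [11111, 22222].
a_counters = [Counter(group) for group in a_list]

with open('output.txt', 'w') as output:
    for group in b_list:
        if Counter(group) in a_counters:
            # Join the numbers by hand; formatting the list directly
            # would write them with quotes, e.g. ['22222', '11111'].
            output.write(f"GROUP :[{', '.join(group)}]\n")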
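The matching step above depends on how `Counter` objects compare. As a minimal sketch of that behaviour, using made-up sample lists rather than the contents of the files:

```python
from collections import Counter

# Sample groups for illustration only (not read from a.txt or b.txt).
same_numbers = Counter(['22222', '11111']) == Counter(['11111', '22222'])
subset_only = Counter(['11111', '33333']) == Counter(['11111', '22222', '33333'])
repeated = Counter(['11111', '11111']) == Counter(['11111'])

print(same_numbers)  # True: same numbers, different order
print(subset_only)   # False: a subset is not a match
print(repeated)      # False: Counter also compares how often a number occurs
```

If repeated numbers inside a group should be ignored, comparing `set` objects instead of `Counter` objects would probably be the simpler choice.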

Answer 1 (score: 1)

Using regular expressions and the re module:

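import re

# Register all groups from b.txt, sorted so that order does not matter
with open('b.txt', 'r') as f:
    grp_tmpl = [sorted(re.findall(r'\d+', line)) for line in f]

# Find groups: write each a.txt line whose numbers match a registered group
with open('a.txt', 'r') as f, open('output.txt', 'w') as out:
    for line in f:
        numbers = sorted(re.findall(r'\d+', line))
        # Skip lines without numbers; "in" writes each matching line only once
        if numbers and numbers in grp_tmpl:
            out.write(line)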
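For larger files, checking every line against the whole list of groups may become slow. One possible variant, sketched here on the assumption that each line holds exactly one group, stores every group as a sorted tuple in a set so that each lookup is a hash lookup. The function name `matching_lines` is hypothetical, and it takes lists of lines instead of file names so it can be tried without the files:

```python
import re


def matching_lines(a_lines, b_lines):
    """Return the lines of b_lines whose numbers also form a group in a_lines."""
    def key(line):
        # Sorted tuple of the numbers: hashable and independent of order.
        return tuple(sorted(re.findall(r'\d+', line)))

    known = {key(line) for line in a_lines}
    known.discard(())  # ignore lines that contain no numbers
    return [line for line in b_lines if key(line) in known]


# Sample data copied from the question.
a_lines = ["GROUP :[11111, 22222, 33333]", "GROUP :[22222, 11111]",
           "GROUP :[46098]", "GROUP :[66666, 55555, 44444]",
           "GROUP :[55555, 44444]"]
b_lines = ["GROUP :[11111, 33333]", "GROUP :[46098]",
           "GROUP :[22222, 11111]", "GROUP :[44444, 55555, 66666]"]

for line in matching_lines(a_lines, b_lines):
    print(line)
```

This sketch returns the matching lines in the order and formatting of b.txt. With real files, the two lists could be filled with `f.read().splitlines()`.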