How to fix the multi_compare function

Time: 2017-01-24 05:00:47

Tags: python function

import re
def multi_compare(pat_file : open, text_file1 : open, text_file2 : open) -> [(int,str,str)]:
   result = []
   m = []
   for p in pat_file:
       m = re.compile(p.rstrip())
       for num, line in enumerate(text_file1):
           for num2, line2 in enumerate(text_file2):
               if (m.match(line) != m.match(line2) and num == num2):
                   result.append((num,line,line2))
   return result

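I am writing a function named multi_compare that takes three open files as arguments: the first file contains regular expression patterns, and the second and third files contain lines of text. The function returns a list of 3-tuples. Each 3-tuple holds a line number and the line with that number from each text file, for every line number where the line in the first text file matches a different set of patterns than the line in the second text file. The list should show these line numbers in ascending order (this can be done without sorting).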

For example, if the files pats1.txt, texts1a.txt and texts1b.txt store the information shown below, then calling

 multi_compare(open('pats1.txt'), open('texts1a.txt'), open('texts1b.txt')) 

returns the following list:

[(2, '!aaab', '666b6'), (3, 'ambulance7', '7a')]

[Image not preserved: contents of pats1.txt, texts1a.txt and texts1b.txt]

My code does not seem to work. Can someone help me fix it? Thanks in advance.

2 answers:

Answer 0: (score: 0)

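OK, from what I gather, you want to take each pair of lines with the same line number from text_file1 and text_file2 and check them against the patterns. Specifically, you want to find pairs where one line matches a pattern and the other does not: True and False, or False and True. As soon as one pattern gives such a result for a pair of lines, ignore all the remaining patterns.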

import re

def multi_compare(pat_file : open, text_file1 : open, text_file2 : open) -> [(int,str,str)]:
    result = []

    # create a list of each compiled pattern
    patterns = [ re.compile(p.rstrip()) for p in pat_file ]

    # iterate through each line number and lines
    for num, (line1, line2) in enumerate(zip(text_file1, text_file2), 1):
        line1, line2 = line1.strip(), line2.strip()
        for m in patterns:
            # compare matches - True, False or False, True will append
            if bool(m.match(line1)) != bool(m.match(line2)):
                result.append((num, line1, line2))
                # Found a match so exit inner for loop to stop matching again
                break

    return result

res = multi_compare(open('pats1.txt'), open('texts1a.txt'), open('texts1b.txt')) 
print(res)

# Output
[(2, '!aaab', '666b6'), (3, 'ambulance7', '7a')]

If you change the append to result.append((num, m.pattern, line1, line2)) and remove the break, you can see which patterns gave a different result for each pair of lines.

[(2, '.*\\d$', '!aaab', '666b6'), (3, '[a-z]', 'ambulance7', '7a'), (3, '.*b', 'ambulance7', '7a'), (3, '.*\\d$', 'ambulance7', '7a')]
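The comparison described above can be tried without any files on disk. The following is a minimal sketch, not the answerer's exact code: it uses io.StringIO objects in place of open files, and the file contents are hypothetical, reconstructed only from the patterns and lines that appear in the outputs above (the first line of each text file is invented, and the original pats1.txt may contain more patterns).

```python
import io
import re

def multi_compare(pat_file, text_file1, text_file2):
    # Compile every pattern once, stripping the trailing newline.
    patterns = [re.compile(p.rstrip()) for p in pat_file]
    result = []
    # zip pairs up lines with the same line number; numbering starts at 1.
    for num, (line1, line2) in enumerate(zip(text_file1, text_file2), 1):
        line1, line2 = line1.strip(), line2.strip()
        # Keep the pair if at least one pattern matches exactly one of the lines.
        if any(bool(m.match(line1)) != bool(m.match(line2)) for m in patterns):
            result.append((num, line1, line2))
    return result

# Hypothetical contents: the real files were shown in an image that is not available.
pats = io.StringIO("[a-z]\n.*b\n.*\\d$\n")
texts_a = io.StringIO("abc\n!aaab\nambulance7\n")
texts_b = io.StringIO("xbz\n666b6\n7a\n")

demo_result = multi_compare(pats, texts_a, texts_b)
print(demo_result)
```

Using any() has the same effect as the break in the loop above: the pair of lines is recorded once, no matter how many patterns disagree.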

Answer 1: (score: 0)

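Here is my solution to this problem. The nested loop structure is a bit ugly, so you will want to unroll it, as shown below. As R. Kumar pointed out, the basic problem you had is that you were comparing the return values of match (each of which is either a match object or None) rather than their boolean representations: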

import re

def multi_compare(patterns_file, text_file1, text_file2):
    patterns = [ re.compile(x.strip()) for x in patterns_file ]
    matches1, matches2 = [], []

    for line in text_file1:
        # associate the line with its match array
        # where the match array is a list of booleans
        # that represent whether the line matched one of the specified patterns
        line = line.strip()
        matches = [ bool(x.match(line)) for x in patterns ]
        matches1.append((line, matches))

    for line in text_file2:
        line = line.strip()
        matches = [ bool(x.match(line)) for x in patterns ]
        matches2.append((line, matches))

    # compare the two text files by comparing the matches from each line
    for i, ((line1, match1), (line2, match2)) in enumerate(zip(matches1, matches2), 1):
        if match1 != match2:
            yield i, line1, line2

for x in multi_compare(open('pats1.txt', 'r'), open('texts1a.txt', 'r'), open('texts1b.txt', 'r')):
    print(x)
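To see why comparing the raw return values of match goes wrong, here is a small illustrative sketch. The pattern and the sample strings are borrowed from the outputs earlier in the thread; the variable names are made up for this example. Match objects do not define equality, so two separate successful matches compare as different even when both lines match.

```python
import re

pattern = re.compile(r".*\d$")

# Both calls succeed, but each one returns its own Match object.
first = pattern.match("ambulance7")
second = pattern.match("ambulance7")

# Match objects are compared by identity, so this reports a difference
# even though both lines matched the pattern.
raw_differs = first != second

# Converting to bool compares only "matched" versus "did not match".
bool_differs = bool(first) != bool(second)

# Two failed matches are both None, and None != None is False.
none_vs_none = pattern.match("7a") != pattern.match("!aaab")

print(raw_differs, bool_differs, none_vs_none)
```

With the raw comparison, every pair of lines that both match a pattern would be reported as a difference, which is why both answers wrap the calls in bool().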