Question

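I created a file 'file1.txt' and am using "hello" as the search pattern. The program below counts the matching lines correctly, but the problem is the word (pattern) count for "hello": if a single line contains three 'hello's, it still counts only one. My file contains 13 'hello' patterns in total, and 2 of its lines contain 3 each. So I end up with 9 instead of 13, because each line is counted only once. How can I fix this?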

import re

def reg_exp():
    pattern = 'hello'
    infile = open('file1.txt', 'r')
    match_count = 0
    lines = 0

    for line in infile:
        match = re.search(pattern, line)
        if match:
            match_count += 1
            lines += 1
    return (lines, match_count)

if __name__ == "__main__":
    lines, match_count = reg_exp()
    print 'LINES::', lines
    print 'MATCHES::', match_count
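To reproduce the gap without the real file1.txt, here is a minimal Python 3 sketch that runs the same per-line re.search logic over an in-memory stand-in (io.StringIO). The sample text and the name count_with_search are hypothetical, chosen only to match the counts described above: 7 lines with one "hello" and 2 lines with three.

```python
import io
import re

# Hypothetical sample matching the question's description:
# 7 lines with one "hello" and 2 lines with three (13 in total).
SAMPLE = "hello world\n" * 7 + "hello hello hello\n" * 2

def count_with_search(infile, pattern="hello"):
    """Mirror of the question's loop: re.search stops at the first match."""
    match_count = 0
    lines = 0
    for line in infile:
        if re.search(pattern, line):
            match_count += 1  # one per line, however many matches it holds
            lines += 1
    return lines, match_count

print(count_with_search(io.StringIO(SAMPLE)))  # (9, 9)
```

Both counters end up at 9, which is the symptom described in the question.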

Answer 1

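That is how re.search() works: it returns as soon as it finds the first match. You can use re.finditer() to iterate over all matches, or re.findall() to get a list of all matches on each line.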

for line in infile:
    # findall returns every non-overlapping match on the line, not just the first
    matches = re.findall(pattern, line)
    if matches:
        match_count += len(matches)
        lines += 1

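The answer also mentions re.finditer(). A minimal Python 3 sketch of that variant follows, assuming an in-memory stand-in for file1.txt (io.StringIO); count_with_finditer and the sample text are hypothetical. finditer yields one match object per non-overlapping match, so the matches can be counted without building a list.

```python
import io
import re

def count_with_finditer(infile, pattern="hello"):
    """Count matching lines and every non-overlapping match of pattern."""
    regex = re.compile(pattern)
    match_count = 0
    lines = 0
    for line in infile:
        # finditer yields a match object for each occurrence on the line
        n = sum(1 for _ in regex.finditer(line))
        if n:
            match_count += n
            lines += 1
    return lines, match_count

# Hypothetical sample: 7 lines with one "hello", 2 lines with three
sample = io.StringIO("hello world\n" * 7 + "hello hello hello\n" * 2)
print(count_with_finditer(sample))  # (9, 13)
```

Compared with findall, finditer also gives access to each match's position via m.start() and m.end() if you need it.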

re.search(pattern, string, flags=0)

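Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.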
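A short Python 3 check of the behaviour quoted above (the sample line is illustrative): re.search reports only the first occurrence, returns None when nothing matches, and an empty pattern yields a zero-length match rather than None.

```python
import re

line = "hello hello hello"

m = re.search("hello", line)
# Only the first occurrence is reported
print(m.start(), m.end())          # 0 5
# No occurrence at all: search returns None rather than raising
print(re.search("bye", line))      # None
# An empty pattern gives a zero-length match at position 0, which is not None
print(re.search("", line).span())  # (0, 0)
```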

re.findall(pattern, string, flags=0)

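Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.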
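A quick Python 3 illustration of the group rules quoted above (sample strings are illustrative). Whatever the element type, the list holds one entry per match, so len() of the result is the match count either way.

```python
import re

line = "hello hello hello"

# No groups: a list of the matched strings
print(re.findall("hello", line))      # ['hello', 'hello', 'hello']
# One group: a list of the group's text, one entry per match
print(re.findall("(hel)lo", line))    # ['hel', 'hel', 'hel']
# Two groups: a list of tuples
print(re.findall("(he)(llo)", line))  # [('he', 'llo'), ('he', 'llo'), ('he', 'llo')]
# Non-overlapping: "aaaa" holds three overlapping "aa", but findall returns two
print(re.findall("aa", "aaaa"))       # ['aa', 'aa']
```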

Answer 2

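import re

def reg_exp():
    pattern = '(hello)'
    match_count = 0
    lines = 0

    with open('file1.txt', 'r') as infile:
        for line in infile:
            # With one group in the pattern, findall returns the captured
            # text of every match on the line, so len() counts all of them
            matches = re.findall(pattern, line)
            if matches:
                match_count += len(matches)
                lines += 1
    return (lines, match_count)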
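Note that match.groups() describes the capture groups of a single match, so its length is the number of groups in the pattern, not the number of times the pattern occurs on the line. A short Python 3 check, using an illustrative line:

```python
import re

line = "hello hello hello"

m = re.search("(hello)", line)
# groups() holds the captured text of the FIRST match only
print(m.groups())                        # ('hello',)
print(len(m.groups()))                   # 1, even though the line has 3 matches
# Counting occurrences needs every match, e.g. via findall
print(len(re.findall("(hello)", line)))  # 3
```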
