Can't find a substring in an input file in Python 3

Date: 2014-03-06 19:22:00

Tags: python python-2.7 python-3.x python-unicode

I'm trying to find a pattern in an input file. However, when I open the file with 'r', I sometimes get a Unicode error and sometimes get no match at all.

import re
import sys

def extract_files(filename):
    file = open(filename, 'r')
    text = file.read()
    files_match = re.findall('<Compile Include="src\asf\preprocessor\string.h">', text)
    if not files_match:
        sys.stderr.write('no match')
        sys.exit()
    for f in files_match:
        print(f)
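A minimal sketch of why the pattern above likely never matches, assuming the file really contains that line with single backslashes. In an ordinary string literal, \a becomes the BEL control character, and in a regular expression \s means whitespace and . means any character, so the pattern no longer describes the literal text (on newer Python versions the unknown \p escape even makes re raise an error). Writing the literal as a raw string and passing it through re.escape makes it match verbatim. The sample line is a hypothetical stand-in for the real project file.

```python
import re

# Hypothetical line standing in for the project file's contents.
sample = '<Compile Include="src\\asf\\preprocessor\\string.h">'

# In an ordinary literal, "\a" is the BEL control character (\x07),
# not a backslash followed by "a".
print('src\asf' == 'src\x07sf')  # True

# A raw string keeps every backslash; re.escape then makes "\" and "."
# match literally instead of acting as regex syntax.
literal = r'<Compile Include="src\asf\preprocessor\string.h">'
matches = re.findall(re.escape(literal), sample)
print(matches == [sample])  # True

# For a fixed string, a plain substring test needs no regex at all.
print(literal in sample)  # True
```

In the function above, this would mean passing re.escape(r'...') as the pattern, or replacing re.findall with a simple `in` check.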

1 Answer:

Answer 0 (score: 1)

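It looks like you are trying to extract every string that appears between <Compile Include=" and the closing double quote. We can do that, but be aware that this may break on edge cases!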

import re
import sys

def extract_files(filename):
    with open(filename, 'r') as file:
        text = file.read()
    matches = re.findall(r'(?<=<Compile Include=")[-.A-Za-z\\]+(?=")', text)
    # Finds every path that contains ONLY lowercase or uppercase letters,
    # dashes (-), dots (.) and backslashes (\).
    # The match must end right before a double quote ("), NOT a single
    # quote ('); a path containing any other character (a digit, an
    # underscore, a space, a forward slash) is not matched at all.
    if not matches:
        sys.stderr.write("no match")
        sys.exit()
    for match in matches:
        print(match)
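To make the edge cases concrete, here is a sketch that runs the answer's pattern against a hypothetical project-file fragment. The looser pattern LOOSE and the helpers find_includes and read_project are illustrative additions, not part of the answer. read_project assumes the file is UTF-8, possibly with a byte-order mark; reading with an explicit encoding rather than the platform default is one common way to avoid the Unicode errors mentioned in the question.

```python
import re

# The answer's pattern: the path may only contain letters, "-", "." and "\".
STRICT = re.compile(r'(?<=<Compile Include=")[-.A-Za-z\\]+(?=")')
# Hypothetical looser alternative: everything up to the closing double quote.
LOOSE = re.compile(r'<Compile Include="([^"]+)"')


def find_includes(text, pattern):
    """Return every Include path that `pattern` extracts from `text`."""
    return pattern.findall(text)


def read_project(filename, encoding='utf-8-sig'):
    """Read `filename` with an explicit encoding rather than the platform
    default. 'utf-8-sig' is an assumption about the file; it decodes UTF-8
    and drops a leading byte-order mark if one is present."""
    with open(filename, encoding=encoding) as f:
        return f.read()


# Hypothetical fragment; the second path contains a digit and an underscore.
sample = (
    '<Compile Include="src\\asf\\preprocessor\\string.h" />\n'
    '<Compile Include="src\\asf\\v2\\util_io.h" />\n'
)
strict_paths = find_includes(sample, STRICT)  # only the first path
loose_paths = find_includes(sample, LOOSE)    # both paths
print(strict_paths)
print(loose_paths)
```

In the answer's function, text = read_project(filename) could replace the bare open(filename, 'r') call, and swapping STRICT for LOOSE trades strictness for tolerance of digits, underscores, spaces and forward slashes.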