I'm trying to find a pattern in an input file. However, when I open the file with 'r', I sometimes get a Unicode error and sometimes get no match at all.
def extract_files(filename):
    file = open(filename, 'r')
    text = file.read()
    files_match = re.findall('<Compile Include="src\asf\preprocessor\string.h">', text)
    if not files_match:
        sys.stderr.write('no match')
        sys.exit()
    for f in files_match:
        print(f)
Answer 0 (score: 1)
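It looks like you're trying to extract every string that appears after <Compile Include=" and before ">. We can do that, but be aware that this may break on edge cases!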
import re
import sys

def extract_files(filename):
    with open(filename, 'r') as file:
        text = file.read()  # call read(); the bare attribute is a method object, not the text
    matches = re.findall(r'(?<=<Compile Include=")[-.A-Za-z\\]+(?=")', text)
    # finds all path names that contain ONLY uppercase or lowercase letters,
    # a dash (-) or a dot (.), separated ONLY by a backslash (\)
    # each path must end at a double quote ("), NOT at a
    # single quote (')
    if not matches:
        sys.stderr.write("no match")
        sys.exit()
    for match in matches:
        print(match)
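
The Unicode error from the question is not covered by the code above. Here is a minimal sketch of one way to handle it, under some assumptions: the file is taken to be UTF-8 (a guess, so pass whatever encoding the file really uses), undecodable bytes are replaced rather than raising, and the character class is widened to [^"]+ so that digits and underscores in a path also match. The names read_text and extract_includes are hypothetical helpers, not part of the original answer. The last line shows one reason the pattern in the question cannot match: without the r prefix, "\a" is the BEL control character rather than a backslash followed by "a".

```python
import re

# Lookbehind/lookahead keep only the path between the double quotes.
# [^"]+ is an assumption: it accepts any character except a double quote.
INCLUDE_RE = re.compile(r'(?<=<Compile Include=")[^"]+(?=")')


def extract_includes(text):
    """Return every path found in a <Compile Include="..."> attribute."""
    return INCLUDE_RE.findall(text)


def read_text(filename, encoding="utf-8"):
    """Read a file with an explicit encoding (utf-8 is only a guess here).

    errors="replace" turns undecodable bytes into U+FFFD instead of
    raising UnicodeDecodeError.
    """
    with open(filename, "r", encoding=encoding, errors="replace") as handle:
        return handle.read()


# Raw string, so the backslashes stay literal backslashes.
sample = r'<Compile Include="src\asf\preprocessor\string.h">'
print(extract_includes(sample))

# In a non-raw literal, "\a" is the single BEL character (0x07).
print("\a" == "\x07")
```

To use it on a real file, something like extract_includes(read_text(filename)) should work once the encoding argument matches the file.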