I want to find, in every line of a .txt file, the number that matches my pattern.
Text snippet:
sometext - 0.007442749125388171
sometext - 0.004296183916209439
sometext - 0.0037923667088698393
sometext - 0.003137404884873018
Code:
import codecs
import re
file = codecs.open(FILEPATH, encoding='utf-8')
for cnt, line in enumerate(file):
    result_text = re.match(r'[a-zżźćńółęąś]*', line).group()
    result_value = re.search(r'[0-9].[0-9]*', line).group()
    print("Line {}: {}".format(cnt, line))
This is strange, because re.search does find a result:
<_sre.SRE_Match object; span=(8, 28), match='0.001879612135574806'>
But when I try to assign the result to a variable, I get this error:
File "read.py", line 18, in <module>
result_value = re.search(r'[0-9].[0-9]*', line).group()
AttributeError: 'NoneType' object has no attribute 'group'
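Answer 0 (score: 1)
To capture a group in a regular expression, put parentheses around the part you want to capture, then pass the index of that group to the group() method. Note that calling group() with no argument returns the whole match, so the AttributeError itself means that re.search returned None for some line, that is, a line with no match.
For example, for the second match, the code should be changed as follows: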
# There is only one group here, so we pass index 1.
# The dot is escaped so that it matches a literal decimal point.
result_value = re.search(r'([0-9]\.[0-9]*)', line).group(1)
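To make the cause of the error concrete, here is a minimal sketch. The sample strings and variable names are assumptions for illustration, and the blank line stands in for whatever non-matching line the real file contains. It shows that group() without an argument already returns the whole match, and that re.search returns None when a line has no number in it.

```python
import re

# Escaped dot, at least one digit on each side of the decimal point.
pattern = re.compile(r'[0-9]+\.[0-9]+')

matching_line = "sometext - 0.007442749125388171"
blank_line = "\n"  # assumed: e.g. a trailing empty line in the file

match = pattern.search(matching_line)
whole_match = match.group()  # same as match.group(0): the entire match
print(whole_match)

no_match = pattern.search(blank_line)
# no_match is None here, so calling no_match.group() would raise
# AttributeError: 'NoneType' object has no attribute 'group'
print(no_match)
```

So a single line without a number is enough to stop the loop, even if every other line matches.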
As other comments on your question have suggested, you may also want to check that a match was found before trying to extract the captured group:
import re
with open("file.txt", encoding="utf-8") as text_file:
    for i, line in enumerate(text_file):
        text_matches = re.match(r'([a-zżźćńółęąś]*)', line)
        if text_matches is None:
            continue
        text_result = text_matches.group(1)
        value_matches = re.search(r'([0-9]\.[0-9]*)', line)
        if value_matches is None:
            # No number on this line (for example a blank line), so skip it
            continue
        value_result = value_matches.group(1)
        print("Line {}: {} - {}".format(i, text_result, value_result))
Answer 1 (score: 1)
I would like to suggest a stricter regular expression:
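^([a-zżźćńółęąś]+)\s+-\s+(\d+\.\d+)$
Explanation
^ asserts the start of the line
([a-zżźćńółęąś]+) matches and captures the text (one or more of the listed lowercase letters)
\s+-\s+ matches the separator in the middle, surrounded by a variable amount of whitespace
(\d+\.\d+) matches and captures the decimal number
$ asserts the end of the line
import re
regex = r"^([a-zżźćńółęąś]+)\s+-\s+(\d+\.\d+)$"
test_str = ("sometext - 0.007442749125388171\n"
            "sometext - 0.004296183916209439\n"
            "sometext - 0.0037923667088698393\n"
            "sometext - 0.003137404884873018")
# re.MULTILINE makes ^ and $ match at the start and end of every line
matches = re.finditer(regex, test_str, re.MULTILINE)
for match in matches:
    # Groups are numbered from 1; group 0 is the whole match
    for group_num, group in enumerate(match.groups(), start=1):
        print("Group {}: {}".format(group_num, group))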
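If you read the file line by line rather than matching one big string, the same pattern can be wrapped in a small helper. This is only a sketch: parse_line, LINE_RE and sample_lines are hypothetical names, and the list of strings stands in for the lines of your file. Lines that do not fit the pattern are skipped instead of raising an error.

```python
import re

# Same strict pattern as above, compiled once. All names here are illustrative.
LINE_RE = re.compile(r"^([a-zżźćńółęąś]+)\s+-\s+(\d+\.\d+)$")


def parse_line(line):
    """Return (text, value) for a matching line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), float(match.group(2))


# Assumed sample input; in practice these would be the lines of the file.
sample_lines = [
    "sometext - 0.007442749125388171\n",
    "\n",  # blank line: skipped instead of raising AttributeError
    "sometext - 0.003137404884873018\n",
]

parsed = [result for result in map(parse_line, sample_lines) if result is not None]
for text, value in parsed:
    print("{}: {}".format(text, value))
```

Converting the second group with float() is optional; keep it as a string if you need every digit exactly as written in the file.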