Question

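I have a .txt file containing some text (copied from an EDIFACT file), and I want to match certain fields. Basically, I only want the date (match 1, group 0).

This is the regex I have: https://regex101.com/r/oSVlS8/6

But I cannot get it to work in my code; I only want group 0 of the match.

This is my code: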

My fixture

This is the result I get:

http://www.example.com/

What I actually want is "20080702".

I tried something like

regex = r"^((?:INV)\+(?:[^+\n]*\+){4})\d{8}"
with open("test edifakt 1 bk v1.txt", "r") as f:
    result = re.findall(regex, f.read(), re.MULTILINE)
print(result)

but that did not work. I got:
['INV+ED Format 1+Brustkrebs+19880117+E000000001+']

I also tried passing it as an argument, like print(result.group(0)), but got AttributeError: 'list' object has no attribute 'group'

Can I actually match only a certain group if I use result = re.findall(regex,f.read(),group(0),re.MULTILINE) and it is a string?

Answer 1

Try this regex:

re.search(r'(?:INV)\+(?:[^+\n]*\+){4}(\d{8})', text).group(1)

It returns

'20080702'
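One caveat with calling .group(1) directly on the result: re.search returns None when nothing matches, so the call would raise AttributeError on input without an INV segment. A minimal sketch of a guarded version follows; the helper name first_date and the sample lines are made up for illustration and are not from the question's file.

```python
import re

# Pattern from the answer above; group 1 is the 8-digit date.
PATTERN = re.compile(r"(?:INV)\+(?:[^+\n]*\+){4}(\d{8})")

def first_date(text):
    """Return the first captured date, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

# Hypothetical sample lines, invented for this example.
sample = "INV+ED Format 1+Brustkrebs+19880117+E000000001+20080702+1+0"
no_inv = "UNH+1+no invoice segment here"

print(first_date(sample))  # 20080702
print(first_date(no_inv))  # None
```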

Answer 2

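You could change the capturing group so that it captures the digits.

Note that you can omit the non-capturing group around INV, (?:INV), and that using * as the quantifier in [^+\n]*\+ also lets the pattern match 4 consecutive plus signs, ++++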

^INV\+(?:[^+\n]*\+){4}(\d{8})

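^ Start of the string (or of each line when re.MULTILINE is used)
INV\+ Match INV+
(?: Non-capturing group
- [^+\n]*\+ Match any character except + or a newline 0 or more times, followed by a +
){4} Close the group and repeat it 4 times
(\d{8}) Capture group 1, match 8 digits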
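The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a sketch of an alternative way to write the regex; the sample text is an assumption, not data from the question.

```python
import re

regex = re.compile(
    r"""
    ^INV\+          # 'INV+' at the start of a line (needs re.MULTILINE)
    (?:             # non-capturing group for one field
        [^+\n]*     #   zero or more characters other than '+' or newline
        \+          #   the '+' that ends the field
    ){4}            # exactly four such fields
    (\d{8})         # capture group 1: eight digits
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical two-line sample, invented for this example.
text = "UNH+1+header\nINV+ED Format 1+Brustkrebs+19880117+E000000001+20080702+1+0\n"
dates = regex.findall(text)
print(dates)  # ['20080702']
```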


For example

import re

# test_str holds the text to search, e.g. the contents of the .txt file
regex = r"^INV\+(?:[^+\n]*\+){4}(\d{8})"
result = re.findall(regex, test_str, re.MULTILINE)
print(result)

Output

['20080702']

If you want to use the group method, you could use

matches = re.search(regex, test_str, re.MULTILINE) 
if matches:
    print(matches.group(1))

Output

20080702
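The earlier note about the * quantifier can be checked directly: because [^+\n]* may match nothing, a line whose four fields are all empty still matches. The sketch below compares it with a stricter variant that uses + instead; that strict pattern and the sample line are assumptions added for illustration, not part of the answer.

```python
import re

# Pattern from the answer: fields may be empty because of '*'.
lenient = re.compile(r"^INV\+(?:[^+\n]*\+){4}(\d{8})", re.MULTILINE)
# Hypothetical stricter variant: every field needs at least one character.
strict = re.compile(r"^INV\+(?:[^+\n]+\+){4}(\d{8})", re.MULTILINE)

# Made-up line with four empty fields between INV+ and the date.
empty_fields = "INV+++++20080702"

lenient_match = lenient.search(empty_fields)
strict_match = strict.search(empty_fields)

print(lenient_match.group(1))  # 20080702
print(strict_match)            # None
```

Which variant fits depends on whether empty fields are expected in the data.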


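re.findall returns the values of the capturing groups when the pattern contains any, so here it gives a list of the group 1 strings.
re.search returns a match object, which has a group method.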
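What re.findall returns depends on how many capturing groups the pattern has, which also explains the result in the question, where the only group wrapped the prefix rather than the date. A short sketch of the three cases, plus re.finditer for when match objects are needed for every hit; the sample line and the variable names are assumptions for illustration.

```python
import re

# Hypothetical sample line, invented for this example.
sample = "INV+ED Format 1+Brustkrebs+19880117+E000000001+20080702+1+0"

no_group = r"^INV\+(?:[^+\n]*\+){4}\d{8}"
one_group = r"^INV\+(?:[^+\n]*\+){4}(\d{8})"
two_groups = r"^(INV\+(?:[^+\n]*\+){4})(\d{8})"

# No capturing group: a list of whole matches.
whole = re.findall(no_group, sample, re.MULTILINE)
# One capturing group: a list of that group's text.
dates = re.findall(one_group, sample, re.MULTILINE)
# Several capturing groups: a list of tuples, one item per group.
pairs = re.findall(two_groups, sample, re.MULTILINE)
# finditer yields match objects, so .group() is available for each hit.
via_iter = [m.group(1) for m in re.finditer(one_group, sample, re.MULTILINE)]

print(whole)
print(dates)
print(pairs)
print(via_iter)
```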

Get a specific group from a regex match
