Question

Sample string 1:

7.2.P.8.1 

Summary and Conclusion  


A stability study with two batches was carried out.

Sample string 2:

7.2.S.1.2  

Structure 

Not applicable as the substance is not present.

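I want to write a regular expression that gets the first line after a heading in the format (7.2.P.8.1), (7.2.S.1.2), (8-3-1-P-2), or any similar format where the parts are separated by . or -. So for the first sample I need (Summary and Conclusion) as output, and for the second (Structure). The words "Sample string" are not part of the file content; they are only there to label the examples.

Occasionally the format may look like this: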

9.2.P.8.1 Summary and Conclusion  

A stability study with two batches was carried out.

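In this case I also want to retrieve as output: Summary and Conclusion

Note: I only want to retrieve the first match in the file, not all matches, so my code should stop after it finds the first one. How can I do this efficiently?

Code so far: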

import re
def func():
    with open('/path/to/file.txt') as f: # Open the file (auto-close it too)
        for line in f: # Go through the lines one at a time
            m = re.match('\d+(?:[.-]\w+)*\s*', line) # Check each line
            if m: # If we have a match...
                return m.group(1) # ...return the value

Answer 1

You can use

import re

rx = re.compile(r'\d+(?:[.-]\w+)*\s*(\S.*)?$')
found = False
with open('/path/to/file.txt', 'r') as f:
    for line in f:
        if not found:                         # If the required line is not found yet
            m = rx.match(line.strip())        # Check if matching line found
            if m:                               
                if m.group(1):                # If Group 1 is not empty 
                    print(m.group(1))         # Print it
                    break                     # Stop processing
                else:                         # Else, the title is on the next non-blank line
                    found = True              # Set found flag to True
        else:
            if not line.strip():              # Skip blank line
                pass
            else:
                print(line.strip())           # Else, print the match
                break                         # Stop processing

See the Python demo and the regex demo.

Notes

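The \d+(?:[.-]\w+)*\s*(\S.*)?$ regex first matches 1 or more digits, then 0 or more repetitions of a . or - followed by 1 or more word characters, then 0 or more whitespace characters, and then optionally captures into Group 1 a non-whitespace character followed by any 0 or more characters up to the end of the line. If Group 1 is not empty, the title is on the same line as the heading number, so it is printed and break stops the processing.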
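As a quick illustration of how the pattern treats different lines, here is a minimal sketch that runs it against a few lines modelled on the samples in the question. The helper name describe and the "heading only" / "no match" labels are made up for this example and are not part of the answer's code.

```python
import re

rx = re.compile(r'\d+(?:[.-]\w+)*\s*(\S.*)?$')

def describe(line):
    """Report what the pattern finds on one line (hypothetical helper)."""
    m = rx.match(line.strip())
    if m is None:
        return "no match"            # Line does not start with a digit
    # Group 1 is None when nothing follows the heading number
    return m.group(1) or "heading only"

samples = [
    "7.2.P.8.1 ",                        # Number alone: title is on a later line
    "9.2.P.8.1 Summary and Conclusion",  # Number and title on the same line
    "8-3-1-P-2 Structure",               # Hyphen-separated variant
    "Summary and Conclusion",            # Plain text, no leading number
]

for s in samples:
    print(repr(s), "->", describe(s))
```

One thing to keep in mind: the pattern only requires the line to start with digits, so an ordinary sentence such as "2 batches were used" would also be treated as a heading. If the files can contain such lines before the first real heading, a stricter pattern may be needed.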

Otherwise, the found Boolean flag is set to True, and the next non-blank line is printed before the loop stops.
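Since the question wraps its attempt in a function, the same flag logic can also be sketched as a function that returns the title instead of printing it. This is only one possible arrangement, not the answer's exact code; the name first_title is an assumption, and it takes any iterable of lines so it works with an open file object as well as a list.

```python
import re

HEADING_RX = re.compile(r'\d+(?:[.-]\w+)*\s*(\S.*)?$')

def first_title(lines):
    """Return the title after the first numbered heading, or None if absent."""
    heading_seen = False
    for raw in lines:
        line = raw.strip()
        if heading_seen:
            if line:                 # First non-blank line after the heading
                return line
            continue                 # Skip blank lines
        m = HEADING_RX.match(line)
        if m:
            if m.group(1):           # Title is on the same line as the number
                return m.group(1)
            heading_seen = True      # Title must be on a following line
    return None

# Example with in-memory lines shaped like the first sample in the question
sample = ["7.2.P.8.1 \n", "\n", "Summary and Conclusion  \n", "\n",
          "A stability study with two batches was carried out.\n"]
print(first_title(sample))
```

With a real file, the call would look like first_title(f) inside a with open(...) as f: block. Returning from inside the loop stops reading as soon as the title is found, so the rest of the file is never scanned.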
