Question

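I am trying to parse an rpt file and extract everything between the {} that follows the pattern [SAMPLE], up to the next occurrence of that pattern. So it should be [SAMPLE] {this is the data I want} [SAMPLE]. The file may also contain only a single [SAMPLE], so there can be one or more [SAMPLE] sections.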

The file looks like this:

[SAMPLE]
{
[MS]
{
lots of text...
;Mass   % BPI
238.85  0.943
247.64  0.984
378.65  0.990
...
}
[CHROMATOGRAM]
{
lots of text...
}
lots of text...
[MS]
{
;Mass   % BPI
238.85  0.943
247.64  0.984
378.65  0.990
...
}
lots of text...
{
;Mass   % BPI
238.85  0.943
247.64  0.984
378.65  0.990
...
}
}
[SAMPLE]
{
[MS]
{
lots of text
;Mass   % BPI
238.85  0.943
247.64  0.984
378.65  0.990
...
}
[CHROMATOGRAM]
{
lots of text...
}
lots of text...
[MS]
{
;Mass   % BPI
238.85  0.943
247.64  0.984
378.65  0.990
...
}
lots of text...
{
;Mass   % BPI
238.85  0.943
247.64  0.984
378.65  0.990
...
}
}

I tried using this pattern:

\[SAMPLE\]\s*{([^{}]+)}

But this only gives me the first part between the {}.

There are many opening and closing {} between the [SAMPLE] sections. Does anyone know what regular expression I can use to get the data?

Answer 1

You can use

import re  # text is assumed to hold the full contents of the rpt file
list_of_results = re.findall(r'\[SAMPLE][^[]*(?:\[(?!SAMPLE])[^[]*)*', text)

See the regex demo and the Python demo online.

The regex basically matches every substring that starts at [SAMPLE] and runs up to the nearest following [SAMPLE] or the end of the string.

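Details

\[SAMPLE] - a literal [SAMPLE] substring
[^[]* - 0 or more characters other than [
(?:\[(?!SAMPLE])[^[]*)* - zero or more sequences of
- \[(?!SAMPLE]) - a [ character that is not immediately followed by SAMPLE]
- [^[]* - 0 or more characters other than [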
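
As a minimal sketch of how the pattern behaves, the snippet below runs it on a shortened, made-up stand-in for the rpt contents. The names SAMPLE_RE and inner_body are illustrative helpers, not part of the original answer. The helper assumes each matched section contains exactly one outermost {...} pair and that the literal text [SAMPLE] never appears inside a section body.

```python
import re

# Shortened, hypothetical stand-in for the real rpt file contents.
text = """[SAMPLE]
{
[MS]
{
238.85  0.943
}
[CHROMATOGRAM]
{
lots of text...
}
}
[SAMPLE]
{
[MS]
{
247.64  0.984
}
}
"""

# Same pattern as in the answer, precompiled for reuse.
SAMPLE_RE = re.compile(r'\[SAMPLE][^[]*(?:\[(?!SAMPLE])[^[]*)*')

# Each match runs from one [SAMPLE] up to the next [SAMPLE] or end of string.
sections = SAMPLE_RE.findall(text)


def inner_body(section):
    """Return the text between the first '{' and the last '}' of a section.

    Illustrative helper: it assumes the section has one outermost brace pair.
    """
    start = section.find('{')
    end = section.rfind('}')
    return section[start + 1:end].strip()


bodies = [inner_body(s) for s in sections]

for i, body in enumerate(bodies, 1):
    print(f"--- sample {i} ---")
    print(body)
```

Because the regex itself never counts braces, nested {} blocks inside a section need no special handling; the outer braces are only stripped afterwards with plain string methods.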
