Question

I'm new to regex. Here is my data.

<p>[tag]y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38[/tag]</p>

I want to get this:

y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38

This is my regular expression:

(<p>\[tag(.*)\])(.+)(\[\/tag\]<\/p>)

But it doesn't work because of the newlines (\n). It works if I use re.DOTALL, but if my data has multiple records, such as

<p>[tag]y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38[/tag]</p>

<p>[tag]y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38[/tag]</p>

re.findall() returns only one match. I simply want this: [data1, data2, data3, ...]. What should I do?

Answer 1

You can use this regular expression:

\[tag\]([\s\S]*?)\[\/tag\]

Working demo

Match information:

MATCH 1
1.  [8-44]  `y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38`

Update: what the pattern does:

\[tag\]
([\s\S]*?) --> [\s\S]*? matches any character, including newlines, since \S matches
               every non-whitespace character and \s matches every whitespace character.
               This is just a trick; you can also use [\D\d] or [\W\w]. The *? is a
               lazy (non-greedy) quantifier.
\[\/tag\]
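As a minimal sketch of how this pattern behaves in Python, the snippet below runs it with re.findall on a two-record input modeled on the question. The variable names are illustrative only. A greedy re.DOTALL pattern is included for contrast, to show why the original attempt returned a single match. The `\/` escape is optional in Python, so a plain `/` is used here.

```python
import re

# Sample input modeled on the question: two records separated by a blank line.
data = (
    "<p>[tag]y,m,m,l\n"
    "1997,f,e,2.34g\n"
    "2000,m,c,2.38[/tag]</p>\n"
    "\n"
    "<p>[tag]y,m,m,l\n"
    "1997,f,e,2.34g\n"
    "2000,m,c,2.38[/tag]</p>"
)

# Lazy [\s\S]*? stops at the first closing [/tag], so each record matches on its own.
lazy = re.compile(r"\[tag\]([\s\S]*?)\[/tag\]")
records = lazy.findall(data)

# Greedy .+ with re.DOTALL runs on to the last [/tag], merging both records into one match.
greedy = re.compile(r"\[tag\](.+)\[/tag\]", re.DOTALL)
merged = greedy.findall(data)

print(len(records))  # 2
print(len(merged))   # 1
```

Because the pattern has exactly one capturing group, findall returns a plain list of the captured strings, which is the [data1, data2, ...] shape the question asks for.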

On the other hand, if you want to allow attributes in the tag, you can use:

\[tag.*?\]([\s\S]*?)\[\/tag\]
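A short sketch of the attribute-tolerant variant follows. The `id="..."` attributes and the sample rows are invented for illustration and do not come from the question's data.

```python
import re

# Hypothetical input: the id="..." attributes are made up for this example.
data = (
    '[tag id="a"]1997,f,e,2.34g\n'
    '2000,m,c,2.38[/tag]\n'
    '[tag id="b"]2001,f,d,2.40[/tag]'
)

# \[tag.*?\] skips whatever follows "tag" up to the first "]";
# the capturing group then lazily collects the body up to the closing [/tag].
pattern = re.compile(r"\[tag.*?\]([\s\S]*?)\[/tag\]")
bodies = pattern.findall(data)

print(bodies)
```

Note that `\[tag.*?\]` would also accept an opening tag such as `[tags]`, so a stricter pattern may be preferable if other tag names starting with "tag" can occur in the input.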

Answer 2

This is simple:

\](.*?)\[

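import re
# Only re.DOTALL affects this pattern: it lets "." match newline characters.
reobj = re.compile(r"\](.*?)\[", re.IGNORECASE | re.DOTALL | re.MULTILINE)
result = reobj.findall(YOURSTRING)  # YOURSTRING is a placeholder for the input text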

Output:

y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38

DEMO

Regex explanation:

\] matches the character ] literally
1st Capturing group (.*?)
    .*? matches any character
        Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
\[ matches the character [ literally
s modifier: single line. Dot matches newline characters
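One caveat worth checking, sketched below under the assumption that the input contains several records as in the question: because `\](.*?)\[` only anchors on single brackets, it also captures the text that sits between one record's `[/tag]` and the next record's `[tag]`. Anchoring on the full tag names avoids this. Variable names are illustrative.

```python
import re

# Two records, as in the question's multi-record example.
data = (
    "<p>[tag]y,m,m,l\n"
    "1997,f,e,2.34g\n"
    "2000,m,c,2.38[/tag]</p>\n"
    "\n"
    "<p>[tag]y,m,m,l\n"
    "1997,f,e,2.34g\n"
    "2000,m,c,2.38[/tag]</p>"
)

# Bracket-only pattern: any "]" ... "[" pair matches, including the gap between records.
loose = re.compile(r"\](.*?)\[", re.DOTALL)
loose_hits = loose.findall(data)

# Anchoring on the literal tags keeps only the record bodies.
anchored = re.compile(r"\[tag\](.*?)\[/tag\]", re.DOTALL)
anchored_hits = anchored.findall(data)

print(len(loose_hits))     # 3: record, the "</p>...<p>" gap, record
print(len(anchored_hits))  # 2
```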

Python regex to get text between two tags across newlines
