Question

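I am trying to write a regular expression that matches every occurrence of the same hashtag, where the occurrences are separated by arbitrary text. For example, given: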

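Lorem Ipsum #molecule is simply dummy text of the printing and typesetting industry. Lorem Ipsum has #Molecule been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book ##Molecule. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of @Molecule Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.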

How would I go about this? The regex (\#[Mm]olecule) obviously does not work.

Answer 1

You can try the regex ([#@]+[Mm]olecule). There is no need to escape #, and the matched text is available in group 1.

Here is a demo on regex101.

Output:

MATCH 1
1.  [12-21]     `#molecule`
MATCH 2
1.  [101-110]   `#Molecule`
MATCH 3
1.  [265-275]   `##Molecule`
MATCH 4
1.  [450-459]   `@Molecule`

Here is sample code, taken directly from the regex101 site, that uses the ignore-case flag.

import re
p = re.compile(r'([#@]+molecule)', re.IGNORECASE)  # plain raw string: the ur'' prefix is Python 2 only
test_str = ...

re.findall(p, test_str)
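
For readers who want to run this end to end, here is a minimal sketch. The `sample` string is a shortened stand-in for the paragraph in the question, and `find_tags` is a hypothetical helper introduced only for illustration. It runs the question's pattern next to this answer's pattern and prints spans in a format loosely modelled on the regex101 output above.

```python
import re

# Shortened stand-in for the question's paragraph (an assumption, not the full text).
sample = "Lorem #molecule ipsum #Molecule dolor ##Molecule sit @Molecule amet"

# Pattern from the question, kept for comparison.
question_pattern = re.compile(r'(\#[Mm]olecule)')
# Pattern from this answer, with the ignore-case flag.
answer_pattern = re.compile(r'([#@]+molecule)', re.IGNORECASE)


def find_tags(pattern, text):
    """Return (start, end, matched_text) of group 1 for every match."""
    return [(m.start(1), m.end(1), m.group(1)) for m in pattern.finditer(text)]


question_hits = find_tags(question_pattern, sample)
answer_hits = find_tags(answer_pattern, sample)

for start, end, tag in answer_hits:
    print(f"[{start}-{end}] {tag}")
```

On this stand-in text the question's pattern still finds the single-# tags, but it captures only the last # of ##Molecule and skips @Molecule entirely, which is probably what "does not work" refers to.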

Answer 2

Use a character class to match any one of the prefix characters.

>>> re.findall(r'(?i)[#@]+molecule', data)
['#molecule', '#Molecule', '##Molecule', '@Molecule']

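Note: the inline (?i) modifier enables case-insensitive matching. It has to appear at the start of the pattern; Python 3.11 and later raise an error when a global inline flag is placed anywhere else.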
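
As a rough illustration of the options for case-insensitive matching, the sketch below compares a leading (?i), a scoped (?i:...) group (available since Python 3.6), and the re.IGNORECASE argument. The `data` string is a made-up stand-in for the question's text.

```python
import re

# Hypothetical stand-in for the question's paragraph.
data = "a #molecule b #Molecule c ##Molecule d @Molecule"

# Global inline flag: must be the first thing in the pattern.
inline_global = re.findall(r'(?i)[#@]+molecule', data)

# Scoped inline flag: applies only inside the group, so it may sit mid-pattern.
inline_scoped = re.findall(r'[#@]+(?i:molecule)', data)

# Flag passed as an argument instead of being written into the pattern.
flag_argument = re.findall(r'[#@]+molecule', data, re.IGNORECASE)

print(inline_global == inline_scoped == flag_argument)
```

All three forms should give the same matches here; the scoped group is the one to reach for if the flag really has to appear after the character class.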

Answer 3

import re

s = """
Lorem Ipsum #molecule is simply dummy text of the printing and typesetting industry.    Lorem Ipsum has #Molecule been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book ##Molecule. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of @Molecule Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
"""


print(re.findall("@+molecule|#+molecule", s, re.IGNORECASE))
['#molecule', '#Molecule', '##Molecule', '@Molecule']
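
One difference from the character-class answers may be worth noting: the alternation only accepts a run of the same prefix character. The sketch below uses a made-up `mixed` string (not from the question) to show how the two patterns diverge on a tag with a mixed prefix such as #@molecule.

```python
import re

# Hypothetical input containing a mixed "#@" prefix.
mixed = "plain #molecule, doubled ##Molecule, mixed #@molecule"

# Alternation from this answer: either all '@' or all '#'.
alternation = re.findall(r"@+molecule|#+molecule", mixed, re.IGNORECASE)

# Character class from the other answers: any mix of '#' and '@'.
char_class = re.findall(r"[#@]+molecule", mixed, re.IGNORECASE)

print(alternation)
print(char_class)
```

On the question's text both patterns return the same four matches, so the choice only matters if mixed prefixes can occur.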

Multiple matches separated by arbitrary text
