Question

I am trying to match the words that are not inside < >.

This is the regular expression that matches the words inside < >:

import re

text = " Hi <how> is <everything> going"
pattern_neg = r'<([A-Za-z0-9_\./\\-]*)>'
m = re.findall(pattern_neg, text)

# m is ['how', 'everything']

I want the result to be ['Hi', 'is', 'going'].

Answer 1

Use re.split:

import re

text = " Hi <how> is <everything> going"
# Raw string so that \s is not treated as a string escape.
[s.strip() for s in re.split(r'\s*<.*?>\s*', text)]
# ['Hi', 'is', 'going']
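
One edge case worth noting: when a <...> span sits at the very start or end of the string, re.split returns an empty string on that side. A minimal sketch that filters those out, assuming empty pieces are never wanted (split_outside_tags is a hypothetical helper name, not part of the original answer):

```python
import re

def split_outside_tags(text):
    # Split on each <...> span together with the whitespace around it.
    parts = re.split(r'\s*<.*?>\s*', text)
    # Strip each piece and drop the empty ones produced when a tag
    # is at the very start or end of the string.
    return [p.strip() for p in parts if p.strip()]

print(split_outside_tags(" Hi <how> is <everything> going"))  # ['Hi', 'is', 'going']
print(split_outside_tags("<hello how> are you"))              # ['are you']
```

Each piece is the full text between two tags, so it can still contain several words.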

Answer 2

A regular expression approach:

>>> import re
>>> re.findall(r"\b(?<!<)\w+(?!>)\b", text)
['Hi', 'is', 'going']

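\b is a word boundary, (?<!<) is a negative lookbehind, (?!>) is a negative lookahead, and \w+ matches one or more word characters (letters, digits, or underscores).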
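
To see why the \b anchors matter, here is a small comparison (an illustration, not part of the original answer). Without them, the regex engine can start or stop in the middle of a bracketed word, where the lookarounds no longer see < or >:

```python
import re

text = " Hi <how> is <everything> going"

# Full pattern: a match must start and end on a word boundary.
with_boundaries = re.findall(r"\b(?<!<)\w+(?!>)\b", text)

# Same lookarounds without \b: fragments of bracketed words slip through.
without_boundaries = re.findall(r"(?<!<)\w+(?!>)", text)

print(with_boundaries)     # ['Hi', 'is', 'going']
print(without_boundaries)  # ['Hi', 'o', 'is', 'verythin', 'going']
```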

A naive non-regex approach (split on whitespace, then keep each word that neither starts with < nor ends with >):

>>> [word for word in text.split() if not word.startswith("<") and not word.endswith(">")]
['Hi', 'is', 'going']

To handle a case like <hello how> are you, we need something different:

>>> text = " Hi <how> is <everything> going"
>>> re.findall(r"(?:^|\s)(?!<)([\w\s]+)(?!>)(?:\s|$)", text)
[' Hi', 'is', 'going']
>>> text = "<hello how> are you"
>>> re.findall(r"(?:^|\s)(?!<)([\w\s]+)(?!>)(?:\s|$)", text)
['are you']

Note that 'are you' now has to be split to get the individual words.
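
That last step could be sketched as follows, assuming whitespace-separated words are what is wanted (words_outside is a hypothetical helper name). Calling str.split() with no arguments also discards the leading space seen in ' Hi':

```python
import re

PATTERN = re.compile(r"(?:^|\s)(?!<)([\w\s]+)(?!>)(?:\s|$)")

def words_outside(text):
    words = []
    for chunk in PATTERN.findall(text):
        # Each captured chunk may hold several words plus stray whitespace.
        words.extend(chunk.split())
    return words

print(words_outside(" Hi <how> is <everything> going"))  # ['Hi', 'is', 'going']
print(words_outside("<hello how> are you"))              # ['are', 'you']
```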

Match words that are not inside < > using a regular expression
