I am trying to use a regular expression in Python to match every run of consecutive all-caps words/phrases. Given the following:
text = "The following words are ALL CAPS. The following word is in CAPS."
the code should return:
ALL CAPS, CAPS
I am currently using:
matches = re.findall('[A-Z\s]+', text, re.DOTALL)
But this returns:
['T', ' ', ' ', ' ', ' ALL CAPS', ' T', ' ', ' ', ' ', ' ', ' CAPS']
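I obviously don't want the punctuation or the 'T'. I want to return only consecutive words, or single words, that consist entirely of capital letters.
Thanks
Answer 0: (score: 1)
Your regex relies on an explicit condition (a space after the letters).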
matches = re.findall(r"([A-Z]+\s?[A-Z]+[^a-z0-9\W])",text)
This captures repetitions of A to Z, provided there is no trailing lowercase letter or non-letter character.
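A quick way to check this pattern is to run it against the question's text, as in the sketch below. The two extra sample strings are assumptions added here, not part of the question. They illustrate a limitation: the pattern needs two `[A-Z]+` runs plus one more word character, so a match must contain at least three capitals, and the single optional `\s?` means it spans at most two words.

```python
import re

pattern = r"([A-Z]+\s?[A-Z]+[^a-z0-9\W])"

# Text from the question.
text = "The following words are ALL CAPS. The following word is in CAPS."
matches = re.findall(pattern, text)
print(matches)  # ['ALL CAPS', 'CAPS']

# Hypothetical input: a two-letter all-caps word is too short to match.
short = re.findall(pattern, "An OK sign")
print(short)  # []

# Hypothetical input: a three-word phrase is split after two words.
three = re.findall(pattern, "THIS IS LOUD")
print(three)  # ['THIS IS', 'LOUD']
```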
Answer 1: (score: 1)
This does the job:
import re
text = "tHE following words aRe aLL CaPS. ThE following word Is in CAPS."
matches = re.findall(r"(\b(?:[A-Z]+[a-z]?[A-Z]*|[A-Z]*[a-z]?[A-Z]+)\b(?:\s+(?:[A-Z]+[a-z]?[A-Z]*|[A-Z]*[a-z]?[A-Z]+)\b)*)",text)
print(matches)
Output:
['tHE', 'aLL CaPS', 'ThE', 'Is', 'CAPS']
Explanation:
( : start group 1
\b : word boundary
(?: : start non capture group
[A-Z]+ : 1 or more capitals
[a-z]? : 0 or 1 small letter
[A-Z]* : 0 or more capitals
| : OR
[A-Z]* : 0 or more capitals
[a-z]? : 0 or 1 small letter
[A-Z]+ : 1 or more capitals
) : end group
\b : word boundary
(?: : non capture group
\s+ : 1 or more spaces
(?:[A-Z]+[a-z]?[A-Z]*|[A-Z]*[a-z]?[A-Z]+) : same as above
\b : word boundary
)* : 0 or more times the non capture group
) : end group 1
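The breakdown above can also live next to the pattern itself by compiling it with `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern. This is only a sketch of the same regex in another layout; the names `WORD` and `pattern` are introduced here for illustration, and the raw f-string needs Python 3.6 or later.

```python
import re

# One "word": at least one capital, with at most one lowercase letter in it.
WORD = r"(?:[A-Z]+[a-z]?[A-Z]*|[A-Z]*[a-z]?[A-Z]+)"

pattern = re.compile(
    rf"""
    (                # start group 1
      \b{WORD}\b     # first word, delimited by word boundaries
      (?:            # non-capturing group
        \s+          # one or more whitespace characters
        {WORD}\b     # another word of the same shape
      )*             # the group repeats zero or more times
    )                # end group 1
    """,
    re.VERBOSE,
)

text = "tHE following words aRe aLL CaPS. ThE following word Is in CAPS."
matches = pattern.findall(text)
print(matches)  # ['tHE', 'aLL CaPS', 'ThE', 'Is', 'CAPS']
```

Note that this pattern deliberately tolerates one lowercase letter per word, so it is looser than the strict all-caps match the question asks for.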
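Answer 2: (score: 1)
Staying close to your regex, you can use strip() and filter:
import re

string = "The following words are ALL CAPS. The following word is in CAPS."
# list() is needed on Python 3, where filter() returns an iterator
result = list(filter(None, [x.strip() for x in re.findall(r"\b[A-Z\s]+\b", string)]))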
# ['ALL CAPS', 'CAPS']
Answer 3: (score: 0)
Assuming you want matches that start and end with a letter and contain only letters and spaces:
\b([A-Z][A-Z\s]*[A-Z]|[A-Z])\b
The |[A-Z] alternative is only there to capture single-letter words such as I or A.
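A minimal sketch of this pattern in use with `re.findall`, run on the question's text. The second sample string is a made-up input, added here only to show the single-letter alternative at work.

```python
import re

pattern = r"\b([A-Z][A-Z\s]*[A-Z]|[A-Z])\b"

# Text from the question.
text = "The following words are ALL CAPS. The following word is in CAPS."
matches = re.findall(pattern, text)
print(matches)  # ['ALL CAPS', 'CAPS']

# Hypothetical input: standalone "I" is caught by the |[A-Z] branch,
# and "A BIG" is caught by the multi-character branch.
singles = re.findall(pattern, "I saw A BIG dog")
print(singles)  # ['I', 'A BIG']
```

Because every match must begin and end with a capital letter, no strip() or filter() pass is needed afterwards.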