Question

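I have a text file that is about 2 pages long, and I need to write a regex that extracts the words that start with a capital letter. Some examples of what I want to get (but not limited to these) are: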

British Indian Ocean Territory
People's Republic of China Confederation of Independent States
French Southern and Antarctic Lands
Gilbert and Ellia Islands
Iraq-Saudia Arabia Neutral Zone
Juan de Nova Island 
St. Vincent and the Grenadines 
Trust Territory of the Pacific 
Washington, D.C.

The regex I came up with is:

"((?:[A-Z][a-z]+\\s){2,4}?) || ((?:[A-Z][a-z]+\\s){1,2}of(?:\\s[A-Z][a-z]+){1,2}) || ((?:[A-Z][a-z]+\\s){1,2}and(?:\\s[A-Z][a-z]+){1,2})"

Answer 1

Use this regex:

\b[A-Z].*?\b

http://rubular.com/r/HG7YJLgkc3

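REGEXPLANATION:

\b is a word boundary. It matches at the start and at the end of a word.
. matches any character (except a line break, by default),
* matches the preceding element zero or more times,
? makes the preceding * non-greedy, so it matches as few characters as possible instead of the whole string.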
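
To see how the pieces above behave together, here is a minimal sketch in Python (the thread does not name a language, so Python's re module is an assumption on my part, and the helper name capitalized_words is made up for illustration). It runs the pattern over two of the sample strings from the question.

```python
import re

# Answer 1's pattern: a word boundary, one capital letter, then as few
# characters as possible up to the next word boundary.
CAPITALIZED = re.compile(r"\b[A-Z].*?\b")

def capitalized_words(text):
    """Return every match of the pattern in text, left to right."""
    return CAPITALIZED.findall(text)

print(capitalized_words("Juan de Nova Island"))  # ['Juan', 'Nova', 'Island']
print(capitalized_words("Washington, D.C."))     # ['Washington', 'D', 'C']
```

Note that each match is a single word, so punctuation such as the dots in "D.C." splits the result, and multi-word names come back as separate items.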

Answer 2

This regex should work:

"\\b(([A-Z]\\S*)|and|or|the)\\b"

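As a rough illustration of what this pattern returns, here is a small Python sketch (Python is my assumption, and tokens is a hypothetical helper name). The doubled backslashes in the answer look like string-literal escaping, so a raw string with single backslashes is used here. The pattern contains capture groups, so the sketch reads group(0) from each match instead of relying on findall.

```python
import re

# Answer 2's pattern: a capitalized run of non-space characters,
# or one of the lowercase connectors "and", "or", "the".
PATTERN = re.compile(r"\b(([A-Z]\S*)|and|or|the)\b")

def tokens(text):
    """Return the full text of every match, left to right."""
    return [m.group(0) for m in PATTERN.finditer(text)]

print(tokens("St. Vincent and the Grenadines"))
# ['St', 'Vincent', 'and', 'the', 'Grenadines']
```

The matches are still individual tokens, so they would need to be joined afterwards to rebuild a full name. Connectors that are not in the alternation, such as "of" and "de", are skipped.
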
Answer 3

This gets you pretty close:

(\b[A-Z].*?\b('s|-|\.|,)?(\s((the|and|of|de)\s)*)?)+

Check it out: http://rubular.com/r/5LpVm0oKtu
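
For anyone who wants to try this outside Rubular, here is a hedged Python sketch (Python and the helper name first_name are my assumptions, not part of the answer). It applies the pattern to one line at a time, because \s also matches a line break and would otherwise let a match run on into the next line.

```python
import re

# Answer 3's pattern: one or more capitalized words, each optionally
# followed by 's, -, . or , and by lowercase connectors (the/and/of/de).
NAME = re.compile(r"(\b[A-Z].*?\b('s|-|\.|,)?(\s((the|and|of|de)\s)*)?)+")

def first_name(line):
    """Return the first capitalized phrase in a single line, or None."""
    m = NAME.search(line)
    # The pattern may consume trailing whitespace, so strip it.
    return m.group(0).strip() if m else None

print(first_name("St. Vincent and the Grenadines"))
print(first_name("People's Republic of China"))
print(first_name("Washington, D.C."))
```

With these three inputs the whole line is returned each time, which is the multi-word behaviour the question asks for.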

Answer 4

Would this achieve your goal? [A-Z]\S*\s

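The regex you use may change slightly depending on the implementation, and you may have to set some flags to allow multi-line searching and multiple matches.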
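
As one example of how this depends on the implementation, here is a sketch in Python (my choice of language, not the answer's). In Python's re module no extra flag is needed for multiple matches, since findall already returns all of them, and \s matches line breaks as well as spaces.

```python
import re

# Answer 4's pattern: a capital letter, any non-space characters,
# then exactly one whitespace character.
WORD = re.compile(r"[A-Z]\S*\s")

text = "Gilbert and Ellia Islands\nJuan de Nova Island"
matches = WORD.findall(text)
print(matches)  # ['Gilbert ', 'Ellia ', 'Islands\n', 'Juan ', 'Nova ']

# Each match keeps its trailing whitespace; strip it if only the word is wanted.
words = [m.strip() for m in matches]
print(words)    # ['Gilbert', 'Ellia', 'Islands', 'Juan', 'Nova']
```

Because the pattern requires a whitespace character after the word, the final "Island" is missed when nothing follows it. Appending a line break to the input, or making the trailing \s optional, would be one way around that.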

Regex to get words that start with a capital letter
