Question

I have a regular expression that removes all usernames from tweets. It looks like this:

regexFinder = "(?:\\s|\\A)[@]+([A-Za-z0-9-_]+):";

I'm trying to understand what each component does. So far, I have:

(       Used to begin a “group” element
?:      Starts non-capturing group (this means one that will be removed from the final result)
\\s     Matches against shorthand characters
|       or
\\A     Matches at the start of the string and matches a position as opposed to a character
[@]     Matches against this symbol (which is used for Twitter usernames)
+       Match the previous followed by
([A-Za-z0-9- ]  Match against any capital or small characters and numbers or hyphens

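I'm a bit lost on the last part, though. Can someone tell me what +): means? I assume the parenthesis ends the group, but I don't get the colon or the plus sign.

If anything in my understanding of the regex is wrong, please feel free to point it out!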

Answer 1

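+ actually means "one or more" of whatever it follows.

In this case, [@]+ means "one or more @ symbols", and [A-Za-z0-9-_]+ means "one or more of a letter, digit, dash, or underscore". + is one of several quantifiers, learn more here.

The colon at the end just ensures that the match ends with a colon.

It can sometimes help to see a visualization; this one was generated by Debuggex: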

[Image: Debuggex visualization of the regex]
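To make the pieces concrete, here is a minimal Java sketch that runs the pattern from the question against a sample tweet and collects capture group 1. The class name `UsernameFinder`, the helper `findUsernames`, and the sample tweet are made up for illustration; only the regex itself comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex string is taken from the question.
final class UsernameFinder {
    static final Pattern USERNAME =
            Pattern.compile("(?:\\s|\\A)[@]+([A-Za-z0-9-_]+):");

    // Returns capture group 1 (the name without '@' or ':') for every match.
    static List<String> findUsernames(String tweet) {
        List<String> names = new ArrayList<>();
        Matcher m = USERNAME.matcher(tweet);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // "@@bob_1:" matches because [@]+ allows one or more '@' signs.
        // "@carol" is skipped because no ':' follows it.
        System.out.println(findUsernames("@alice: hi, cc @@bob_1: and @carol"));
        // prints [alice, bob_1]
    }
}
```

Note that the non-capturing group `(?:\\s|\\A)` still takes part in the match; it just does not get a group number, which is why the name is group 1.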

Answer 2

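The + symbol means "the previous character (or character class or group) may be repeated 1 or more times". This contrasts with the * symbol, which means "the previous character may be repeated 0 or more times". As far as I can tell, the colon is literal: it matches a literal : in the string.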
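The difference between + and * can be checked with a small Java sketch. The class name `PlusVersusStar`, the helper `matchesWhole`, and the test strings are illustrative assumptions, not part of the original question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting the + and * quantifiers.
final class PlusVersusStar {
    // True only if the regex matches the entire input string.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // '+' requires at least one '@' before "name".
        System.out.println(matchesWhole("@+name", "name"));    // false
        System.out.println(matchesWhole("@+name", "@name"));   // true
        System.out.println(matchesWhole("@+name", "@@@name")); // true

        // '*' also accepts zero '@' signs.
        System.out.println(matchesWhole("@*name", "name"));    // true
        System.out.println(matchesWhole("@*name", "@@@name")); // true
    }
}
```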

Answer 3

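A plus sign in a regular expression means "one or more occurrences of the previous character or group of characters". Since the second plus sign is inside the second set of parentheses, it basically means that the second set of parentheses matches any string made up of at least one lowercase or uppercase letter, digit, hyphen, or underscore.

As for the colon, it has no special meaning on its own in Java's regex classes, so it just matches a literal colon. If you're not sure, someone else has already found out.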
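One way to confirm that the trailing colon is a plain literal is to try the pattern on input with and without it. This is only a sketch: the class `LiteralColon`, its two helper methods, and the sample strings are hypothetical, and `replaceAll("")` is assumed to be how the usernames get removed, since the question does not show that code.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex string is taken from the question.
final class LiteralColon {
    static final Pattern USERNAME =
            Pattern.compile("(?:\\s|\\A)[@]+([A-Za-z0-9-_]+):");

    // True if the tweet contains a username followed by a literal ':'.
    static boolean hasUsername(String tweet) {
        return USERNAME.matcher(tweet).find();
    }

    // Removes every match, including the leading whitespace that \s consumed.
    static String stripUsernames(String tweet) {
        return USERNAME.matcher(tweet).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(hasUsername("thanks @alice"));         // false, no ':'
        System.out.println(hasUsername("thanks @alice: got it")); // true
        System.out.println(stripUsernames("hi @alice: hello"));   // hi hello
    }
}
```

Because the colon is required, mentions in the middle of a sentence that are not followed by ":" are left in place by this pattern.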

Answer 4

Well, let's see..

[@]+                 any character of: '@' (1 or more times)
   (                 group and capture to \1:
    [A-Za-z0-9-_]+   any character of: (a-z A-Z), (0-9), '-', '_' (1 or more times)
   )                 end of capture group \1
   :                 look for and match ':'

The following quantifiers are recognized:

*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times
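Each quantifier in the table above is also accepted by Java's `java.util.regex`, so the table can be checked with a short sketch. The class name `QuantifierDemo`, the helper `whole`, and the choice of the letter "a" as the repeated element are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class exercising each quantifier from the table.
final class QuantifierDemo {
    // True only if the regex matches the entire input string.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("a*", ""));         // true:  0 or more
        System.out.println(whole("a+", ""));         // false: needs at least 1
        System.out.println(whole("a?", "aa"));       // false: at most 1
        System.out.println(whole("a{2}", "aa"));     // true:  exactly 2
        System.out.println(whole("a{2}", "aaa"));    // false: exactly 2
        System.out.println(whole("a{2,}", "aaaaa")); // true:  at least 2
        System.out.println(whole("a{2,3}", "aaa"));  // true:  between 2 and 3
        System.out.println(whole("a{2,3}", "aaaa")); // false: more than 3
    }
}
```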

Understanding the `+` in a regular expression
