Question

I'm new to regex and I'm learning it at regularexperssion.com. What I need to know is what the colon (:) is used for in a regular expression.

For example:

$pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&amp;?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/';

matches

$url1  = "http://www.somewebsite.com";
$url2  = "https://www.somewebsite.com";
$url3  = "https://somewebsite.com";
$url4  = "www.somewebsite.com";
$url5  = "somewebsite.com";

Any help would be highly appreciated. :)

Answer 1

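The colon : is just a colon. It has no special meaning, except in a few special constructs, for example clustering without capturing (also known as a non-capturing group):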

(?:pattern)

It can also appear inside character classes, as part of a POSIX class name, for example:

[[:upper:]]

In your case, however, the colon is just a colon.
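To make the three cases concrete, here is a minimal PHP sketch. The input strings and variable names are made up for illustration, and `~` is used as the delimiter only so that the slashes do not need escaping.

```php
<?php
$input = 'https://example.com'; // hypothetical input, not from the question

// Plain colon: it simply matches a literal ":" after the scheme name.
// The parentheses capture the scheme into index 1.
preg_match('~^(\w+)://~', $input, $capturing);

// (?: ... ) groups without capturing, so only the full match (index 0) exists.
preg_match('~^(?:\w+)://~', $input, $nonCapturing);

// [[:upper:]] is a POSIX class; the colons are part of the class name.
$upperCount = preg_match_all('/[[:upper:]]/', 'Hello World', $uppers);

var_dump($capturing, $nonCapturing, $upperCount);
```

In the first call the colon is ordinary text; in the other two it is only part of the `(?:` and `[:upper:]` syntax.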

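Special characters used in your regular expression:

In the character class [-+_~.\d\w]:

- means a literal -
+ means a literal +
_ means a literal _
~ means a literal ~
. means a literal .
\d means any digit
\w means any word character (letter, digit, or underscore)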

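These symbols have these meanings because they are used inside a character class []. Outside a character class, + and . have special meanings.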
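A small PHP sketch of that difference, using made-up test strings: outside a class `.` and `+` are operators, inside `[...]` they are ordinary characters.

```php
<?php
// Outside a character class: "." matches any character, "+" repeats it.
$outside = preg_match('/^a.+$/', 'abcd');

// Inside a character class: "." and "+" only match themselves.
$literalPlus = preg_match('/^[.+]$/', '+');
$literalDot  = preg_match('/^[.+]$/', '.');
$otherChar   = preg_match('/^[.+]$/', 'b');

var_dump($outside, $literalPlus, $literalDot, $otherChar);
```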

Other elements:

=? means an = that can occur 0 or 1 times; in other words, the = is optional.
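As a quick check of the optional `=`, here is a sketch with a simplified pattern (one word character followed by `=?`); the pattern and inputs are illustrative and not taken from the original expression.

```php
<?php
// One word character, then an optional "=".
$pattern = '/^\w=?$/';

$withEquals    = preg_match($pattern, 'a=');  // "=" present
$withoutEquals = preg_match($pattern, 'a');   // "=" absent
$twoEquals     = preg_match($pattern, 'a=='); // more than one "=" is not allowed

var_dump($withEquals, $withoutEquals, $twoEquals);
```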

Answer 2

I decided to go one better and explain the entire regular expression for you:

^                 # anchor to start of line
(                 # start grouping
 (                # start grouping
  [\w]+           # at least one of 0-9a-zA-Z_
  :               # a literal colon
 )                # end grouping
 ?                # this grouping is optional
 \/\/             # two literal slashes
)                 # end grouping
?                 # this grouping is optional
(
 (
  [\d\w]          # exactly one of 0-9a-zA-Z_
                  # having \d is redundant
  |               # alternation
  %               # literal % sign
  [a-fA-f\d]{2,2} # exactly 2 hexadecimal digits
                  # should probably be A-F
                  # using {2} would have sufficed
 )+               # at least one of this group
 (                # start grouping
  :               # literal colon
  (
   [\d\w]
   |
   %
   [a-fA-f\d]{2,2}
  )+
 )?               # Same grouping, but it is optional
                  # and there can be only one
 @                # literal @ sign
)?                # this group is optional
(
 [\d\w]           # same as [\w], explained above
 [-\d\w]{0,253}   # includes a dash as a valid character
                  # between 0 and 253 of these characters
 [\d\w]           # end with \w.  They want at most 255
                  # total and - cannot be at the start
                  # or end
 \.               # literal period
)+                # at least one of these groups
[\w]{2,4}         # two to four \w characters
(
 :                # literal colon
 [\d]+            # at least one digit
)?
(
 \/               # literal slash
 (
  [-+_~.\d\w]    # one of these characters
  |              # *or*
  %              # % with two hex digit combo
  [a-fA-f\d]{2,2}
 )*              # zero or more of these groups
)*               # zero or more of these groups
(
 \?              # literal question mark
 (
  &amp;?         # literal &amp or &amp;
  (
   [-+_~.\d\w]
   |
   %
   [a-fA-f\d]{2,2}
  )
  =?             # optional literal =
 )*              # zero or more of this group
)?               # this group is optional
(
 #               # literal #
 (
  [-+_~.\d\w]
  |
  %
  [a-fA-f\d]{2,2}
 )*
)?
$                # anchor to end of line
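To see the whole expression at work, the sketch below runs the pattern from the question, copied verbatim (including the `&amp;` that appears in it), against the five URLs from the question. The `$urls` and `$results` names and the extra non-URL string are my own additions.

```php
<?php
// Pattern copied unchanged from the question.
$pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&amp;?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/';

$urls = [
    'http://www.somewebsite.com',
    'https://www.somewebsite.com',
    'https://somewebsite.com',
    'www.somewebsite.com',
    'somewebsite.com',
];

// preg_match returns 1 on a match and 0 otherwise.
$results = [];
foreach ($urls as $url) {
    $results[$url] = preg_match($pattern, $url);
}

// A string with no dot cannot satisfy the mandatory "label." part.
$notAUrl = preg_match($pattern, 'not a url');

var_dump($results, $notAUrl);
```

The only place a colon matters here is as literal text: after the scheme name, between user and password, and before the port number.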

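It is important to understand what the meta characters/sequences are. Some sequences are not meta when used in certain contexts (especially inside a character class). I've cataloged them for you:

Without context

Meta

^ - zero-width start of line
() - grouping/capture
? - zero or one of the preceding sequence
+ - one or more of the preceding sequence
* - zero or more of the preceding sequence
[] - character class
\w - alphanumeric characters and _. The opposite of \W
| - alternation
{} - repetition count for the preceding sequence
$ - zero-width end of line

This leaves :, @, and % without any special/meta meaning in the raw context.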

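Inside a character class

Meta
`]` ends the character class. `-` creates a range of characters, unless it is at the start or the end of the character class.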
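The following PHP sketch illustrates the two roles of `-`, and also why the `A-f` range in the original expression is probably a typo for `A-F`: in ASCII, `A-f` spans every uppercase letter, several punctuation characters, and `a` to `f`. The test strings are made up.

```php
<?php
// "-" between two characters defines a range: a, b, or c.
$rangeHitsB    = preg_match('/^[a-c]$/', 'b');
$rangeHitsDash = preg_match('/^[a-c]$/', '-');

// "-" at the start of the class is a literal hyphen: "-", a, or c.
$literalHitsDash = preg_match('/^[-ac]$/', '-');
$literalHitsB    = preg_match('/^[-ac]$/', 'b');

// The A-f range accepts "Z", which is not a hexadecimal digit.
$looseHex  = preg_match('/^[a-fA-f\d]{2}$/', 'ZZ');
$strictHex = preg_match('/^[a-fA-F\d]{2}$/', 'ZZ');

var_dump($rangeHitsB, $rangeHitsDash, $literalHitsDash, $literalHitsB, $looseHex, $strictHex);
```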

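Grouping assertions

The combination (? starts a grouping assertion. For example, (?: means group but do not capture. In the regex /(?:a)/ this means it will match the string "a", but the a is not captured for use in a replacement or in the match groups, as it would be with /(a)/.

? is also used in the lookahead/lookbehind assertions ?=, ?!, ?<=, and ?<!. Note that (? followed by other characters is not a literal ?: depending on the engine it introduces further constructs (inline modifiers such as (?i), named groups, and so on), and an unrecognized sequence is normally a syntax error.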
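Here is a short PHP sketch of the four lookaround forms; the sample strings (including `localhost:8080`) are assumptions chosen for illustration. Note that the text inside a lookaround is checked but never becomes part of the match.

```php
<?php
// (?=...) positive lookahead: "foo" only if "bar" follows; "bar" is not consumed.
preg_match('/foo(?=bar)/', 'foobar', $ahead);

// (?!...) negative lookahead: "foo" only if "bar" does NOT follow.
$negAhead = preg_match('/foo(?!bar)/', 'foobar');

// (?<=...) positive lookbehind: digits only if preceded by a literal colon.
preg_match('/(?<=:)\d+/', 'localhost:8080', $port);

// (?<!...) negative lookbehind: "b" only if NOT preceded by "a".
$negBehindAb = preg_match('/(?<!a)b/', 'ab');
$negBehindCb = preg_match('/(?<!a)b/', 'cb');

var_dump($ahead, $negAhead, $port, $negBehindAb, $negBehindCb);
```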

Answer 3

In your case, the colon : has no special purpose:

(([\w]+:)?\/\/)? will match http://, https://, ftp:// ...

There is one special use of the colon: any group that starts with (?: will not appear in the results.
For example, with "foobarbaz" as input:

/foo((bar)(baz))/ => { [1] => 'barbaz', [2] => 'bar', [3] => 'baz' }
/foo(?:(bar)(baz))/ => { [1] => 'bar', [2] => 'baz' }
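The same comparison can be reproduced with `preg_match`; this is only a sketch and the variable names are mine. PHP additionally stores the full match at index 0, which the listing above leaves out.

```php
<?php
// Outer group captures, so "barbaz" takes index 1.
preg_match('/foo((bar)(baz))/', 'foobarbaz', $capturing);

// Outer group is non-capturing, so the inner groups move up to 1 and 2.
preg_match('/foo(?:(bar)(baz))/', 'foobarbaz', $nonCapturing);

print_r($capturing);
print_r($nonCapturing);
```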

Answer 4

The colon has no special meaning in a regular expression; it just matches a literal colon.

[\w]+:

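This just means: any word character, one or more times, followed by a literal colon. The square brackets are not actually needed here. Square brackets are used to define a set of characters to match. So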

[abcd]

means a single character out of a, b, c, d.
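A quick PHP sketch of both points, with made-up inputs: `[\w]+:` and `\w+:` behave the same, and `[abcd]` matches exactly one character from the set.

```php
<?php
$inputs = ['http://x', 'mailto:me', '://x'];

// Compare the bracketed and unbracketed forms on each input.
$same = true;
$matches = [];
foreach ($inputs as $input) {
    $withBrackets    = preg_match('/^[\w]+:/', $input);
    $withoutBrackets = preg_match('/^\w+:/', $input);
    $same = $same && ($withBrackets === $withoutBrackets);
    $matches[$input] = $withBrackets;
}

// [abcd] accepts a single character from the set, nothing else.
$inSet    = preg_match('/^[abcd]$/', 'c');
$notInSet = preg_match('/^[abcd]$/', 'e');
$tooLong  = preg_match('/^[abcd]$/', 'ab');

var_dump($same, $matches, $inSet, $notInSet, $tooLong);
```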
