I just need someone to correct my understanding of this regex, which is something of a stopgap for matching HTML tags.
< (?: "[^"]*" ['"]* | '[^']*'['"]*|[^'">])+ >
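My understanding:
< - matches the tag-opening symbol
(?: - I can't understand what is going on here. What do these symbols mean?
"[^"]*" ['"]* - an arbitrary string in double quotes. Is something else going on here?
'[^']*'['"]* - some string in single quotes
[^'">] - any character other than ' " >
So it is a '<' symbol, followed by a string in double quotes or in single quotes, or any other string that contains ' " or >, repeated one or more times, followed by a '>'.
That's the best I could make of it.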
Answer 0 (score: 5)
< # literally just an opening angle bracket followed by a space
( # the bracket opens a subpattern, it's necessary as a boundary for
# the | later on
?: # makes the just opened subpattern non-capturing (so you can't access it
# as a separate match later)
" # literally "
[^"] # any character but " (this is called a character class)
* # arbitrarily many of those (as much as possible)
" # literally "
['"] # either ' or "
* # arbitrarily many of those (and possibly alternating! it doesn't have
# to be the same character for the whole string)
| # OR
' # literally '
[^'] # any character but ' (this is called a character class)
* # arbitrarily many of those (as much as possible)
' # literally '
['"]* # as above
| # OR
[^'">] # any character but ', ", >
) # closes the subpattern
+ # arbitrarily many repetitions but at least once
> # literally a closing angle bracket
Note that all spaces in the regex are treated just like any other character. Each one matches exactly one space.
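To see what the literal spaces do in practice, here is a minimal sketch in Python (my choice of language, the question doesn't name one). It compiles the posted pattern twice: once as-is, and once with re.VERBOSE, which makes the engine ignore whitespace in the pattern. The variable names and sample strings are made up for illustration.

```python
import re

# The pattern exactly as posted, including its spaces.
PATTERN = r'''< (?: "[^"]*" ['"]* | '[^']*'['"]*|[^'">])+ >'''

literal = re.compile(PATTERN)              # spaces in the pattern must appear in the input
verbose = re.compile(PATTERN, re.VERBOSE)  # spaces in the pattern are ignored

ordinary_tag = '<a href="x>y">'  # no space after '<' or before the final '>'
spaced_tag = '< a >'             # has the spaces the literal pattern demands

literal_on_ordinary = literal.fullmatch(ordinary_tag) is not None
literal_on_spaced = literal.fullmatch(spaced_tag) is not None
verbose_on_ordinary = verbose.fullmatch(ordinary_tag) is not None

print(literal_on_ordinary, literal_on_spaced, verbose_on_ordinary)
# prints: False True True
```

So if the pattern was meant to match ordinary tags, it was probably written for an "extended"/verbose mode where whitespace is only there for readability.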
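Also pay special attention to the ^ at the beginning of a character class. It is not treated as a separate character, but inverts the whole character class.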
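A quick way to convince yourself of the ^ behaviour, sketched in Python with made-up sample strings: at the start of a class it negates, anywhere else it is just a literal caret.

```python
import re

text = "a'b\">c"  # the characters: a ' b " > c

# ^ right after [ inverts the class: match anything except ' " >
negated = re.findall(r"""[^'">]""", text)

# ^ anywhere else in the class is an ordinary character
literal_caret = re.findall(r"[a^]", "a^b")

print(negated)        # prints: ['a', 'b', 'c']
print(literal_caret)  # prints: ['a', '^']
```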
I may also (obligatorily) add that regular expressions are not appropriate to parse HTML.
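For comparison, this is roughly what the parser route looks like, assuming Python and its standard-library html.parser module (the question doesn't say which language is in use). TagCollector is a hypothetical helper name, and the sample markup is invented.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))


collector = TagCollector()
collector.feed('<a href="x>y" title=\'hi\'>link</a>')
collector.close()

print(collector.tags)
```

The parser deals with the quoted > inside the attribute value itself, which is the case the regex is trying to work around.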
Answer 1 (score: 2)
Split it up at the |s, which denote "or"s:
<
(?:
"[^"]*" ['"]* |
'[^']*'['"]* |
[^'">]
)+
>
(?: denotes a non-capturing group. The inside of that group matches these things (in this order):
"stuff"
'stuff'
asd=
In effect, this is a regex that attempts to match HTML tags with attributes.
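If it helps to see which alternative picks up which piece, here is a rough Python sketch. It wraps each of the three alternatives in a named group (the names double, single and other are mine, not part of the original regex) and leaves out the literal spaces from the posted pattern so it can be run on ordinary attribute text.

```python
import re

# The three alternatives from the group, each given a hypothetical name.
# Assumption: the spaces from the posted pattern are dropped here.
PIECE = re.compile(
    r'''(?P<double>"[^"]*"['"]*)|(?P<single>'[^']*'['"]*)|(?P<other>[^'">])'''
)

inside_tag = "a=\"x\" b='y'"  # made-up contents of a tag

# lastgroup is the name of the alternative that matched each piece
pieces = [(m.lastgroup, m.group()) for m in PIECE.finditer(inside_tag)]

for name, text in pieces:
    print(name, repr(text))
```

The quoted values come out as single double/single pieces, while everything else (names, = signs, spaces) is consumed one character at a time by the third alternative.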
Answer 2 (score: 0)
Here is the result from YAPE::Regex::Explain:
(?-imsx:< (?: "[^"]*" ['"]* | '[^']*'['"]*|[^'">])+ >)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  <                      '< '
----------------------------------------------------------------------
  (?:                    group, but do not capture (1 or more times
                         (matching the most amount possible)):
----------------------------------------------------------------------
    "                    ' "'
----------------------------------------------------------------------
    [^"]*                any character except: '"' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
    "                    '" '
----------------------------------------------------------------------
    ['"]*                any character of: ''', '"' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
                         ' '
----------------------------------------------------------------------
   |                     OR
----------------------------------------------------------------------
    '                    ' \''
----------------------------------------------------------------------
    [^']*                any character except: ''' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
    '                    '\''
----------------------------------------------------------------------
    ['"]*                any character of: ''', '"' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
   |                     OR
----------------------------------------------------------------------
    [^'">]               any character except: ''', '"', '>'
----------------------------------------------------------------------
  )+                     end of grouping
----------------------------------------------------------------------
  >                      ' >'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------