I need help matching the <eyn>, <un>, and <an> tags in the following multi-tag string:
Your colleague <eyn id='test@test.com'>user</eyn> is now communicating with <un id='test@test.com'>user</un> from <an id='4442729'>test, Inc.</an>
Answer 0 (score: 0)
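Parsing HTML with a regular expression is generally unwise because of all the obscure edge cases that can come up, but it seems you control the HTML here, so you should be able to avoid many of the edge cases the regex police cry about.
I would probably want to capture the entire tag, the id value, and the raw text between the opening and closing tags in a single operation.
This regex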
<(eyn|un|an)\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\bid=('[^']*'|"[^"]*"|[^'"\s>]*))(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?\/?>(.*?)<\/\w+>
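will do the following: match the eyn, un, and an tags; capture the id attribute value, whether it is wrapped in ' or " or left unquoted; and capture the text between the opening and closing tags.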
Sample text
Note the difficult edge case nested inside the second block of text.
Your colleague <eyn id='test@test.com'>user</eyn> is now communicating with <un id='test@test.com'>user</un> from <an id='4442729'>test, Inc.</an>
Your colleague <eyn onmouseover=' if ( 3 > a ) { var
string=" <eyn id=NotTheDroidYouAreLookingFor>R2D2</eyn>; "; } '
id='DesiredDroids'>This is the droid I'm looking for</eyn> is now communicating with <un id="test@test.com">user</un> from <an id=4442729>test, Inc.</an>
Sample matches
Match 1
Full match 15-49 `<eyn id='test@test.com'>user</eyn>`
Group 1. 16-19 `eyn`
Group 2. 23-38 `'test@test.com'`
Group 3. 39-43 `user`
Match 2
Full match 76-108 `<un id='test@test.com'>user</un>`
Group 1. 77-79 `un`
Group 2. 83-98 `'test@test.com'`
Group 3. 99-103 `user`
Match 3
Full match 114-146 `<an id='4442729'>test, Inc.</an>`
Group 1. 115-117 `an`
Group 2. 121-130 `'4442729'`
Group 3. 131-141 `test, Inc.`
Match 4
Full match 163-326 `<eyn onmouseover=' if ( 3 > a ) { var
string=" <eyn id=NotTheDroidYouAreLookingFor>R2D2</eyn>; "; } '
id='DesiredDroids'>This is the droid I'm looking for</eyn>`
Group 1. 164-167 `eyn`
Group 2. 271-286 `'DesiredDroids'`
Group 3. 287-320 `This is the droid I'm looking for`
Match 5
Full match 353-385 `<un id="test@test.com">user</un>`
Group 1. 354-356 `un`
Group 2. 360-375 `"test@test.com"`
Group 3. 376-380 `user`
Match 6
Full match 391-421 `<an id=4442729>test, Inc.</an>`
Group 1. 392-394 `an`
Group 2. 398-405 `4442729`
Group 3. 406-416 `test, Inc.`
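The matches above can be reproduced in code. The following is a minimal sketch in Python, assuming the standard `re` module; the pattern is copied unchanged from the answer, while `extract_tags`, `strip_quotes`, `SAMPLE`, and `EDGE_CASE` are hypothetical names introduced only for this illustration. Group 2 keeps the surrounding quotes, so the sketch strips them afterwards.

```python
import re

# Pattern from the answer, unchanged. A triple-quoted raw string lets both
# ' and " appear literally without escaping.
TAG_PATTERN = re.compile(
    r"""<(eyn|un|an)\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\bid=('[^']*'|"[^"]*"|[^'"\s>]*))(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?\/?>(.*?)<\/\w+>"""
)


def strip_quotes(value):
    """Drop one pair of matching surrounding quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def extract_tags(text):
    """Return (tag name, id value without quotes, inner text) per match."""
    return [
        (m.group(1), strip_quotes(m.group(2)), m.group(3))
        for m in TAG_PATTERN.finditer(text)
    ]


# First sample text block.
SAMPLE = (
    "Your colleague <eyn id='test@test.com'>user</eyn> is now communicating "
    "with <un id='test@test.com'>user</un> from <an id='4442729'>test, Inc.</an>"
)

# Second sample text block, with a decoy tag nested inside an attribute value.
EDGE_CASE = (
    "Your colleague <eyn onmouseover=' if ( 3 > a ) { var\n"
    "string=\" <eyn id=NotTheDroidYouAreLookingFor>R2D2</eyn>; \"; } '\n"
    "id='DesiredDroids'>This is the droid I'm looking for</eyn> is now "
    "communicating with <un id=\"test@test.com\">user</un> "
    "from <an id=4442729>test, Inc.</an>"
)

if __name__ == "__main__":
    for tag, id_value, inner in extract_tags(SAMPLE) + extract_tags(EDGE_CASE):
        print(tag, id_value, inner, sep=" | ")
```

The decoy tag is skipped because the whole onmouseover value is consumed as a single quoted attribute before the id look-ahead is tested.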
NODE EXPLANATION
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
eyn 'eyn'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
un 'un'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
an 'an'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
--------------------------------------------------------------------------------
[^>=] any character except: '>', '='
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
=' '=\''
--------------------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
=" '="'
--------------------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
= '='
--------------------------------------------------------------------------------
[^'"] any character except: ''', '"'
--------------------------------------------------------------------------------
[^\s>]* any character except: whitespace (\n,
\r, \t, \f, and " "), '>' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
)*? end of grouping
--------------------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
--------------------------------------------------------------------------------
id= 'id='
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
[^'"\s>]* any character except: ''', '"',
whitespace (\n, \r, \t, \f, and " "),
'>' (0 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
[^>=] any character except: '>', '='
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
=' '=\''
--------------------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
=" '="'
--------------------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
= '='
--------------------------------------------------------------------------------
[^'"\s]* any character except: ''', '"',
whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
--------------------------------------------------------------------------------
\/? '/' (optional (matching the most amount
possible))
--------------------------------------------------------------------------------
> '>'
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
> '>'
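The node-by-node explanation above can also be kept next to the pattern itself. The sketch below assumes Python's `re.VERBOSE` flag, which ignores whitespace outside character classes and allows `#` comments; the names `COMPACT_PATTERN`, `VERBOSE_PATTERN`, and `MIXED` are hypothetical and exist only for this illustration. The expression is the same one as in the answer, only split across lines.

```python
import re

# The one-line pattern from the answer, for comparison.
COMPACT_PATTERN = re.compile(
    r"""<(eyn|un|an)\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\bid=('[^']*'|"[^"]*"|[^'"\s>]*))(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?\/?>(.*?)<\/\w+>"""
)

# The same pattern laid out with comments. The original contains no literal
# spaces or '#' characters, so verbose mode does not change its meaning.
VERBOSE_PATTERN = re.compile(
    r"""
    <(eyn|un|an)\b                                 # group 1: tag name
    (?=\s)                                         # whitespace must follow the name
    (?=                                            # look ahead for an id attribute
        (?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?    # skip other attributes lazily
        \bid=
        ('[^']*'|"[^"]*"|[^'"\s>]*)                # group 2: id value, quotes included
    )
    (?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*         # consume the attribute list
    \s?\/?>                                        # end of the opening tag
    (.*?)                                          # group 3: inner text (no newlines)
    <\/\w+>                                        # any closing tag
    """,
    re.VERBOSE,
)

# Double-quoted and unquoted id values taken from the second sample text block.
MIXED = '<un id="test@test.com">user</un> from <an id=4442729>test, Inc.</an>'

if __name__ == "__main__":
    print(VERBOSE_PATTERN.findall(MIXED))
    print(VERBOSE_PATTERN.findall(MIXED) == COMPACT_PATTERN.findall(MIXED))
```

Note that the final `<\/\w+>` accepts any closing tag name, exactly as the explanation table states; neither form checks that it matches the opening tag.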