I need to match strings in which two consecutive <br> tags are not direct children of the <p> element. In other words:
<p>hello world<br><br>goodbye world</p>
- valid
<p>hello <span>world<br><br></span>goodbye world</p>
- invalid, should match
After a while, I came up with this:
<p>.*<(?:(?!br>)).*?<br><br>.*?<(?:(.*<\/p>))
It comes close to doing the job, but it fails on, for example:
<p>abc<span>abc</span><br><br><span>abc</span></p>
- should be valid
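A quick way to reproduce this behaviour is to run the attempted pattern against the three sample strings in JavaScript. This is only a minimal sketch: the helper name `attemptFlags` is made up for illustration, and "flagged" here simply means the pattern matched.

```javascript
// The pattern from the question, unchanged.
const attempt = /<p>.*<(?:(?!br>)).*?<br><br>.*?<(?:(.*<\/p>))/;

// Hypothetical helper: true means the row would be flagged for manual fixing.
function attemptFlags(html) {
  return attempt.test(html);
}

console.log(attemptFlags("<p>hello world<br><br>goodbye world</p>"));               // false
console.log(attemptFlags("<p>hello <span>world<br><br></span>goodbye world</p>"));  // true
console.log(attemptFlags("<p>abc<span>abc</span><br><br><span>abc</span></p>"));    // true (false positive)
```

The third call is the problem case: the <br><br> pair is a direct child of the <p>, yet the pattern still matches because it only requires some non-br tag before the pair and some tag after it.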
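P.S.: This is for finding the affected rows in a database so they can be fixed by hand (I hope). Also, I'm not the one who makes the decisions, and I don't get a vote.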
Answer 0: (score: 3)
Regex is definitely not the right tool for this task. That said, you can try a regex like this:
#p tag and stuff before a tag that includes <br><br>
<p>(?:(?!<\/?p)[\s\S])*?
#capture tag that's not a p tag
<(?!p)(\w+)
#capture tag only if it's not a singleton tag
(?=(?:(?!<\/?p)[\s\S])*?<\/\1)[^>]*>
#don't skip the current tag and find <br><br>
(?:(?!<\/?(?:p|\1))[\s\S])*<br><br>
#stuff until closing p
[\s\S]*?<\/p>
Use it with the i (case-insensitive) option; in JS, use the version without comments:
<p>(?:(?!<\/?p)[\s\S])*?<(?!p)(\w+)(?=(?:(?!<\/?p)[\s\S])*?<\/\1)[^>]*>(?:(?!<\/?(?:p|\1))[\s\S])*<br><br>[\s\S]*?<\/p>
For more details, see the explanation on regex101. Note that there is some backtracking.
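To try the single-line pattern in JavaScript, something like the following should work. It is a minimal sketch: the function name `flagsNestedBrPair` and the sample strings (taken from the question) are chosen for illustration, while the pattern itself is copied from this answer.

```javascript
// Single-line pattern from the answer, with the `i` (case-insensitive) flag.
const nestedBrBr = /<p>(?:(?!<\/?p)[\s\S])*?<(?!p)(\w+)(?=(?:(?!<\/?p)[\s\S])*?<\/\1)[^>]*>(?:(?!<\/?(?:p|\1))[\s\S])*<br><br>[\s\S]*?<\/p>/i;

// Hypothetical helper: true when <br><br> sits inside a non-<p> element within a <p>.
function flagsNestedBrPair(html) {
  return nestedBrBr.test(html);
}

console.log(flagsNestedBrPair("<p>hello world<br><br>goodbye world</p>"));               // false
console.log(flagsNestedBrPair("<p>hello <span>world<br><br></span>goodbye world</p>"));  // true
console.log(flagsNestedBrPair("<p>abc<span>abc</span><br><br><span>abc</span></p>"));    // false
```

The regex has no `g` flag, so `test` keeps no state between calls and can safely be reused for every row.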
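Answer 1: (score: 2)
I can't stress enough that regex should not be used to solve this problem. Perhaps this solution demonstrates everything that is wrong with the regex approach.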
Two <br> that are not direct children of <p>, in JavaScript or VB.NET
The following regex works in .NET and uses balancing groups to validate any number of nested tags:
<p> # MAIN Opening <p>
(?>[^<]*) # any text
(?> # BEFORE <br><br>
[^<]+ # any text
| # or
< # TAGS
(?: # Options:
!--.*?--> # 1. comments
| #
\/?\s*(?:area|base|br|col # 2. self-closing tags
|embed|hr|img|input #
|keygen|link|meta|param #
|source|track|wbr #
)\b[^>]*\/?> #
| #
\s*(?<p>p\b) # 3. opening nested <p>
| #
/\s*(?<-p>p\b) # 4. closing nested <p>
| #
\s*(?<nestedtag> # 5.a) if inside a nested tag:
(?(nestedtag)\k<nestedtag> # another nested tag (same tag)
| #
[-:\w]+) # b) else: opening nested tag (except <p>)
\b) # *tag ends with word boundary
| #
/\s*(?<-nestedtag>\k<nestedtag>\b) # 6. closing nested tag
| #
(?!/\s*p\b) # 7. any other tag except <p> (inside nested tag)
) # end of Options
[^>]*> # end of TAGS before <br><br>
)*? # repeat as few as possible (BEFORE <br><br>)
(?(nestedtag)(?(p)(?!))|(?!)) # Conditions: unbalanced nested tags and balanced <p>
#
(?:<br>){2} # MATCH: <br><br>
#
(?>[^<]*) # AFTER <br><br> (any text)
(?> #
[^<]+ # any text
| # or
< # TAGS
(?: # Options:
(?<p>\s*p\b) # 1. opening nested <p>
| #
(?<-p>/\s*p\b) # 2. closing nested <p>
| #
(?!/\s*p\b) # 3. any other tag (except the main </p)
) # end of Options
[^>]*> # rest of tag
)* # repeat as much as possible (AFTER <br><br>)
(?(p)(?!)) # Conditions: balanced <p> tags
#
</\s*p\b[^>]*> # MAIN Closing </p>
VB.NET code
Imports System.Text.RegularExpressions ' needed for Regex, Match and RegexOptions
Dim pattern As String = "<p>(?>[^<]*)(?>[^<]+|<(?:!--.*?-->|/?\s*(?:area|base|br|col|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)\b[^>]*/?>|\s*(?<p>p\b)|/\s*(?<-p>p\b)|\s*(?<nestedtag>(?(nestedtag)\k<nestedtag>|[-:\w]+)\b)|/\s*(?<-nestedtag>\k<nestedtag>\b)|(?!/\s*p\b))[^>]*>)*?(?(nestedtag)(?(p)(?!))|(?!))(?:<br>){2}(?>[^<]*)(?>[^<]+|<(?:(?<p>\s*p\b)|(?<-p>/\s*p\b)|(?!/\s*p\b))[^>]*>)*(?(p)(?!))</\s*p\b[^>]*>"
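' subject is assumed to hold the HTML text to scan; it is not declared in this snippet.
Dim r As Regex = New Regex(pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
Dim m As Match = r.Match(subject)
Dim matchCount As Integer = 0
' Walk through every match and print it with a running counter.
Do While m.Success
    matchCount += 1
    Console.WriteLine("Match " & matchCount & ": " & m.Groups(0).ToString())
    m = m.NextMatch()
Loop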
Output
Match 1: <p>hello <span>world<br><br></span>goodbye world</p>
Match 2: <p><p>xxx</p><span><br><br></span></p>
Match 3: <p><span><span>xxx</span><br><br></span></p>
Match 4: <p>asdf<span>asdf<br><br>asdf</span><br><br></p>
Match 5: <p><span>acb<br><br></span>abcd</p>
Match 6: <p>asdf<span>abc<br><br></span></p>
Match 7: <p><STRONG>Cetárea Duromar</STRONG> es una empresa familiar con más de 20 años de experiencia al
servicio de la restauración y el particular <STRONG>brindando siempre la mejor calidad en mariscos
y un esmerado servicio.<BR><BR></STRONG>Hemos sabido adaptarnos a los nuevos tiempos, incorporando
la mejor tecnología, controlando la calidad de nuestro producto, pero sobre todo exigiéndonos a
nosotros mismos ser superiores cada día para poner lo mejor de nuestro mar en su mesa.<BR><BR>Les
ofrecemos una muy <STRONG>cuidada selección del mejor marisco de la ría, de excelente calidad</STRONG>
y con una presentación extraordinaria.<BR><BR>Producto 100% garantizado.</p>
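Both answers warn that a regex is the wrong tool for the whole job. For comparison, here is a minimal JavaScript sketch that uses a regex only to locate tags and tracks nesting with a stack. It is not taken from either answer. The names `VOID_TAGS` and `hasNestedBrPair` are hypothetical, the void-tag list is borrowed from the pattern above, and the sketch assumes reasonably well-formed markup and ignores HTML comments.

```javascript
// Tags that never have a closing tag (same list as in the .NET pattern).
const VOID_TAGS = new Set(["area", "base", "br", "col", "embed", "hr", "img",
  "input", "keygen", "link", "meta", "param", "source", "track", "wbr"]);

// Hypothetical helper: true when an adjacent <br><br> pair lies inside a <p>
// but its innermost open element is something other than a <p>.
function hasNestedBrPair(html) {
  const tagRe = /<(\/?)([a-z][-:\w]*)[^>]*>/gi;
  const stack = [];     // names of currently open elements, outermost first
  let prevBrEnd = -1;   // index just past the previous <br>, or -1 if none yet
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    const name = m[2].toLowerCase();
    if (name === "br") {
      const insideP = stack.includes("p");
      const directChild = stack[stack.length - 1] === "p";
      // Adjacent pair: this <br> starts exactly where the previous one ended.
      if (prevBrEnd === m.index && insideP && !directChild) return true;
      prevBrEnd = tagRe.lastIndex;
      continue;
    }
    if (VOID_TAGS.has(name)) continue;
    if (m[1] === "/") {
      const i = stack.lastIndexOf(name);
      if (i !== -1) stack.length = i;   // pop back to just before the matching opener
    } else {
      stack.push(name);
    }
  }
  return false;
}

console.log(hasNestedBrPair("<p>hello world<br><br>goodbye world</p>"));               // false
console.log(hasNestedBrPair("<p>hello <span>world<br><br></span>goodbye world</p>"));  // true
console.log(hasNestedBrPair("<p>abc<span>abc</span><br><br><span>abc</span></p>"));    // false
```

Unlike the patterns above, this returns a yes/no answer for a whole string rather than the matched <p> block. That should be enough for flagging database rows for manual review.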