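I am trying to find a regular expression for this application. I have searched but could not find an answer, and I am no expert in regular expressions. I will try to explain what I want to do: I want a regex that finds the last URL before each occurrence of a unique string.
I tried (?!href).*(?<=Uniquestringcontainingspecialcharacters), but with the actual HTML it hangs the program, perhaps because the real input is much longer than my example.
In this example, I want to find the last URL before each occurrence of Uniquestringcontainingspecialcharacters (there may be many occurrences).
The input looks like the dummy filler below, but without newlines (I added newlines to make it easier to see what I mean). The random junk, including spaces and special characters without a useful pattern ( _-.,<>:;"azAZ09 ), is just random filler between the hrefs. There is a varying number of URLs and random junk between the URLs I am interested in: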
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
href="/theinfoIwant/moreinfoIwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
Uniquestringcontainingspecialcharacters
randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
href="/differentinfoIwant/moredifferentinfoIwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
Uniquestringcontainingspecialcharacters
randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
So here I would like to get:
/theinfoIwant/moreinfoIwant/
/differentinfoIwant/moredifferentinfoIwant/
Answer 0 (score: 0)
Basically, the regex you are looking for could be something like
href="[^"]*"(?=(?:(?!href=).)*Uniquestringcontainingspecialcharacters)
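where:
. also matches newlines (depending on the language, this requires the /s flag)
href="[^"]*" matches href=" followed by as many characters other than " as possible, followed by the closing "
(?=...) is a lookahead assertion at the position right after the closing "
(?:(?!href=).)* is a tempered greedy token (it uses a negative lookahead to match any character as many times as possible while ensuring the matched text does not contain href=), followed by Uniquestringcontainingspecialcharacters
Slightly better, the special token Uniquestringcontainingspecialcharacters can also be added to the tempered greedy pattern: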
href="[^"]*"(?=(?:(?!href=|Uniquestringcontainingspecialcharacters).)*Uniquestringcontainingspecialcharacters)
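As a minimal sketch of how the second pattern could be used, here it is in Python. The question does not say which language is in use, so Python is an assumption, as are the helper name last_urls_before_marker and the shortened sample text. A capture group is added around [^"]* so that only the URL path is returned, and re.DOTALL plays the role of the /s flag.

```python
import re
from typing import List

# The marker from the question; re.escape guards against regex metacharacters
# in case the real marker contains any.
MARKER = "Uniquestringcontainingspecialcharacters"

# Same structure as the pattern above, plus a capture group around the URL.
PATTERN = re.compile(
    r'href="([^"]*)"'
    r'(?=(?:(?!href=|' + re.escape(MARKER) + r').)*' + re.escape(MARKER) + r')',
    re.DOTALL,  # let . match newlines as well
)

def last_urls_before_marker(html: str) -> List[str]:
    """Return the URL of the last href before each occurrence of MARKER."""
    return PATTERN.findall(html)

# Shortened stand-in for the sample in the question (hypothetical data).
sample = (
    'href="/unwanted/a/" junk _-.,<>:;"azAZ09 '
    'href="/theinfoIwant/moreinfoIwant/" junk _-.,<>:;"azAZ09 '
    'Uniquestringcontainingspecialcharacters junk\n'
    'href="/unwanted/b/" junk '
    'href="/differentinfoIwant/moredifferentinfoIwant/" junk '
    'Uniquestringcontainingspecialcharacters junk'
)

print(last_urls_before_marker(sample))
```

Because the tempered token stops at the next href= or at the marker, each lookahead only scans up to the following href, which should avoid the long scans that a bare .* can cause on large inputs.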