Regex to find the last varying string before each occurrence of a unique string

Time: 2019-01-15 07:47:21

Tags: regex

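I am looking for a regex for this use case. I have searched but could not find an answer, and I am no expert in regular expressions. I will try to explain what I want to do: I want a regex that finds the last URL before each occurrence of a unique string.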

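I tried (?!href).*(?<=Uniquestringcontainingspecialcharacters), but it hangs the program when run against the actual HTML, perhaps because that is much longer than my example.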

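In this example, I want to find the last partial URL before each Uniquestringcontainingspecialcharacters (there may be many of them).

The input looks like the dummy filler below, but without the newlines (I added them to make it easier to see what I mean). The random junk, which includes spaces and special characters and has no useful pattern (_-.,<>:;"azAZ09), stands for the random filler between the hrefs. Between the URLs I am interested in there is a varying number of other URLs and random junk: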

href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/theinfoIwant/moreinfoIwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
Uniquestringcontainingspecialcharacters
randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/differentinfoIwant/moredifferentinfoIwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
Uniquestringcontainingspecialcharacters
randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 

So here I would like to get

/theinfoIwant/moreinfoIwant/
/differentinfoIwant/moredifferentinfoIwant/

1 answer:

Answer 0 (score: 0)

Basically, the regex you are looking for could be something like

 href="[^"]*"(?=(?:(?!href=).)*Uniquestringcontainingspecialcharacters)

This assumes that . also matches newlines, which depends on the language and usually requires the s (dotall) flag.

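  • href="[^"]*" matches
    • href=" followed by
    • as many characters other than " as possible, followed by
    • "
  • (?=...) is a lookahead assertion at the position right after the closing "
    • (?:(?!href=).)* is a tempered greedy token (it uses a negative lookahead to match as many characters as possible while making sure the matched text does not contain href=)
    • Uniquestringcontainingspecialcharacters is the special token, i.e. the unique string, matched literally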
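As a rough illustration of the breakdown above, here is a minimal Python sketch of the same pattern in verbose form. The capture group around the URL, the shortened sample text, and the names PATTERN, sample and urls are assumptions added for the example; re.DOTALL plays the role of the s flag.

```python
import re

# Verbose form of the pattern; DOTALL lets "." match newlines too.
# The capture group around [^"]* is an addition so that only the URL is returned.
PATTERN = re.compile(
    r'''
    href="([^"]*)"          # href=", then anything but a quote (captured), then "
    (?=                     # lookahead starting right after the closing quote
        (?:(?!href=).)*     # tempered greedy token: never crosses another href=
        Uniquestringcontainingspecialcharacters
    )
    ''',
    re.VERBOSE | re.DOTALL,
)

# Shortened, hypothetical stand-in for the input described in the question.
sample = (
    'href="/unwanted/a/" junk _-.,<>:;"azAZ09\n'
    'href="/theinfoIwant/moreinfoIwant/" junk _-.,<>:;"azAZ09\n'
    'Uniquestringcontainingspecialcharacters\n'
    'junk href="/unwanted/b/" junk\n'
    'href="/differentinfoIwant/moredifferentinfoIwant/" junk\n'
    'Uniquestringcontainingspecialcharacters junk\n'
)

urls = PATTERN.findall(sample)
print(urls)
```

Because the lookahead cannot step over another href=, only the href that directly precedes each marker should be reported.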

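Slightly better, Uniquestringcontainingspecialcharacters can also be added to the tempered greedy token, so that the lookahead stops at the first occurrence of the unique string instead of running on to the next href= and backtracking: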

href="[^"]*"(?=(?:(?!href=|Uniquestringcontainingspecialcharacters).)*Uniquestringcontainingspecialcharacters)
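If the unique string really contains special characters, it has to be escaped before it goes into the pattern. The following Python sketch builds the pattern above from an arbitrary marker; the function name last_hrefs_before_marker, the capture group, and the example marker are hypothetical and only serve the illustration.

```python
import re

def last_hrefs_before_marker(html, marker):
    """Return the href value that directly precedes each occurrence of marker."""
    escaped = re.escape(marker)  # neutralise regex metacharacters in the marker
    pattern = re.compile(
        r'href="([^"]*)"'                                      # capture the URL
        r'(?=(?:(?!href=|' + escaped + r').)*' + escaped + r')',
        re.DOTALL,                                             # "." matches newlines
    )
    return pattern.findall(html)

# Hypothetical marker containing regex metacharacters ($, ., <, >).
marker = '<b>Price: $9.99</b>'
html = (
    'href="/a/" x href="/b/" y ' + marker +
    ' z href="/c/" w ' + marker
)
result = last_hrefs_before_marker(html, marker)
print(result)
```

Compiling the pattern once per marker and reusing it would be a reasonable refinement if the function is called on many documents.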