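I am trying to use a regular expression to extract the URL that has a query string from the HTML below, but it is not working. Can you help me?
What I want to match: https://www.joinville.sc.gov.br/jornal/visualizaranexos?cod_jornal=755&cod_sei_publicacao=529
1.1 My regular expression: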
(?<=href=").*?\?.*?(?=")
1.2 Output of this regular expression:
https://www.joinville.sc.gov.br/public/portaladm/pdf/jornal/ed1301f83021029837bd0628e8e98d39.pdf\" target=\"_blank\"> <span class=\"thumb-jornal\"> <img src=\"/public/portal/imagens/ico_diario.png\" class=\"jornal-icon\" width=\"46\" height=\"38\" alt=\"\"> <span class=\"jornal-shadow\"></span> </span> </a> <span class=\"article-date bolder\"> <span class=\"article-subject\">ano 2016</span> <img src=\"/public/portal/imagens/arrow-bullet.gif\" width=\"8\" height=\"11\" class=\"arrow-bullet\" alt=\">\">n° 398 - <a rel=\"shadowbox;width=500;height=400\" href=\"https://www.joinville.sc.gov.br/jornal/visualizaranexos?cod_jornal=755&cod_sei_publicacao=529"
2. HTML:
<li> <a href="https://www.joinville.sc.gov.br/public/portaladm/pdf/jornal/ed1301f83021029837bd0628e8e98d39.pdf" target="_blank"> <span class="thumb-jornal"> <img src="/public/portal/imagens/ico_diario.png" class="jornal-icon" width="46" height="38" alt=""> <span class="jornal-shadow"></span> </span> </a> <span class="article-date bolder"> <span class="article-subject">ano 2016</span> <img src="/public/portal/imagens/arrow-bullet.gif" width="8" height="11" class="arrow-bullet" alt=">">n° 398 - <a rel="shadowbox;width=500;height=400" href="https://www.joinville.sc.gov.br/jornal/visualizaranexos?cod_jornal=755&cod_sei_publicacao=529" style="font-size: 8px; display: inline; color: #ff0000;">anexos</a> </span> <span class="article-date">19/02/2016</span> </li>
Edit: the following regular expression seems to work -> (?<=href=")[^"]+\?[^"]+(?=")
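The difference between the two patterns can be checked with a short script. This is only a sketch: it assumes Python's `re` module (the question does not say which regex engine is used) and a shortened copy of the HTML that keeps both `href` attributes. The names `html`, `original`, and `fixed` are illustrative. In the original pattern, the lazy `.*?` is still allowed to run past the closing quote of the first link until it finds a `?`, whereas `[^"]+` cannot leave the attribute value.

```python
import re

# Shortened copy of the HTML from the question (an assumption for brevity):
# two links, only the second one has a query string.
html = (
    '<li> <a href="https://www.joinville.sc.gov.br/public/portaladm/pdf/jornal/'
    'ed1301f83021029837bd0628e8e98d39.pdf" target="_blank">pdf</a> '
    '<a rel="shadowbox;width=500;height=400" '
    'href="https://www.joinville.sc.gov.br/jornal/visualizaranexos'
    '?cod_jornal=755&cod_sei_publicacao=529" style="font-size: 8px;">anexos</a> </li>'
)

# Pattern from the question: "." also matches quotes, so the match starts
# at the first href and only ends after the first "?" further along.
original = re.compile(r'(?<=href=").*?\?.*?(?=")')

# Pattern from the edit: [^"] keeps the match inside one attribute value.
fixed = re.compile(r'(?<=href=")[^"]+\?[^"]+(?=")')

original_matches = original.findall(html)
fixed_matches = fixed.findall(html)

print(original_matches)  # one long match spanning both links
print(fixed_matches)     # only the URL with the query string
```

Since the first link contains no `?`, the fixed pattern fails at that position and only succeeds at the second `href`.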
Answer 0 (score: 1)
If I understand correctly, you are only interested in URLs that have parameters? Then I think this does the trick:
(?<=href=")([\S\?]*\?.*?)(?=")
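As a quick check of this pattern, here is a minimal sketch assuming Python's `re` module and a shortened version of the question's HTML (both are assumptions, since the thread does not name an engine). The variable names are illustrative. Because `\S` already matches `?`, the class `[\S\?]` behaves the same as plain `\S`, which the second pattern below illustrates.

```python
import re

# Shortened version of the question's HTML (assumed for brevity).
sample = (
    '<a href="https://www.joinville.sc.gov.br/public/portaladm/pdf/jornal/'
    'ed1301f83021029837bd0628e8e98d39.pdf" target="_blank">pdf</a> '
    '<a href="https://www.joinville.sc.gov.br/jornal/visualizaranexos'
    '?cod_jornal=755&cod_sei_publicacao=529" style="display: inline;">anexos</a>'
)

# Pattern from this answer. The capturing group makes findall return group 1,
# which here covers the entire match.
answer_pattern = re.compile(r'(?<=href=")([\S\?]*\?.*?)(?=")')

# Equivalent form: \S already includes "?", so the extra \? in the class is redundant.
simplified_pattern = re.compile(r'(?<=href=")(\S*\?.*?)(?=")')

answer_matches = answer_pattern.findall(sample)
simplified_matches = simplified_pattern.findall(sample)

print(answer_matches)
print(simplified_matches)
```

The first link is skipped because the run of non-whitespace characters after its `href="` contains no `?`.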
Answer 1 (score: 0)