Question

I am trying to parse a web page to find links that point to one particular kind of page.

For example, suppose we take the following as input:

flowers that never end.')" onmouseout="return nd();" href="/flowers/images/download/01d6ac.html"><img src="http://static.rarbg.com/over/01d6acc21110e68af7476bce50dec3c234343032.jpg" border="0

And on another page there is:

flowers that never end')" onmouseout="return nd();" href="/flowers/01d6acc21110e68af7476bce50dec3c234343032.html" src="http://static.rarbg.com/over/01d6acc21110e68af7476bce50dec3c234343032.jpg" border="0

I tried the following regular expression to get the link:

'href="/flowers/(.+?)"[^>]'

But it still picks up a link from both inputs, not just the second one! Can anyone help me?

Answer 1

If for some reason you have to use a regular expression, you are better off with this one:

'href="/flowers/([^"]+)"[^>]'
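To see why the lazy pattern from the question overshoots while the negated character class does not, here is a small sketch that runs both against the two snippets exactly as posted. The names `first`, `second`, `LAZY` and `STRICT` are made up for illustration.

```python
import re

# The two inputs from the question, truncated exactly as posted.
first = '''flowers that never end.')" onmouseout="return nd();" href="/flowers/images/download/01d6ac.html"><img src="http://static.rarbg.com/over/01d6acc21110e68af7476bce50dec3c234343032.jpg" border="0'''
second = '''flowers that never end')" onmouseout="return nd();" href="/flowers/01d6acc21110e68af7476bce50dec3c234343032.html" src="http://static.rarbg.com/over/01d6acc21110e68af7476bce50dec3c234343032.jpg" border="0'''

# Pattern from the question: "." may also match a double quote.
LAZY = re.compile(r'href="/flowers/(.+?)"[^>]')
# Pattern from the answer: the captured part can never contain a double quote.
STRICT = re.compile(r'href="/flowers/([^"]+)"[^>]')

for name, text in (("first", first), ("second", second)):
    print(name, "lazy  :", LAZY.findall(text))
    print(name, "strict:", STRICT.findall(text))
```

In the first input the quote that closes the href is followed directly by `>`, so `.+?` keeps expanding past that quote until it reaches a `"` that is followed by some other character, and the captured group ends up containing markup. `[^"]+` cannot cross a quote, so the first input yields no match at all and only the second link is returned. Note that the trailing `[^>]` consumes one character, so this pattern also needs at least one character after the closing quote.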

That said, the pain will continue until you switch to a real HTML parser, as the comments point out.
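As a rough illustration of the parser route, here is a minimal sketch using `html.parser` from the Python standard library. The class name `FlowerLinkParser`, the helper `extract_flower_links` and the `sample` markup are hypothetical. The filtering rule (keep only hrefs with no further `/` after `/flowers/`) is an assumption about what separates the wanted links from the others; it is not something stated in the question.

```python
from html.parser import HTMLParser


class FlowerLinkParser(HTMLParser):
    """Collect href values of <a> tags that point directly under /flowers/."""

    PREFIX = "/flowers/"

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href or not href.startswith(self.PREFIX):
            return
        rest = href[len(self.PREFIX):]
        # Assumption: wanted pages have no further "/" after the prefix.
        if rest and "/" not in rest:
            self.links.append(href)


def extract_flower_links(html):
    parser = FlowerLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical, simplified markup modelled on the snippets in the question.
sample = '''
<a href="/flowers/images/download/01d6ac.html"><img src="http://static.rarbg.com/over/01d6acc21110e68af7476bce50dec3c234343032.jpg" border="0"></a>
<a href="/flowers/01d6acc21110e68af7476bce50dec3c234343032.html">flowers that never end</a>
'''
print(extract_flower_links(sample))
```

Because the parser hands over each attribute value already unquoted, the filter works on the href alone and does not depend on which character happens to follow the closing quote.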
