I have an <a> tag like this:
<a data-hash="9aab8aa3af7dc519c643fdcfd973b040" href="http://www.zhihu.com/people/9aab8aa3af7dc519c643fdcfd973b040" class="member_mention" data-editable="true" data-title="@somebody" data-tip="p$b$9aab8aa3af7dc519c643fdcfd973b040">@somebody</a>
I want to get the URL
href="http://www.zhihu.com/people/9aab8aa3af7dc519c643fdcfd973b040"
and the link text @somebody at the same time.
I tried this regex:
href=\"(.*?)\">(.*?)</a>
The result is:
href="http://www.zhihu.com/people/9aab8aa3af7dc519c643fdcfd973b040" class="member_mention" data-editable="true" data-title="@somebody" data-tip="p$b$9aab8aa3af7dc519c643fdcfd973b040">@somebody</a>
Can anyone give me a suggestion?
Answer 0 (score: 0)
You also need to match the extra attributes inside the anchor tag.
"<a\\b[^>]*\\bhref=\"(.*?)\"[^>]*>(.*?)</a>"
or
"<a\\b[^>]*\\bhref=\"([^\"]*)\"[^>]*>(.*?)</a>"
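Then take the strings you need from groups 1 and 2. Your regex also captures the attributes that follow href because the only place it can stop is a double quote immediately followed by >. The first such place is the end of data-tip="p$b$9aab8aa3af7dc519c643fdcfd973b040">, so everything up to that point ends up in the first capture group.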
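As a rough sketch of how this could look in Java (assuming Java is what you are using, which the escaped string literals suggest), the second pattern can be compiled once and groups 1 and 2 read from the matcher. The class name AnchorExtractor and the method extract are made up for illustration and are not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class AnchorExtractor {
    // [^>]* skips any attributes before and after href without leaving the tag;
    // [^\"]* keeps group 1 inside the href value's own quotes.
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\b[^>]*\\bhref=\"([^\"]*)\"[^>]*>(.*?)</a>");

    // Returns {href, link text} for the first anchor found, or null if none matches.
    static String[] extract(String html) {
        Matcher m = ANCHOR.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String html = "<a data-hash=\"9aab8aa3af7dc519c643fdcfd973b040\" "
                + "href=\"http://www.zhihu.com/people/9aab8aa3af7dc519c643fdcfd973b040\" "
                + "class=\"member_mention\" data-editable=\"true\" "
                + "data-title=\"@somebody\" "
                + "data-tip=\"p$b$9aab8aa3af7dc519c643fdcfd973b040\">@somebody</a>";
        String[] parts = extract(html);
        System.out.println(parts[0]); // the href value
        System.out.println(parts[1]); // the link text
    }
}
```

To collect every link in a longer string, call m.find() in a loop instead of once. If the markup gets more complicated (nested tags, single-quoted attributes), an HTML parser is usually a safer choice than a regex.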