Regex to list <a> tags

Date: 2015-06-26 19:21:34

Tags: html regex

I have the following HTML code:

<div class="tags">
<a href="/tag:SD_Card_Holder">SD_Card_Holder</a>
,
<a href="/tag:Thumb_Drive">thumb_drive</a>
</div>

I need to take only the content within <a> tags, in this case: SD_Card_Holder and thumb_drive.

My regex is the following:

(?s)class="tags">[^<]*?<a href="\/tag:(.*?)">(.*?)<\/a><\/div>

The result I get is:

SD_Card_Holder
SD_Card_Holder, thumb_drive

The second result also contains the first match, and I need to avoid this.

How can I avoid this?

2 answers:

Answer 0 (score: 1)

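To be clear, you shouldn't use regex to parse XHTML unless you know exactly what HTML you are dealing with. However, if you do want to use regex, you can use one like this: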

Working demo

Match information

MATCH 1
1.  [33-47] `SD_Card_Holder`
MATCH 2
1.  [84-95] `Thumb_Drive`

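As a rough illustration of the per-tag approach, here is a minimal Python sketch. The pattern `ANCHOR_RE` is my own assumption and not necessarily the regex this answer used; it simply matches one `<a>` element at a time instead of anchoring on the surrounding `<div>`, so no match can swallow an earlier one. It assumes the markup looks exactly like the snippet in the question.

```python
import re

# The HTML from the question, reproduced verbatim.
html = '''<div class="tags">
<a href="/tag:SD_Card_Holder">SD_Card_Holder</a>
,
<a href="/tag:Thumb_Drive">thumb_drive</a>
</div>'''

# Hypothetical pattern (an assumption, not taken from the answer):
# group 1 is the tag name inside href, group 2 is the link text.
ANCHOR_RE = re.compile(r'<a href="/tag:([^"]*)">([^<]*)</a>')

matches = list(ANCHOR_RE.finditer(html))

href_names = [m.group(1) for m in matches]   # names taken from the href
link_texts = [m.group(2) for m in matches]   # text between <a> and </a>
spans = [m.span(1) for m in matches]         # character offsets of group 1

print(href_names)
print(link_texts)
print(spans)
```

Group 1 gives the names as written in the `href` (which is what the match information above lists), while group 2 gives the link text the question asks for.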

Answer 1 (score: 0)

First: Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms

Second: use an XPath selector to find the elements.

xmllint --xpath "string(//a[1])" foo.html

xmllint --xpath "string(//a[2])" foo.html

...
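If you would rather do the same thing from a script, a minimal sketch with Python's standard library could look like the following. It assumes the fragment is well-formed XML, as the snippet in the question is; `xml.etree.ElementTree` only supports a limited subset of XPath, but `.//a` is within that subset. The variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

# The fragment from the question; it happens to be well-formed XML.
fragment = '''<div class="tags">
<a href="/tag:SD_Card_Holder">SD_Card_Holder</a>
,
<a href="/tag:Thumb_Drive">thumb_drive</a>
</div>'''

root = ET.fromstring(fragment)

# ".//a" selects every <a> descendant of the root element.
link_texts = [a.text for a in root.findall(".//a")]

print(link_texts)
```

For real-world HTML that is not well-formed, this parser will raise an error, so a dedicated HTML parser would be the safer choice there.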