Question

I have the following HTML code:

<div class="tags">
<a href="/tag:SD_Card_Holder">SD_Card_Holder</a>
,
<a href="/tag:Thumb_Drive">thumb_drive</a>
</div>

I need to extract only the text inside the <a> tags, in this case SD_Card_Holder and thumb_drive.

My regex is the following:

(?s)class="tags">[^<]*?<a href="\/tag:(.*?)">(.*?)<\/a><\/div>

The result I get is:

SD_Card_Holder
SD_Card_Holder, thumb_drive

The second result also contains the first occurrence, and I need to avoid this.

How can I avoid this?
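To see where the extra text comes from, here is a minimal sketch in Python (the question names no language, so Python's `re` is an assumption). The single-line markup in `one_line` is also an assumption: with the line breaks exactly as posted, `</a>` and `</div>` are never adjacent and the pattern finds nothing. On one line, the lazy second group still has to keep growing until the literal `</a></div>` follows it, and `.` happily matches `<` and `>`, so it runs past the first `</a>` and swallows both links.

```python
import re

# The question's pattern, unchanged; Python accepts the \/ escapes and a leading (?s).
pattern = r'(?s)class="tags">[^<]*?<a href="\/tag:(.*?)">(.*?)<\/a><\/div>'

# Markup exactly as posted: "</a>" and "</div>" are never adjacent,
# so the pattern finds no match at all.
multi_line = '''<div class="tags">
<a href="/tag:SD_Card_Holder">SD_Card_Holder</a>
,
<a href="/tag:Thumb_Drive">thumb_drive</a>
</div>'''
print(re.search(pattern, multi_line))  # None

# Assumed single-line variant of the same markup.
one_line = ('<div class="tags"><a href="/tag:SD_Card_Holder">SD_Card_Holder</a>, '
            '<a href="/tag:Thumb_Drive">thumb_drive</a></div>')
m = re.search(pattern, one_line)

# Group 2 is lazy, but "." also matches "<" and ">", so it keeps growing
# past the first </a> until the literal "</a></div>" follows it.
print(m.group(1))  # SD_Card_Holder
print(m.group(2))  # SD_Card_Holder</a>, <a href="/tag:Thumb_Drive">thumb_drive
```

The problem is not laziness as such: anchoring the pattern on `</a></div>` forces a single match to cover every link in the div.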

Answer 1

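Clearly, you should not use regular expressions to parse XHTML unless you are sure of the exact HTML you will be working with. However, if you do want to use a regex, you can use one like this: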

Match information:

MATCH 1
1.  [33-47] `SD_Card_Holder`
MATCH 2
1.  [84-95] `Thumb_Drive`
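The regex this answer used is not shown above, so the pattern below is an assumption, sketched in Python. It matches each link on its own and, for the markup exactly as posted in the question, reproduces the offsets in the match information ([33-47] and [84-95]). The negated character classes `[^"]` and `[^<]` are what stop a capture at the first quote or tag, so one match can never run past its own `</a>` into the next link.

```python
import re

html = """<div class="tags">
<a href="/tag:SD_Card_Holder">SD_Card_Holder</a>
,
<a href="/tag:Thumb_Drive">thumb_drive</a>
</div>"""

# [^"]* cannot cross the closing quote and [^<]* cannot cross "</a>",
# so every <a> element produces its own, separate match.
pattern = re.compile(r'<a href="/tag:([^"]*)">([^<]*)</a>')

for m in pattern.finditer(html):
    # group(1) is the tag name from the href, group(2) is the visible link text
    print(m.start(1), m.end(1), m.group(1), m.group(2))
# 33 47 SD_Card_Holder SD_Card_Holder
# 84 95 Thumb_Drive thumb_drive
```

The second group holds the link text the question asks for (SD_Card_Holder and thumb_drive); the first holds the href value, which is what the match information above lists.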


Answer 2

Second: use an XPath selector to find them.

xmllint --xpath "string(//a[1])" foo.html

xmllint --xpath "string(//a[2])" foo.html

...
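The same extraction can be sketched in Python with the standard library, assuming the markup is well-formed XML (it is, as posted). `xml.etree.ElementTree` supports only a limited subset of XPath, but `.//a` is enough here and plays the role of `//a` in the xmllint commands; for real-world HTML that is not well-formed, a forgiving HTML parser would be needed instead.

```python
import xml.etree.ElementTree as ET

html = """<div class="tags">
<a href="/tag:SD_Card_Holder">SD_Card_Holder</a>
,
<a href="/tag:Thumb_Drive">thumb_drive</a>
</div>"""

root = ET.fromstring(html)  # root is the <div class="tags"> element

# ".//a" selects every <a> descendant in document order;
# .text is the text directly inside each element.
texts = [a.text for a in root.iterfind(".//a")]
print(texts)  # ['SD_Card_Holder', 'thumb_drive']
```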