I have the following HTML code:
<div class="tags">
<a href="/tag:SD_Card_Holder">SD_Card_Holder</a>
,
<a href="/tag:Thumb_Drive">thumb_drive</a>
</div>
I need to take only the content within the <a> tags, in this case: SD_Card_Holder and thumb_drive.
My regex is the following:
(?s)class="tags">[^<]*?<a href="\/tag:(.*?)">(.*?)<\/a><\/div>
The result I get is:
SD_Card_Holder
SD_Card_Holder, thumb_drive
The second result also contains the first occurrence, and I need to avoid this.
How can I avoid this?
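Answer 0 (score: 1)
To be clear, you should not use regex to parse XHTML unless you are sure about the HTML you will be working with. However, if you do want to use regex, you can use a regex like this: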
Working demo
Match information:
MATCH 1
1. [33-47] `SD_Card_Holder`
MATCH 2
1. [84-95] `Thumb_Drive`
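As a rough illustration of the per-link approach in Python, here is a minimal sketch. The pattern is my own assumption, not necessarily the one from the demo: instead of anchoring on the whole div, it matches one <a> element at a time, so no capture can span two links. The names LINK_RE and extract_tags are hypothetical.

```python
import re

html = '''<div class="tags">
<a href="/tag:SD_Card_Holder">SD_Card_Holder</a>
,
<a href="/tag:Thumb_Drive">thumb_drive</a>
</div>'''

# Hypothetical pattern: match a single <a> element per hit.
# Group 1 is the tag name inside the href, group 2 is the link text.
# [^"]* and [^<]* cannot cross a quote or a tag boundary, so a match
# never swallows more than one link.
LINK_RE = re.compile(r'<a href="/tag:([^"]*)">([^<]*)</a>')

def extract_tags(fragment):
    """Return (href_tag, link_text) pairs for every tag link in the fragment."""
    return LINK_RE.findall(fragment)

pairs = extract_tags(html)
for href_tag, text in pairs:
    print(href_tag, text)

# Character offsets of the href captures, comparable to the match info above.
spans = [m.span(1) for m in LINK_RE.finditer(html)]
print(spans)
```

This only holds for markup shaped exactly like the snippet in the question; extra attributes or nested elements inside the links would need a different pattern or a real parser.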
Answer 1 (score: 0)
First: Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
Second: use an XPath selector to find them.
xmllint --xpath "string(//a[1])" foo.html
xmllint --xpath "string(//a[2])" foo.html
...
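The same selector idea can be sketched in Python with the standard library, assuming the fragment is well-formed XML as it is in the question (real-world HTML would usually need an HTML parser instead). The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

html = '''<div class="tags">
<a href="/tag:SD_Card_Holder">SD_Card_Holder</a>
,
<a href="/tag:Thumb_Drive">thumb_drive</a>
</div>'''

# Parse the fragment; this raises ParseError if the markup is not well-formed.
root = ET.fromstring(html)

# ".//a" selects every <a> descendant, similar to the XPath "//a".
texts = [a.text for a in root.findall(".//a")]
print(texts)
```

Each element's text is read independently, so the link texts come back one per <a> element.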