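I'm a regex/PowerShell beginner and I'm struggling to get this working. I'm processing some HTML data and need to extract the text between two given characters. In the examples below, I need to extract the text between > and < whenever that text contains my search string. I've included several examples and hope the question is clear. Any help is greatly appreciated.
For example: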
$string1 = '<P><STRONG><SPAN style="COLOR: rgb(255,0,0)">ILOM 2.6.1.6.a <BR>BIOS vers. 0CDAN860 <BR>LSI MPT SAS firmware MPT BIOS 1.6.00</SPAN></STRONG></P></DIV></TD>'
$string2 = '<P><A id=T5220 name=T5220></A><A href="http://mywebserver/index.html">Enterprise T5120 Server</A> <BR><A href="http://mywebserver/index.html">Enterprise T5220 Server</A></P></DIV></TD>'
$searchstring = "ILOM"
$regex = ".+>(.*$searchstring.+)<" # Tried this
$string1 -match $regex
# expected result: $matches[x] = ILOM 2.6.1.6.a
Similarly:
$searchstring = "BIOS"
$regex = ".+>(.*$searchstring.+)<" # Tried this
$string1 -match $regex
# expected result: $matches[x] = BIOS vers. 0CDAN860
$searchstring = "T5120"
$regex = ".+>(.*$searchstring.+)<" # Tried this
$string2 -match $regex
# expected result: $matches[x] = Enterprise T5120 Server
$searchstring = "T5220"
$regex = ".+>(.*$searchstring.+)<" # Tried this
$string2 -match $regex
# expected result: $matches[x] = Enterprise T5220 Server
Answer (score: 1)
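You need to make the "wildcard" after your search string lazy by adding ? to its quantifier (.+ becomes .+?). That way it stops at the first < instead of running on to the last one.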
.*<  = any characters, as many as possible (greedy), backtracking only as far as the LAST <
.*?< = any characters, as few as possible (lazy), stopping at the FIRST <
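To see the difference on something smaller than the HTML above, here is a quick comparison on a made-up fragment (the $sample string is only for illustration):

```powershell
# Compare a greedy and a lazy quantifier on a small, made-up HTML fragment.
$sample = '<b>one</b><b>two</b>'

# Greedy: .* grabs everything, then backs off only as far as the LAST '<'.
$greedy = if ($sample -match '>(.*)<') { $Matches[1] }

# Lazy: .*? takes as little as possible, so it stops at the FIRST '<'.
$lazy = if ($sample -match '>(.*?)<') { $Matches[1] }

"greedy: $greedy"   # greedy: one</b><b>two
"lazy:   $lazy"     # lazy:   one
```

This is exactly what happens with the original regex: the greedy .+ after the search string swallows the rest of the HTML up to the last <.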
I would also make the "wildcard" before your search string lazy. It isn't necessary in this particular case, but it's safer.
Minimum required change:
".+>(.*$searchstring.+?)<"
What I'd suggest:
".+>(.*?$searchstring.+?)<"
Sample:
$string1 = '<P><STRONG><SPAN style="COLOR: rgb(255,0,0)">ILOM 2.6.1.6.a <BR>BIOS vers. 0CDAN860 <BR>LSI MPT SAS firmware MPT BIOS 1.6.00</SPAN></STRONG></P></DIV></TD>'
$string2 = '<P><A id=T5220 name=T5220></A><A href="http://mywebserver/index.html">Enterprise T5120 Server</A> <BR><A href="http://mywebserver/index.html">Enterprise T5220 Server</A></P></DIV></TD>'
$searchstring = "ILOM"
$regex = ".+>(.*?$searchstring.+?)<"
if($string1 -match $regex) { $matches[1] }
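# Custom regex: "BIOS" also appears in the last text run, which the greedy .+> would pick, so require BIOS right after '>'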
$searchstring = "BIOS"
$regex = ".+>($searchstring.+?)<"
if($string1 -match $regex) { $matches[1] }
# Or keep the suggested regex and use a more specific search string
$searchstring = "BIOS vers"
$regex = ".+>(.*?$searchstring.+?)<"
if($string1 -match $regex) { $matches[1] }
$searchstring = "T5120"
$regex = ".+>(.*?$searchstring.+?)<"
if($string2 -match $regex) { $matches[1] }
$searchstring = "T5220"
$regex = ".+>(.*?$searchstring.+?)<"
if($string2 -match $regex) { $matches[1] }
Output:
ILOM 2.6.1.6.a
BIOS vers. 0CDAN860
BIOS vers. 0CDAN860
Enterprise T5120 Server
Enterprise T5220 Server
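A different approach (not the pattern above, just a sketch) avoids the BIOS workaround altogether. Two assumptions change here: [^<>]* replaces the lazy wildcards, so a match can never cross a tag and always stays inside one text run between > and <; and [regex]::Escape is applied to the search string, because the patterns above interpolate $searchstring directly, so characters like the dots in a version number would otherwise be treated as regex syntax. Get-TextRun is a hypothetical helper name used only for this example. It returns every matching text run instead of just one:

```powershell
# Hypothetical helper (name chosen for this example): returns every text run
# between '>' and '<' that contains the search string.
function Get-TextRun {
    param(
        [string]$Html,
        [string]$SearchString
    )
    # Escape the search string so '.', '(' and similar characters match literally.
    $escaped = [regex]::Escape($SearchString)
    # [^<>]* can never cross a tag, so each match stays inside one text run.
    $pattern = '>([^<>]*' + $escaped + '[^<>]*)<'
    # -match is case-insensitive by default; [regex]::Matches is not, so ask for it.
    $options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
    [regex]::Matches($Html, $pattern, $options) |
        ForEach-Object { $_.Groups[1].Value.Trim() }
}

$string1 = '<P><STRONG><SPAN style="COLOR: rgb(255,0,0)">ILOM 2.6.1.6.a <BR>BIOS vers. 0CDAN860 <BR>LSI MPT SAS firmware MPT BIOS 1.6.00</SPAN></STRONG></P></DIV></TD>'
$string2 = '<P><A id=T5220 name=T5220></A><A href="http://mywebserver/index.html">Enterprise T5120 Server</A> <BR><A href="http://mywebserver/index.html">Enterprise T5220 Server</A></P></DIV></TD>'

# @() keeps the result an array even when there is only one match.
$bios  = @(Get-TextRun -Html $string1 -SearchString 'BIOS')
$t5220 = @(Get-TextRun -Html $string2 -SearchString 'T5220')

$bios    # BIOS vers. 0CDAN860
         # LSI MPT SAS firmware MPT BIOS 1.6.00
$t5220   # Enterprise T5220 Server
```

Note that id=T5220 inside the <A> tag is not matched, because reaching it from the previous > would require crossing a <. If you only want the first hit, take $bios[0]. Trim() also removes the trailing space that precedes each <BR> in the sample data.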