Consider the following HTML snippet:
...
<html>
<body>
<style>
<div>
<div class="foo">Attachments:</div>
<div class="bar">Name of the attachment (23 KB)</div>
...
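If the HTML contains <div class="foo">Attachments:</div>, I need to match the attachment names (there can be more than one; they all have class bar, and each attachment is in its own div). I am having trouble matching this because:
(1) I can't get the newline matching to work
(2) I can't match the bar div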
$pattern = '<div class="foo">Attachments:</div>\n^[ \t]+<div class="bar">(.*?)</div>'
$matches = [regex]::matches($content, $pattern)
Write-Host ($matches[0])
The desired match is Name of the attachment (23 KB). What am I doing wrong?
Answer 0: (score: 1)
If you use here-strings, multiline regular expressions can be easier to build (IMHO). The newlines become part of the literal match.
$Text =
@'
<html>
<body>
<style>
<div>
<div class="foo">Attachments:</div>
<div class="bar">Name of the attachment (23 KB)</div>
'@
$regex=
@'
(?ms)<html>
<body>
<style>
<div>
<div class="foo">Attachments:</div>
<div class="bar">(.+)</div>
'@
$text -match $regex > $null
$matches[1]
Name of the attachment (23 KB)
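The here-string pattern above matches line breaks literally, so it depends on the text and the pattern using the same line endings and indentation. A minimal sketch of a more tolerant variant, assuming it is acceptable to let `\s*` absorb whatever whitespace sits between the two divs (the `$content`, `$pattern`, and `$name` variables and the sample input are illustrative, not from the original post):

```powershell
# Illustrative input joined with CRLF to simulate a file saved with Windows line endings.
$content = @(
    '<div>'
    '    <div class="foo">Attachments:</div>'
    '    <div class="bar">Name of the attachment (23 KB)</div>'
    '</div>'
) -join "`r`n"

# \s* absorbs the line break (CRLF or LF) and any indentation between the two divs,
# so no (?m) or (?s) option is needed.
$pattern = '<div class="foo">Attachments:</div>\s*<div class="bar">(.*?)</div>'

$m = [regex]::Match($content, $pattern)
if ($m.Success) {
    # Group 1 holds the text inside the bar div.
    $name = $m.Groups[1].Value
    $name
}
```

This also sidesteps the `^` anchor in the question's pattern, which only matches at the start of a line when the multiline option `(?m)` is set.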
Edit: per the comments, since you are trying to extract multiple instances from the text, use the [regex]::Matches() static method:
$Text =
@'
<html>
<body>
<style>
<div>
<div class="foo">Attachments:</div>
<div class="bar">Name of the attachment (23 KB)</div>
....
<div class="foo">Attachments:</div>
<div class="bar">Name of the other attachment (23 KB)</div>
'@
$regex=
@'
(?ms)<div class="foo">Attachments:</div>
<div class="bar">(.+?)</div>
'@
[regex]::Matches($Text,$regex) |
foreach { $_.groups[1].value }
Name of the attachment (23 KB)
Name of the other attachment (23 KB)
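The question also mentions that one Attachments: header can be followed by several bar divs. One way to handle that case, sketched here under the assumption that the bar divs directly follow the header with only whitespace in between, is to repeat the bar group and read every capture of group 1 (a .NET regex feature). The file names in the sample are made up for illustration:

```powershell
# Illustrative input: one header followed by two bar divs and one unrelated div.
$content = @(
    '<div class="foo">Attachments:</div>'
    '<div class="bar">first.pdf (23 KB)</div>'
    '<div class="bar">second.png (3 KB)</div>'
    '<div class="other">not an attachment</div>'
) -join "`n"

# The non-capturing group repeats once per consecutive bar div;
# .NET keeps every value captured by group 1 in Groups[1].Captures.
$pattern = '<div class="foo">Attachments:</div>(?:\s*<div class="bar">(.*?)</div>)+'

$names = foreach ($m in [regex]::Matches($content, $pattern)) {
    foreach ($c in $m.Groups[1].Captures) { $c.Value }
}
$names
```

Each header block yields one match, and the inner loop emits one name per bar div in that block.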
Answer 1: (score: 0)
I'd like to suggest another approach: you can treat the HTML as XML and filter the data from the items with class 'bar'.
Something like:
PS>[xml]$h='<html><body><div><div class="foo">Attachments: </div><div class="bar">Name of the attachment (23 KB) </div></div></body></html>'
PS>$h.html.body.div.div | ?{ $_.class -eq 'bar'} |select -Expand "#text"
Name of the attachment (23 KB)
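The property path `$h.html.body.div.div` is tied to the exact nesting of the document. A possible alternative, assuming the markup is well-formed XML (the `[xml]` cast fails otherwise), is an XPath query through the standard `SelectNodes` method, which finds the bar divs at any depth. The `$barTexts` variable name is only illustrative:

```powershell
[xml]$h = '<html><body><div><div class="foo">Attachments: </div><div class="bar">Name of the attachment (23 KB) </div></div></body></html>'

# //div[@class='bar'] selects every div whose class attribute is exactly "bar", at any depth.
# InnerText returns the element's text; Trim() drops the trailing space in the sample data.
$barTexts = @($h.SelectNodes("//div[@class='bar']") | ForEach-Object { $_.InnerText.Trim() })
$barTexts
```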
Edit after your comment:
[xml]$h=@"
<html><body><div>
<div class="foo">Attachments: </div>
<div class="bar">Name of the attachment (23 KB) </div>
<div class="bar">file2 (3 KB) </div>
<div class='test'>aa</div>
<div class="bar">sfdfsd</div>
<div class="bar">sdfsdf</div>
<div class="foo">Attachments: </div>
<div class="bar">fileB1 (2 KB) </div>
</div></body></html>
"@
$cpt=0
$res=New-Object System.Collections.Specialized.OrderedDictionary
#add each div to the ordered dictionary
$h.html.body.div.div |%{
$res.add($cpt,@{"class"=$_.class;"text"=$_.'#text'})
$cpt++
}
$lastClass=''
(0.. ($res.count-1))|%{
if($res[$_].class -ne 'bar' -and $res[$_].class -ne $lastClass){
$lastClass=$res[$_].class
$lastText=$res[$_].text
}
if($lastClass -eq 'foo' -and $lastText -eq 'Attachments: ' -and $res[$_].text -ne 'Attachments: ' -and $res[$_].class -eq 'bar' ){
$res[$_].text
}
}
Output:
PS>.\test.ps1
Name of the attachment (23 KB)
file2 (3 KB)
fileB1 (2 KB)
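The same "bar divs that belong to an Attachments: header" rule can probably be expressed as a single XPath query instead of the dictionary and loop. This is a sketch rather than a drop-in replacement: it assumes well-formed XML and that all the divs are siblings, and it uses a shortened version of the sample data. The `$xpath` and `$attachments` names are illustrative:

```powershell
[xml]$h = @"
<html><body><div>
<div class="foo">Attachments: </div>
<div class="bar">Name of the attachment (23 KB) </div>
<div class="bar">file2 (3 KB) </div>
<div class='test'>aa</div>
<div class="bar">sfdfsd</div>
<div class="foo">Attachments: </div>
<div class="bar">fileB1 (2 KB) </div>
</div></body></html>
"@

# For each bar div, look at its nearest preceding sibling div that is not a bar div,
# and keep the bar div only if that sibling is the foo "Attachments:" header.
$xpath = "//div[@class='bar'][preceding-sibling::div[not(@class='bar')][1][@class='foo' and normalize-space(.)='Attachments:']]"

$attachments = @($h.SelectNodes($xpath) | ForEach-Object { $_.InnerText.Trim() })
$attachments
```

The bar div after the `test` div is skipped because its nearest non-bar sibling is not the header, which mirrors the `$lastClass` bookkeeping in the loop above.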