How do I match a newline and seven spaces in a PowerShell regex?

Asked: 2015-01-13 10:13:29

Tags: regex powershell

Consider the following HTML snippet:

...
<html>
  <body>
    <style>
      <div>    
        <div class="foo">Attachments:</div>
        <div class="bar">Name of the attachment (23 KB)</div>
...

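If <div class="foo">Attachments:</div> is present in the HTML, I need to match the attachment names (there can be several, all of class bar, each attachment in its own div). I'm having trouble matching this because:

(1) I can't get the newline match to work

(2) I can't match the 8 leading spaces before the bar div
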
$pattern = '<div class="foo">Attachments:</div>\n^[ \t]+<div class="bar">(.*?)</div>'
$matches = [regex]::matches($content, $pattern)

Write-Host ($matches[0])

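The desired match is Name of the attachment (23 KB). What am I doing wrong?

2 Answers:

Answer 0: (score: 1)

Multiline regexes can be easier to build (IMHO) if you use a here-string. The newlines and leading spaces then become part of the literal match.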

$Text = 
@'
<html>
  <body>
    <style>
      <div>    
        <div class="foo">Attachments:</div>
        <div class="bar">Name of the attachment (23 KB)</div>
'@

$regex= 
@'
(?ms)<html>
  <body>
    <style>
      <div>    
        <div class="foo">Attachments:</div>
        <div class="bar">(.+)</div>
'@

$text -match $regex > $null

$matches[1]

Name of the attachment (23 KB)
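
One common reason a newline match never succeeds is that the text was loaded line by line. The question does not show how $content is read, so this is an assumption: Get-Content without -Raw returns an array of lines rather than one string containing newlines. The sketch below writes a small sample to a hypothetical temporary file, reads it back with -Raw (available from PowerShell 3.0), and matches the line break with \r?\n so that both Windows and Unix line endings are accepted.

```powershell
# Hypothetical sample file standing in for the real HTML document.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'attachments-sample.html'
$sample = "<div>`r`n" +
          "        <div class=`"foo`">Attachments:</div>`r`n" +
          "        <div class=`"bar`">Name of the attachment (23 KB)</div>`r`n"
[System.IO.File]::WriteAllText($path, $sample)

# Without -Raw: an array with one string per line, so no newline is left to match.
$lines = Get-Content -Path $path

# With -Raw: a single string that still contains the line breaks.
$raw = Get-Content -Path $path -Raw

# \r?\n accepts CRLF or LF; [ \t]+ consumes the leading indentation.
$pattern = '<div class="foo">Attachments:</div>\r?\n[ \t]+<div class="bar">(.*?)</div>'
$m = [regex]::Match($raw, $pattern)
$name = $m.Groups[1].Value
$name

Remove-Item -Path $path
```

The result is stored in $m rather than $matches, because $matches is an automatic variable that the -match operator overwrites.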

Edit: per the comments, you are trying to extract multiple instances from the text. For that, use the [regex]::Matches() static method:

$Text = 
@'
<html>
  <body>
    <style>
      <div>    
        <div class="foo">Attachments:</div>
        <div class="bar">Name of the attachment (23 KB)</div>
....
        <div class="foo">Attachments:</div>
        <div class="bar">Name of the other attachment (23 KB)</div>
'@

$regex= 
@'
(?ms)   <div class="foo">Attachments:</div>
        <div class="bar">(.+?)</div>
'@

[regex]::Matches($Text,$regex) |
 foreach { $_.groups[1].value }

Name of the attachment (23 KB)
Name of the other attachment (23 KB)
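
The question also mentions that one Attachments: header can be followed by several bar divs. A possible variation, offered as a sketch and not as part of the answer above, relies on the fact that .NET keeps every capture of a repeated group: the pattern repeats the bar div after the foo div, and Groups[1].Captures then lists each attachment name. The sample text and the baz class are made up for illustration.

```powershell
# Illustrative input: one header followed by two bar divs and an unrelated div.
$html = @(
    '        <div class="foo">Attachments:</div>'
    '        <div class="bar">Name of the attachment (23 KB)</div>'
    '        <div class="bar">file2 (3 KB)</div>'
    '        <div class="baz">not an attachment</div>'
) -join "`r`n"

# \s* absorbs the line break and indentation before each bar div;
# the trailing + repeats the bar div group as long as bar divs follow directly.
$pattern = '<div class="foo">Attachments:</div>(?:\s*<div class="bar">(.*?)</div>)+'

$names = foreach ($m in [regex]::Matches($html, $pattern)) {
    # Captures holds every value group 1 matched, not only the last one.
    foreach ($c in $m.Groups[1].Captures) { $c.Value }
}
$names
```

Because the repetition stops at the first div that is not of class bar, divs after an unrelated element are not reported.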

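Answer 1: (score: 0)

I'm going to propose another approach. You can treat the HTML as XML and filter the data from the items with the 'bar' class.

Something like this: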

PS>[xml]$h='<html><body><div><div class="foo">Attachments: </div><div class="bar">Name of the attachment (23 KB) </div></div></body></html>'
PS>$h.html.body.div.div | ?{ $_.class -eq 'bar'} |select -Expand "#text" 
Name of the attachment (23 KB)           
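
The same filter can also be written as an XPath query, which avoids spelling out the html.body.div.div path. This is a sketch under the same assumption as above, namely that the markup is well-formed XML; the [xml] cast throws on HTML with unclosed tags. The variable names and the second bar div are illustrative.

```powershell
[xml]$doc = '<html><body><div><div class="foo">Attachments: </div><div class="bar">Name of the attachment (23 KB) </div><div class="bar">file2 (3 KB) </div></div></body></html>'

# //div[@class='bar'] selects every div with class "bar" at any depth.
$barText = $doc.SelectNodes("//div[@class='bar']") |
    ForEach-Object { $_.InnerText.Trim() }   # Trim drops the trailing space in the sample

$barText
```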

Edit after your comment:

[xml]$h=@"
<html><body><div>
<div class="foo">Attachments: </div>
<div class="bar">Name of the attachment (23 KB) </div>
<div class="bar">file2 (3 KB) </div>
<div class='test'>aa</div>
<div class="bar">sfdfsd</div>
<div class="bar">sdfsdf</div>
<div class="foo">Attachments: </div>
<div class="bar">fileB1 (2 KB) </div>
</div></body></html>
"@

$cpt=0
$res=New-Object System.Collections.Specialized.OrderedDictionary

# Add each div to the ordered dictionary, keyed by its position
$h.html.body.div.div | % {
    $res.Add($cpt, @{ "class" = $_.class; "text" = $_.'#text' })
    $cpt++
}

$lastClass = ''
(0..($res.Count - 1)) | % {
    # Track the latest div that is not of class bar and whose class differs from the tracked one
    if ($res[$_].class -ne 'bar' -and $res[$_].class -ne $lastClass) {
        $lastClass = $res[$_].class
        $lastText = $res[$_].text
    }

    # Emit bar divs only while the tracked div is the "Attachments: " foo div
    if ($lastClass -eq 'foo' -and $lastText -eq 'Attachments: ' -and $res[$_].text -ne 'Attachments: ' -and $res[$_].class -eq 'bar') {
        $res[$_].text
    }
}

Output:

PS>.\test.ps1                   
Name of the attachment (23 KB)  
file2 (3 KB)                    
fileB1 (2 KB)