List all matches found in a Word doc

Time: 2016-10-14 15:38:18

Tags: regex powershell

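I can find the first match in each document I search, but when a document contains several matches I cannot list all of them. I have tried several ways of iterating over the matches hashtable, but I can't seem to get it right. Is there a way to do this?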

$RX = "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.|dot|\[dot\]|\[\.\])){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
$WordFiles = Get-ChildItem $Directory -include *.doc, *.docx -recurse
$Directory = "c:\temp"
$objWord = New-Object -Com Word.Application

foreach ($fileSearched in $WordFiles) {
    $objWord.Visible = $false
    $objWord.DisplayAlerts = "wdAlertsNone"
    $objDocument = $objWord.Documents.Open("$fileSearched")
    if ($objdocument.Content.Text -match $RX){
        Foreach ($found in $_.Matches) { #| ForEach-Object {$_.Value}
            $file2.WriteLine("{0},{1}",$matches[$_], $filesearched.fullname)  
            write-host $_.matches
            write-host $_.value
            write-host $found
         }
   }
   $file2.close()
}
$objWord.Quit()

2 answers:

Answer 0: (score: 0)

PowerShell's -match operator only returns the first match, and as far as I know there is no way to make it match globally.

However, you can switch to the [regex]::Matches method, which matches globally by default.

([regex]::matches($objdocument.Content.Text, $RX))
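To make the difference concrete, here is a minimal sketch that runs the question's pattern against a made-up string instead of a Word document. The sample text and the variable names $text, $first and $all are assumptions for illustration only.

```powershell
# Hypothetical sample text; it stands in for $objDocument.Content.Text
$text = 'Seen 10.0.0.1, then 192.168[.]1[.]20 and 172.16dot0dot5'
$RX = '(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.|dot|\[dot\]|\[\.\])){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# -match stops at the first hit and stores it in the automatic $Matches hashtable
$first = $null
if ($text -match $RX) { $first = $Matches[0] }

# [regex]::Matches returns a MatchCollection containing every hit
$all = [regex]::Matches($text, $RX) | ForEach-Object { $_.Value }

"First match only: $first"
"All matches: $($all -join ', ')"
```

With -match, only $Matches[0] is populated for the whole pattern, which is why looping over it never reaches the second or third address.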

Update

I believe you will also need to switch $_.Matches to $_.Value, as in the examples here.
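As a small illustration of why .Value is the property to read: each element of the collection returned by [regex]::Matches is a System.Text.RegularExpressions.Match object, which carries the matched text in Value and its offset in Index. The sample string and the simple \d+ pattern below are assumptions chosen to keep the sketch short.

```powershell
# Hypothetical input; any string and pattern would do
$text = 'a1 b22 c333'

$lines = foreach ($m in [regex]::Matches($text, '\d+')) {
    # $m is a Match object; Value is the matched text, Index its position
    '{0} at index {1}' -f $m.Value, $m.Index
}

$lines
# 1 at index 1
# 22 at index 4
# 333 at index 8
```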

Answer 1: (score: 0)

I looked at the link cchamberlain provided and came up with:

$CSV = "c:\temp\output.csv"
$RX = "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.|dot|\[dot\]|\[\.\])){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
$Directory = "c:\temp" # must be set before Get-ChildItem uses it
$WordFiles = Get-ChildItem $Directory -Include *.doc, *.docx -Recurse
$objWord = New-Object -ComObject Word.Application
$objWord.Visible = $false
$objWord.DisplayAlerts = 0 # 0 = wdAlertsNone
$file2 = New-Object System.IO.StreamWriter($CSV, $true) # append to the file, or create it
$file2.WriteLine('Matches,File_Path') # write header

foreach ($fileSearched in $WordFiles) {
    $objDocument = $objWord.Documents.Open($fileSearched.FullName)
    # [regex]::Matches returns every match, not just the first
    $words = [regex]::Matches($objDocument.Content.Text, $RX) | ForEach-Object { $_.Value }
    foreach ($word in $words) {
        $file2.WriteLine("{0},{1}", $word, $fileSearched.FullName)
    }
    $objDocument.Close() # close each document before opening the next
}
# close the writer and Word only once, after every file has been processed
$file2.Close()
$objWord.Quit()
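One possible variation, not part of the answer above: writing lines by hand with StreamWriter produces a broken CSV row if a file path happens to contain a comma. A sketch of an alternative is to emit one object per match and let Export-Csv handle the quoting. The function name Get-PatternMatch, the sample text, the simplified pattern and the temp-file name are all hypothetical; in the real script the text would come from $objDocument.Content.Text.

```powershell
# Hypothetical helper: turns every regex match in $Text into a row object
function Get-PatternMatch {
    param(
        [string]$Text,
        [string]$Pattern,
        [string]$Path
    )
    foreach ($m in [regex]::Matches($Text, $Pattern)) {
        [pscustomobject]@{ Matches = $m.Value; File_Path = $Path }
    }
}

# Made-up input; note the comma in the path, which Export-Csv will quote
$rows = Get-PatternMatch -Text 'ping 10.0.0.1 and 10.0.0.2' `
                         -Pattern '\d+(?:\.\d+){3}' `
                         -Path 'c:\temp\a, b.docx'

$csv = Join-Path ([System.IO.Path]::GetTempPath()) 'matches-demo.csv'
$rows | Export-Csv -Path $csv -NoTypeInformation
```

Reading the file back with Import-Csv returns the same two rows with the path intact, which the hand-written "{0},{1}" format would not guarantee.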