PowerShell won't read header text in a Word document?

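Time: 2017-05-23 10:27:17

Tags: powershell ms-word automation

I need to check a large number of Word documents (.doc and .docx) for a specific text, and I found a great tutorial and script from the Scripting Guys: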

https://blogs.technet.microsoft.com/heyscriptingguy/2012/08/01/find-all-word-documents-that-contain-a-specific-phrase/

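The script reads all documents in a directory and gives the following output:

  1. The number of mentions
  2. The total word count of all documents in which the specific text is found
  3. The directory of all files containing the specific text

This is all I need, but their code doesn't seem to actually check the headers of any document, which, incidentally, is where the specific text I'm looking for is located. Any tips and tricks for making the script read header text would make me very happy.

Another solution might be to remove the formatting so that the header text becomes part of the rest of the document. Is that possible?

Edit: forgot to link the script: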

    [cmdletBinding()]
    Param(
     $Path = "C:\Users\use\Desktop\"
    ) #end param
    
    $matchCase = $false
    $matchWholeWord = $true
    $matchWildCards = $false
    $matchSoundsLike = $false
    $matchAllWordForms = $false
    $forward = $true
    $wrap = 1
    $application = New-Object -comobject word.application
    $application.visible = $False
    $docs = Get-childitem -path $Path -Recurse -Include *.docx
    $findText = "specific text"
    $i = 1
    $totalwords = 0
    $totaldocs = 0
    
    Foreach ($doc in $docs)
    {
     Write-Progress -Activity "Processing files" -status "Processing $($doc.FullName)" -PercentComplete ($i /$docs.Count * 100) 
     $document = $application.documents.open($doc.FullName)
     $range = $document.content
     $null = $range.movestart()
     $wordFound = $range.find.execute($findText,$matchCase,
      $matchWholeWord,$matchWildCards,$matchSoundsLike,
      $matchAllWordForms,$forward,$wrap)
      if($wordFound) 
        { 
         $doc.fullname
         $document.Words.count
         $totaldocs ++
         $totalwords += $document.Words.count
        } #end if $wordFound
     $document.close()
     $i++
    } #end foreach $doc
    $application.quit()
    "There are $totaldocs and $($totalwords.tostring('N')) words"
    
    #clean up stuff
    [System.Runtime.InteropServices.Marshal]::ReleaseComObject($range) | Out-Null
    [System.Runtime.InteropServices.Marshal]::ReleaseComObject($document) | Out-Null
    [System.Runtime.InteropServices.Marshal]::ReleaseComObject($application) | Out-Null
    Remove-Variable -Name application
    [gc]::collect()
    [gc]::WaitForPendingFinalizers()
    

Edit 2: My colleague suggested calling the section header instead:

    Foreach ($doc in $docs)
    {
     Write-Progress -Activity "Processing files" -status "Processing $($doc.FullName)" -PercentComplete ($i /$docs.Count * 100) 
     $document = $application.documents.open($doc.FullName)
     # Load first section of the document
     $section = $doc.sections.item(1);
     # Load header
     $header = $section.headers.Item(1);
    
     # Set the range to be searched to only Header
     $range = $header.content
     $null = $range.movestart()
    
     $wordFound = $range.find.execute($findText,$matchCase,
      $matchWholeWord,$matchWildCards,$matchSoundsLike,
      $matchAllWordForms,$forward,$wrap,$Format)
      if($wordFound) [script continues as above]
    

But this runs into the following errors:

    You cannot call a method on a null-valued expression.
    At C:\Users\user\Desktop\count_mod.ps1:27 char:31
    +  $section = $doc.sections.item <<<< (1);
        + CategoryInfo          : InvalidOperation: (item:String) [], RuntimeException
        + FullyQualifiedErrorId : InvokeMethodOnNull
    
    You cannot call a method on a null-valued expression.
    At C:\Users\user\Desktop\count_mod.ps1:29 char:33
    +  $header = $section.headers.Item <<<< (1);
        + CategoryInfo          : InvalidOperation: (Item:String) [], RuntimeException
        + FullyQualifiedErrorId : InvokeMethodOnNull
    
    You cannot call a method on a null-valued expression.
    At C:\Users\user\Desktop\count_mod.ps1:33 char:26
    +  $null = $range.movestart <<<< ()
        + CategoryInfo          : InvalidOperation: (movestart:String) [], RuntimeException
        + FullyQualifiedErrorId : InvokeMethodOnNull
    
    You cannot call a method on a null-valued expression.
    At C:\Users\user\Desktop\count_mod.ps1:35 char:34
    +  $wordFound = $range.find.execute <<<< ($findText,$matchCase,
        + CategoryInfo          : InvalidOperation: (execute:String) [], RuntimeException
        + FullyQualifiedErrorId : InvokeMethodOnNull
    

Is this the right approach, or a dead end?
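On the idea of treating header text as part of the rest of the document: Word exposes the body, headers, footers, footnotes and similar parts as separate "stories" through `Document.StoryRanges`, so one option is to collect the text of every story rather than strip formatting. The following is only a sketch under that assumption. `Get-AllStoryText` is a hypothetical helper name, not part of the Scripting Guys script, and header stories that have never been touched may not always show up in `StoryRanges`, so it is worth checking against a real document.

```powershell
# Hypothetical helper (not in the original script): concatenates the text of
# every story (main body, headers, footers, ...) of an open Word document.
function Get-AllStoryText {
    param(
        [Parameter(Mandatory = $true)] $Document   # an open Word document COM object
    )
    $parts = @()
    foreach ($story in $Document.StoryRanges) {
        $current = $story
        # Stories of the same type are chained through NextStoryRange
        # (for example, the headers of later sections).
        while ($current -ne $null) {
            $parts += $current.Text
            $current = $current.NextStoryRange
        }
    }
    return ($parts -join "`n")
}
```

Inside the loop, the result could be tested with an ordinary string comparison, for example `(Get-AllStoryText -Document $document) -like "*$findText*"`. Note that this is a plain substring match and does not honor `$matchWholeWord`.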

2 answers:

Answer 0 (score: 1)

If you need the header text, you can try the following:

    $document.content.Sections.First.Headers.Item(1).range.text
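As a hedged sketch of how that expression might be used inside the loop: the errors in Edit 2 appear to come from calling `.sections` on `$doc` (the file object returned by `Get-ChildItem`) rather than on `$document` (the opened Word document). The helper below, `Test-HeaderText`, is a hypothetical name and not part of the original script. It assumes that header indexes 1, 2 and 3 are the primary, first-page and even-page headers (Word's `WdHeaderFooterIndex` values), and it uses a case-insensitive substring check on the header text instead of `Find.Execute`.

```powershell
# Hypothetical helper (not in the original script): returns $true when any
# header in any section of an open Word document contains $FindText.
function Test-HeaderText {
    param(
        [Parameter(Mandatory = $true)] $Document,          # opened via $application.Documents.Open(...)
        [Parameter(Mandatory = $true)] [string] $FindText
    )
    foreach ($section in $Document.Sections) {
        # Assumed WdHeaderFooterIndex values: 1 = primary, 2 = first page, 3 = even pages
        foreach ($index in 1, 2, 3) {
            $text = $section.Headers.Item($index).Range.Text
            # Skip empty headers; compare without regard to case
            if ($text -and $text.IndexOf($FindText, [System.StringComparison]::OrdinalIgnoreCase) -ge 0) {
                return $true
            }
        }
    }
    return $false
}
```

In the question's loop this could stand in for the `$range` and `Find.Execute` lines, for example `if (Test-HeaderText -Document $document -FindText $findText) { $doc.FullName }`. Because it is a substring match, it will also hit partial words, unlike a search with `$matchWholeWord = $true`.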

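Answer 1 (score: 0)

For anyone looking at this question in the future: something isn't working properly in my code above. It seems to return a false positive and sets $wordFound = 1 regardless of the document's content, so every document found under $Path is listed.

Editing the variables passed to Find.Execute doesn't seem to change the result of $wordFound. I believe the problem may lie in my $range, since that is the only place where I get errors when stepping through the code.

Errors listed: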

You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\Powershell\count.ps1:24 char:58
+  $range = $document.content.Structures.First.Headers.Item <<<< (1).range.Text
    + CategoryInfo          : InvalidOperation: (Item:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

Exception calling "MoveStart" with "0" argument(s): "The RPC server is unavailable. (Exception from HRESULT: 0x800706BA)"
At C:\Users\user\Desktop\Powershell\count.ps1:25 char:26
+  $null = $range.MoveStart <<<< ()
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : ComMethodCOMException

You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\Powershell\count.ps1:26 char:34
+  $wordFound = $range.Find.Execute <<<< ($findText,$matchCase,
    + CategoryInfo          : InvalidOperation: (Execute:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull
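Two things in the failing line look suspect, though this is a reading of the error text rather than a verified diagnosis. First, it uses `Structures`, whereas the other answer uses `Sections`; `Structures` does not appear to be a member of a Word range, which would explain the null-valued expression. Second, it ends in `.Text`, which yields a string, and a string has no `MoveStart` or `Find` members. A minimal sketch that keeps the header as a Range object and passes the same eight positional arguments as the original script might look like this (`Find-InHeaderRange` is a hypothetical name, and only the primary header, index 1, is searched):

```powershell
# Hypothetical helper (not in the original script): runs Word's Find against
# the primary header Range of each section of an open Word document.
function Find-InHeaderRange {
    param(
        [Parameter(Mandatory = $true)] $Document,
        [Parameter(Mandatory = $true)] [string] $FindText
    )
    foreach ($section in $Document.Sections) {
        # Keep the Range object itself; appending .Text would turn it into a string
        $range = $section.Headers.Item(1).Range
        # Positional arguments as in the original script: text, matchCase,
        # matchWholeWord, matchWildCards, matchSoundsLike, matchAllWordForms,
        # forward, wrap
        $found = $range.Find.Execute($FindText, $false, $true, $false,
            $false, $false, $true, 1)
        if ($found) {
            return $true
        }
    }
    return $false
}
```

Called as `Find-InHeaderRange -Document $document -FindText $findText` inside the loop, this returns a fresh result for every document instead of reusing a `$range` left over from an earlier iteration.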