PowerShell: return the highest 4-digit number found in a string pattern - searching Word documents

Asked: 2016-03-15 22:02:26

Tags: regex powershell ms-word

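I'm trying to return the highest 4-digit number found in a string pattern across a set of documents.

String pattern: 3 letters, a dash, 4 digits

The Word documents contain document identifier codes like the ones shown below.

Example files: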

Car Parts.docx> CPW - 2345

CarHandles.docx> CPW - 8723

CarList.docx> CPA - 9083

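Below is some sample code that I'm trying to adapt. I'm not a VBA or PowerShell programmer, so I may be going about this the wrong way.

I'm happy to consider alternatives on the Windows platform.

I used these as references to get started: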

http://chris-nullpayload.rhcloud.com/2012/07/find-and-replace-string-in-all-docx-files-recursively/

PowerShell: return the number of instances find in a file for a search pattern

Powershell: return filename with highest number

$list = gci "C:\Users\WP\Desktop\SearchFiles" -Include *.docx -Force -recurse
foreach ($foo in $list) {

$objWord = New-Object -ComObject word.application
$objWord.Visible = $False

$objDoc = $objWord.Documents.Open("$foo")
$objSelection = $objWord.Selection 

$Pat1 = [regex]'[A-Z]{3}-[0-9]{4}'   # Find the regex match 3 letters  followed by 4 numbers eg     HGW - 1024

$findtext= "$Pat1"

 $highestNumber = 

 # Find the highest occurrence of this pattern found in the documents searched - output to text file or on screen

Sort-Object |                   # This may also be wrong -I added it for when I find the pattern
Select-Object -Last 1 -ExpandProperty Name


<#   The below may not be needed  - ?

$ReplaceText = ""

$ReplaceAll = 2
$FindContinue = 1
$MatchFuzzy = $False
$MatchCase = $False
$MatchPhrase = $false
$MatchWholeWord = $True
$MatchWildcards = $True
$MatchSoundsLike = $False
$MatchAllWordForms = $False
$Forward = $True
$Wrap = $FindContinue
$Format = $False

$objSelection.Find.execute(
    $FindText,
    $MatchCase,
    $MatchWholeWord,
    $MatchWildcards,
    $MatchSoundsLike,
    $MatchAllWordForms,
    $Forward,
    $Wrap,
    $Format,
    $ReplaceText,
    $ReplaceAll
  }

}
#>

I'd appreciate any advice on how to proceed.

2 Answers:

Answer 0 (score: 2)

Try this:

The main idea here is not to work through Word's COM API, but to extract the text from the documents manually.
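
A minimal sketch of that idea, under stated assumptions: a .docx file is a ZIP archive whose main body lives in `word/document.xml`, so the text can be read without starting Word. The function names `Get-DocxText` and `Get-HighestDocNumber` are hypothetical helpers introduced for illustration, and the tag stripping is deliberately crude (it ignores headers, footers, and XML entities). The pattern is the one suggested in the other answer.

```powershell
# Sketch only: Get-DocxText and Get-HighestDocNumber are hypothetical helper names.
Add-Type -AssemblyName System.IO.Compression.FileSystem

function Get-DocxText {
    param([Parameter(Mandatory)][string]$Path)
    # .NET does not follow PowerShell's current location, so resolve to a full path.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $xml = ''
    $zip = [System.IO.Compression.ZipFile]::OpenRead($fullPath)
    try {
        # The main document body is stored in this archive entry.
        $entry = $zip.GetEntry('word/document.xml')
        if ($null -ne $entry) {
            $reader = New-Object System.IO.StreamReader($entry.Open())
            try { $xml = $reader.ReadToEnd() } finally { $reader.Dispose() }
        }
    }
    finally { $zip.Dispose() }
    # End each paragraph with a line break, then drop all remaining tags so that
    # text split across several runs is joined back together.
    $xml = $xml -replace '</w:p>', [Environment]::NewLine
    return ($xml -replace '<[^>]+>', '')
}

function Get-HighestDocNumber {
    param([Parameter(Mandatory)][string[]]$Path)
    # 3 capital letters, optional spaces around the dash, then 4 digits.
    $pattern = '(?<=[A-Z]{3}\s*-\s*)\d{4}'
    $numbers = foreach ($p in $Path) {
        foreach ($m in [regex]::Matches((Get-DocxText -Path $p), $pattern)) {
            [int]$m.Value
        }
    }
    # Highest number first; returns nothing if no identifier was found.
    $numbers | Sort-Object -Descending | Select-Object -First 1
}
```

With the folder from the question, this could be called as `Get-HighestDocNumber -Path (Get-ChildItem 'C:\Users\WP\Desktop\SearchFiles' -Filter *.docx -Recurse).FullName`. Reading the archive directly avoids launching one Word instance per file, at the cost of seeing only the raw body text.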

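Answer 1 (score: 0)

The way to get the highest number is to first isolate it with a regular expression, then sort in descending order and select the first item. Like this: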

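# $objSelection is a COM object, not a string, so match against the document's text instead
$text = $objDoc.Content.Text

[regex]::Matches($text, '(?<=[A-Z]{3}\s*-\s*)\d{4}') |
  Select-Object -ExpandProperty Captures |
  Sort-Object Value -Descending |
  Select-Object -First 1 -ExpandProperty Value |
  Add-Content outfile.txt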

I think the problem with your regex is that your sample data has spaces around the dash, which your pattern does not allow.
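
To make the difference concrete, here is a small sketch that runs both patterns against the three identifiers from the question. The variable names are illustrative only, and `-cmatch` is used because the plain `-match` operator is case-insensitive.

```powershell
# Sample identifiers copied from the question.
$samples = 'CPW - 2345', 'CPW - 8723', 'CPA - 9083'

# Pattern from the question: no whitespace allowed around the dash.
$strict  = '[A-Z]{3}-[0-9]{4}'
# Pattern from this answer: optional whitespace on either side of the dash.
$lenient = '(?<=[A-Z]{3}\s*-\s*)\d{4}'

# The strict pattern finds nothing in the sample data.
$strictHits = @($samples | Where-Object { $_ -cmatch $strict })

# The lenient pattern captures only the digits, which are converted to integers.
$numbers = foreach ($s in $samples) {
    foreach ($m in [regex]::Matches($s, $lenient)) { [int]$m.Value }
}
$highest = $numbers | Sort-Object -Descending | Select-Object -First 1

"Strict pattern matches: $($strictHits.Count)"   # 0
"Highest number: $highest"                       # 9083
```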