Question

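I'm trying to use PowerShell to extract records from multiple files. The script I wrote iterates over each file and writes the records that match a pattern to an out file. Because the number of files is large, this takes a long time.

I'd like to know whether this can be optimized.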

$files = Get-ChildItem $sourcedirectory\*

for ($i=0; $i -lt $files.Count; $i++) {
    $outfile = $files[$i].FullName + "_out" 
    Get-Content $files[$i].FullName| Select-String -Pattern "OB_[0-9]F_AHU*" | Set-Content $outfile
}


if (!(Test-Path -path $targetdirectory)) {New-Item $targetdirectory -Type Directory}
Move-Item -Path $sourcedirectory\*_out -Destination $targetdirectory

Answer 1

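Could you post some more details about what you are trying to accomplish?

Taken at face value, here is a solution that parses each file in parallel. I'm not sure how many concurrent jobs it will use, but it should get you started down that path.

Try this: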

$files = Get-ChildItem $sourcedirectory\*

# Note: foreach -parallel is only valid inside a workflow block
# (Windows PowerShell 3.0 to 5.1).
foreach -parallel ($file in $files) {
    $outfile = $file.FullName + "_out"
    Get-Content $file.FullName | Select-String -Pattern "OB_[0-9]F_AHU*" | out-file -Append $outfile
}
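For readers on PowerShell 7 or later, where workflows are not available, the same idea can be sketched with ForEach-Object -Parallel. This is only a sketch under assumptions: the temp folder, the sample file names and contents, and the throttle limit of 4 are all made up for illustration, and whether it is actually faster depends on your disk and file sizes.

```powershell
# Assumption: PowerShell 7+, where ForEach-Object -Parallel exists.
# A throwaway temp folder with sample data stands in for $sourcedirectory.
$sourcedirectory = Join-Path ([System.IO.Path]::GetTempPath()) ("ahu_demo_" + [guid]::NewGuid())
New-Item -Path $sourcedirectory -ItemType Directory | Out-Null
Set-Content -Path (Join-Path $sourcedirectory 'a.log') -Value 'OB_1F_AHU01 ok', 'unrelated line'
Set-Content -Path (Join-Path $sourcedirectory 'b.log') -Value 'noise', 'OB_2F_AHU07 fault'

# Collect the file list first so the *_out files created below are not re-enumerated.
$files = Get-ChildItem -Path $sourcedirectory -File

$files | ForEach-Object -Parallel {
    $outfile = $_.FullName + '_out'
    # Select-String reads the file itself, so no Get-Content pipeline is needed.
    Select-String -Path $_.FullName -Pattern 'OB_[0-9]F_AHU*' |
        ForEach-Object { $_.Line } |
        Set-Content -Path $outfile
} -ThrottleLimit 4
```

Passing the path straight to Select-String avoids pushing every line through the pipeline, which may help even without the parallel part.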

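As for your overall goal, sometimes PowerShell isn't the best tool. Whenever you need to parse large amounts of data, you should consider loading that data into a database. You could use something like SQL Express and upload the files once (a slow operation), and from then on query the data much faster. Since I don't know what you are trying to accomplish or what your data looks like, I can't tell whether that would be worth it in your case.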

Answer 2

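You can write the new files directly to the target directory instead of moving them out of the source directory afterwards.

$sourceDir = "C:\users\you\documents\somefiles"
$targetDir = "C:\users\you\documents\somefiles\targetDir"

if( !(Test-Path $targetDir) ) {
    New-Item -Path $targetDir -ItemType Directory
}

# -File skips subdirectories, including $targetDir, which lives inside $sourceDir.
# Add-Content appends, so a file with several matching lines keeps all of them
# (New-Item would fail on the second match because the file already exists).
( Get-ChildItem $sourceDir -File | Select-String -Pattern "OB_[0-9]F_AHU*" ) |
    %{ Add-Content -Path (Join-Path $targetDir ($_.Filename + "_out")) -Value $_.Line }

The output of Select-String contains the Filename and the Line where each match was found, which is all you need inside the ForEach block %{} to write the matching line to the corresponding out file with Add-Content.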
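Since every match carries the path of the file it came from, one possible variation is to group the matches by file and write each out file once instead of once per matching line. The following is a minimal sketch, assuming all matches fit in memory; the temp folder and sample files are invented purely so the snippet can run on its own.

```powershell
# Throwaway sample data standing in for the real source directory (assumption).
$sourceDir = Join-Path ([System.IO.Path]::GetTempPath()) ("ahu_group_" + [guid]::NewGuid())
$targetDir = Join-Path $sourceDir 'targetDir'
New-Item -Path $targetDir -ItemType Directory -Force | Out-Null
Set-Content -Path (Join-Path $sourceDir 'a.log') -Value 'OB_1F_AHU01 ok', 'skip me', 'OB_3F_AHU02 ok'
Set-Content -Path (Join-Path $sourceDir 'b.log') -Value 'nothing to see here'

Get-ChildItem -Path $sourceDir -File |
    Select-String -Pattern 'OB_[0-9]F_AHU*' |
    Group-Object -Property Path |              # one group per source file
    ForEach-Object {
        $name = (Split-Path $_.Name -Leaf) + '_out'
        # Write all matching lines of this file in a single call.
        Set-Content -Path (Join-Path $targetDir $name) -Value $_.Group.Line
    }
```

Group-Object has to collect every match before it emits anything, so this trades memory for fewer file writes. Files without any match (b.log here) get no out file at all.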

A small improvement.

Optimizing a PowerShell script to extract records from multiple files
