Improve performance of importing 2 CSV files, comparing, grouping, and outputting the results to a log

Date: 2018-10-26 21:01:06

Tags: performance powershell refactoring powershell-v4.0

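I'm still fairly new to PowerShell overall, so I'd like some suggestions on how to improve the performance of my code. The goal of the script is to import 2 CSV files, compare them, and then output the results to a log file (or two, depending on the selected option).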

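My original script, below, works fine, but it isn't very fast. In my test environment, importing, comparing, grouping, and writing the log for all 440 files listed in the two CSV files takes about 15 seconds. 15 s isn't much, but I will be running this script against more than 600,000 files, so the faster the better.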

#User variables
$logpath = "X:\Documents\Customer Projects"
$passlog = $true

$startt = (Get-Date)

# code to check to see if necessary files exist and/or are locked is here

$csv1 = Import-CSV $generated
$csv2 = Import-CSV $cloud
$comp = Compare-Object -ReferenceObject $csv2 -DifferenceObject $csv1 -Property Name,Size,Hash -PassThru
$group = $comp | Group-Object -Property Name
$failcount = ($group | Measure-Object).count
If ($passlog) {
    $comp = Compare-Object -ReferenceObject $csv2 -DifferenceObject $csv1 -IncludeEqual -Property Name,Size,Hash
    $group = $comp | Group-Object -Property Name
    $count = ($group | Measure-Object).count
}
Else {
    $count = $failcount
}
$curr = 0
Write-Output "($failcount) files failed verification. See logs and below for details:"
Write-Output "`n"
foreach ($file in $group)
{
    $source = $comp | Where-Object {($_.SideIndicator -eq "<=" -and $_.Name -eq $file.name)}
    $dest = $comp | Where-Object {($_.SideIndicator -eq "=>" -and $_.Name -eq $file.name)}
    $pass = $comp | Where-Object {($_.SideIndicator -eq "==" -and $_.Name -eq $file.name)}
    $name = $file.name
    $sourcesize = $source | Select -ExpandProperty Size
    $sourcehash = $source | Select -ExpandProperty Hash
    $destsize = $dest | Select -ExpandProperty Size
    $desthash = $dest | Select -ExpandProperty Hash
    $destpath = ($csv1 | Where-Object {$_.Name -eq $file.name} | Select -ExpandProperty DestPath)
    $countp = ($file | Select -ExpandProperty Count)
    $curr += 1
    $logger = $true
    if ($countp -ge 2 -and ($source) -and ($dest)) {
        Write-Host "($curr of $count) Hash mis-match            - $($file.name)"
        $message = "Hash mis-match"
    }
    if ($countp -eq 1 -and ($pass)) {
        Write-Host "($curr of $count) Verification passed!      - $($file.name)"
        $message = "Verification passed"
        $sourcesize = ($pass | Select -ExpandProperty Size)
        $sourcehash = ($pass | Select -ExpandProperty Hash)
        $destsize = $sourcesize
        $desthash = $sourcehash
        $logger = $false
    }
    if ($countp -eq 1 -and ($source)) {
        Write-Host "($curr of $count) Missing from destination  - $($file.name)"
        $message = "File missing from destination"
    }
    if ($countp -eq 1 -and ($dest)) {
        Write-Host "($curr of $count) Missing from source       - $($file.name)"
        $message = "File missing from source"
    }
    If ($logger) {
        $whichlog = $failed   
    }
    Else {
        $whichlog = $passed
    }
    "" | Select @{Name="Status";Expression={$message}},@{N="FileName";E={$name}},@{N="SourceSize";E={$sourcesize}},@{N="SourceHash";E={$sourcehash}},@{N="DestSize";E={$destsize}},@{N="DestinationHash";E={$desthash}},@{N="DestinationPath";E={$destpath}} | Export-Csv -Path $whichlog -Append -NoTypeInformation
}
$endt = (Get-Date)
$runt = ($endt - $startt)
$exectime = "$($runt.hours)h:$($runt.minutes)m:$($runt.seconds)s"
Write-Output "`n"
Write-Output "Completed in $exectime! Press any key to close...";
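
A rough estimate of why this loop scales badly (a count under stated assumptions, not a measurement): for each of the n file groups, the loop body runs three Where-Object filters over $comp and one over $csv1. With -IncludeEqual, $comp holds at least one row per file name, so each filter touches roughly n rows:

```latex
\begin{aligned}
\text{row comparisons per file} &\approx 3n + n = 4n \\
\text{total row comparisons} &\approx n \cdot 4n = 4n^2 \\
n = 440: &\quad 4 \cdot 440^2 = 774{,}400 \\
n = 600{,}000: &\quad 4 \cdot (6 \times 10^5)^2 = 1.44 \times 10^{12}
\end{aligned}
```

Going from 440 to 600,000 files multiplies this filtering work by about (600000/440)^2, roughly 1.86 × 10^6, so trimming constant costs (such as writing the CSV once) is unlikely to be enough on its own; the per-file scans need to become constant-time lookups.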

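I've read about using hashtables instead of appending to files or arrays to improve performance. I haven't figured out how to use hashtables properly yet, but refactoring the end of the script to write the CSV all at once, instead of appending on every pass through the loop, did improve performance. The changes below cut the run time to 10 seconds, which is a decent improvement, but I feel 10 seconds is still too long.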
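
A minimal sketch of the hashtable idea applied to one of the lookups in the loop, the DestPath lookup on $csv1: build a Name to DestPath index once, then read from it inside the loop. The sample rows stand in for Import-Csv $generated, $destPathByName is a hypothetical variable name, and the sketch assumes Name is unique within the CSV (a later row with the same Name would overwrite an earlier one).

```powershell
# Sample rows standing in for Import-Csv $generated (assumption for illustration).
$csv1 = @(
    [pscustomobject]@{ Name = 'a.txt'; Size = '10'; Hash = 'AAA'; DestPath = 'D:\out\a.txt' }
    [pscustomobject]@{ Name = 'b.txt'; Size = '20'; Hash = 'BBB'; DestPath = 'D:\out\b.txt' }
)

# Build the index once: Name -> DestPath (hypothetical variable name).
$destPathByName = @{}
foreach ($row in $csv1) {
    $destPathByName[$row.Name] = $row.DestPath
}

# Inside the per-file loop, this single lookup replaces
#   $csv1 | Where-Object {$_.Name -eq $file.name} | Select -ExpandProperty DestPath
$destpath = $destPathByName['b.txt']
$destpath   # D:\out\b.txt
```

Keys in a PowerShell @{} hashtable are case-insensitive by default, like the -eq comparison in the original filter, and a missing key simply returns $null.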

# same code before this point
Write-Host "Starting verification process. This may take some time..."
$array = foreach ($file in $group)
{
    $source = $comp | Where-Object {($_.SideIndicator -eq "<=" -and $_.Name -eq $file.name)}
    $dest = $comp | Where-Object {($_.SideIndicator -eq "=>" -and $_.Name -eq $file.name)}
    $pass = $comp | Where-Object {($_.SideIndicator -eq "==" -and $_.Name -eq $file.name)}
    $name = $file.name
    $sourcesize = $source.size
    $sourcehash = $source.hash
    $destsize = $dest.size
    $desthash = $dest.hash
    $destpath = ($csv1 | Where-Object {$_.Name -eq $file.name} | Select -ExpandProperty DestPath)
    $countp = ($file | Select -ExpandProperty Count)
    $curr += 1
    $logger = $true
    if ($countp -eq 1 -and ($pass)) {
        Write-Host "($curr of $count) Verification passed!      - $($file.name)"
        $message = "Verification passed"
        $sourcesize = ($pass.size)
        $sourcehash = ($pass.hash)
        $destsize = $sourcesize
        $desthash = $sourcehash
    }
    if ($countp -ge 2 -and ($source) -and ($dest)) {
        #Write-Host "($curr of $count) Hash mis-match            - $($file.name)"
        $message = "Hash mis-match"
    }
    if ($countp -eq 1 -and ($source)) {
        #Write-Host "($curr of $count) Missing from destination  - $($file.name)"
        $message = "File missing from destination"
    }
    if ($countp -eq 1 -and ($dest)) {
        #Write-Host "($curr of $count) Missing from source       - $($file.name)"
        $message = "File missing from source"
    }
    $file | Select @{N="Status";E={$message}},@{N="FileName";E={$name}},@{N="SourceSize";E={$sourcesize}},@{N="SourceHash";E={$sourcehash}},@{N="DestinationSize";E={$destsize}},@{N="DestinationHash";E={$desthash}},@{N="DestinationPath";E={$destpath}}
}
$array | Where-Object {$_.Status -ne "Verification passed"} | Export-Csv -Path $failed -NoTypeInformation
If ($passlog) {
    $array | Where-Object {$_.Status -eq "Verification passed"} | Export-Csv -Path $passed -NoTypeInformation
}
# same code after this point
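
One way to remove the remaining per-file scans, sketched with the question's own column names and status messages: Group-Object already stores each name's rows in the Group property of every group it returns, so the loop can read those one or two rows directly instead of filtering all of $comp three times, and DestPath can come from a hashtable built once. The in-memory sample data replaces the two Import-Csv calls, $destPathByName is a hypothetical name, and progress output is omitted, as in the revised loop.

```powershell
# Sample rows standing in for Import-Csv $generated (destination) and Import-Csv $cloud (source).
$csv1 = @(
    [pscustomobject]@{ Name = 'same.txt';  Size = '1'; Hash = 'H1'; DestPath = 'D:\out\same.txt' }
    [pscustomobject]@{ Name = 'diff.txt';  Size = '2'; Hash = 'H2'; DestPath = 'D:\out\diff.txt' }
    [pscustomobject]@{ Name = 'extra.txt'; Size = '3'; Hash = 'H3'; DestPath = 'D:\out\extra.txt' }
)
$csv2 = @(
    [pscustomobject]@{ Name = 'same.txt'; Size = '1'; Hash = 'H1' }
    [pscustomobject]@{ Name = 'diff.txt'; Size = '2'; Hash = 'XX' }
    [pscustomobject]@{ Name = 'gone.txt'; Size = '4'; Hash = 'H4' }
)

# Same comparison and grouping as the question's -IncludeEqual branch.
$comp  = Compare-Object -ReferenceObject $csv2 -DifferenceObject $csv1 -IncludeEqual -Property Name,Size,Hash
$group = $comp | Group-Object -Property Name

# Hypothetical index: one pass over $csv1 instead of a Where-Object scan per file.
$destPathByName = @{}
foreach ($row in $csv1) { $destPathByName[$row.Name] = $row.DestPath }

$array = foreach ($file in $group) {
    $source = $null; $dest = $null; $pass = $null
    # $file.Group holds only this name's rows (one or two), so this inner loop is cheap.
    foreach ($row in $file.Group) {
        switch ($row.SideIndicator) {
            '<=' { $source = $row }
            '=>' { $dest   = $row }
            '==' { $pass   = $row }
        }
    }
    if ($pass)                  { $message = 'Verification passed'; $source = $pass; $dest = $pass }
    elseif ($source -and $dest) { $message = 'Hash mis-match' }
    elseif ($source)            { $message = 'File missing from destination' }
    else                        { $message = 'File missing from source' }

    # Build the output row directly instead of piping through Select with calculated properties.
    [pscustomobject]@{
        Status          = $message
        FileName        = $file.Name
        SourceSize      = $source.Size
        SourceHash      = $source.Hash
        DestinationSize = $dest.Size
        DestinationHash = $dest.Hash
        DestinationPath = $destPathByName[$file.Name]
    }
}
$array | Format-Table -AutoSize
```

The result can then be filtered and exported once, exactly as the question's second version does with $array.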

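I believe the logical next step is to somehow gather all the needed information into one place, so that the ForEach loop can quickly look up the data it needs instead of sorting, filtering, or doing other unnecessary work. I did try this, and then ran a second loop to pick up the fields and generate the status messages, but that approach performed worse than my original one (probably because of the first ForEach loop). Any suggestions?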
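
A sketch of the "gather everything in one place" idea that skips Compare-Object and Group-Object entirely (a different technique from the question's, named plainly here): index the source rows by Name in a hashtable, walk the destination rows once, and whatever is left in the index afterwards is missing from the destination. Compare-FileLists is a hypothetical function name, the sample rows stand in for Import-Csv $cloud and Import-Csv $generated, and Name is assumed unique within each CSV. The -eq comparisons are case-insensitive, like Compare-Object's default.

```powershell
# Hypothetical helper: single-pass comparison using a hashtable index.
function Compare-FileLists {
    param($Source, $Destination)

    # Name -> source row; an entry is removed once its destination row is seen.
    $pending = @{}
    foreach ($row in $Source) { $pending[$row.Name] = $row }

    foreach ($d in $Destination) {
        $s = $pending[$d.Name]
        if ($null -eq $s) {
            $status = 'File missing from source'
        }
        else {
            $pending.Remove($d.Name)
            if ($s.Size -eq $d.Size -and $s.Hash -eq $d.Hash) { $status = 'Verification passed' }
            else { $status = 'Hash mis-match' }
        }
        [pscustomobject]@{
            Status          = $status
            FileName        = $d.Name
            SourceSize      = $s.Size
            SourceHash      = $s.Hash
            DestinationSize = $d.Size
            DestinationHash = $d.Hash
            DestinationPath = $d.DestPath
        }
    }

    # Anything still pending never appeared on the destination side.
    foreach ($s in $pending.Values) {
        [pscustomobject]@{
            Status          = 'File missing from destination'
            FileName        = $s.Name
            SourceSize      = $s.Size
            SourceHash      = $s.Hash
            DestinationSize = $null
            DestinationHash = $null
            DestinationPath = $null
        }
    }
}

# Sample rows (assumptions) standing in for Import-Csv $cloud and Import-Csv $generated.
$src = @(
    [pscustomobject]@{ Name = 'same.txt'; Size = '1'; Hash = 'H1' }
    [pscustomobject]@{ Name = 'diff.txt'; Size = '2'; Hash = 'XX' }
    [pscustomobject]@{ Name = 'gone.txt'; Size = '4'; Hash = 'H4' }
)
$dst = @(
    [pscustomobject]@{ Name = 'same.txt';  Size = '1'; Hash = 'H1'; DestPath = 'D:\out\same.txt' }
    [pscustomobject]@{ Name = 'diff.txt';  Size = '2'; Hash = 'H2'; DestPath = 'D:\out\diff.txt' }
    [pscustomobject]@{ Name = 'extra.txt'; Size = '3'; Hash = 'H3'; DestPath = 'D:\out\extra.txt' }
)
$results = @(Compare-FileLists -Source $src -Destination $dst)
$results | Format-Table -AutoSize
```

The rows can then be split and written once, for example `$results | Where-Object Status -ne 'Verification passed' | Export-Csv -Path $failed -NoTypeInformation`. Each input row is touched a constant number of times, so the work grows roughly linearly with the file count instead of quadratically.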