I am currently developing a PowerShell script that will be used in TeamCity as part of a build step. The script must check the Unicorn-serialized Sitecore item files for duplicate item IDs and fail the build when any are found.
I am completely new to PowerShell scripting, but so far I have something that does what I expect:
Write-Host "Start checking for Unicorn serialization errors."
$files = get-childitem "%system.teamcity.build.workingDir%\Sitecore\serialization" -recurse -include *.item | where {! $_.PSIsContainer} | % { $_.FullName }
$arrayOfItemIds = @()
$NrOfFiles = $files.Length
[bool] $FoundDuplicates = 0
Write-Host "There are $NrOfFiles Unicorn item files to check."
foreach ($file in $files)
{
$thirdLineOfFile = (Get-Content $file)[2 .. 2]
if ($arrayOfItemIds -contains $thirdLineOfFile)
{
$FoundDuplicates = 1
$itemId = $thirdLineOfFile.Split(":")[1].Trim()
Write-Host "Duplicate item ID found!"
Write-Host "Item file path: $file"
Write-Host "Detected duplicate ID: $itemId"
Write-Host "-------------"
Write-Host ""
}
else
{
$arrayOfItemIds += $thirdLineOfFile
}
}
if ($foundDuplicates)
{
"##teamcity[buildStatus status='FAILURE' text='One or more duplicate ID|'s were detected in Sitecore serialised items. Check the build log to see which files and ID|'s are involved.']"
exit 1
}
Write-Host "End script checking for Unicorn serialization errors."
The problem is: it is slow! The folder this script has to check currently contains more than 14,000 .item files, and that number will most likely only keep growing. I know that opening and reading this many files is an expensive operation, but I didn't expect it to take roughly half an hour to complete. That is far too long, because it means every (snapshot) build would take half an hour longer, which is unacceptable. I was hoping the script would finish in a few minutes at most.
I can't believe there isn't a faster way to do this... so any help with this is greatly appreciated!
Solution
I have to say that all 3 answers I received so far have helped me. I first started using the .NET framework classes directly, and then used a dictionary to solve the growing-array problem. My own script took about 30 minutes to run; using the .NET framework classes brought that down to 2 minutes, and after applying the dictionary solution it takes only 6 or 7 seconds! The final script I'm using:
Write-Host "Start checking for Unicorn serialization errors."
[String[]] $allFilePaths = [System.IO.Directory]::GetFiles("%system.teamcity.build.workingDir%\Sitecore\serialization", "*.item", "AllDirectories")
$IdsProcessed = New-Object 'system.collections.generic.dictionary[string,string]'
[bool] $FoundDuplicates = 0
$NrOfFiles = $allFilePaths.Length
Write-Host "There are $NrOfFiles Unicorn item files to check."
Write-Host ""
foreach ($filePath in $allFilePaths)
{
[System.IO.StreamReader] $sr = [System.IO.File]::OpenText($filePath)
$unused1 = $sr.ReadLine() #read the first unused line
$unused2 = $sr.ReadLine() #read the second unused line
[string]$thirdLineOfFile = $sr.ReadLine()
$sr.Close()
if ($IdsProcessed.ContainsKey($thirdLineOfFile))
{
$FoundDuplicates = 1
$itemId = $thirdLineOfFile.Split(":")[1].Trim()
$otherFileWithSameId = $IdsProcessed[$thirdLineOfFile]
Write-Host "---------------"
Write-Host "Duplicate item ID found!"
Write-Host "Detected duplicate ID: $itemId"
Write-Host "Item file path 1: $filePath"
Write-Host "Item file path 2: $otherFileWithSameId"
Write-Host "---------------"
Write-Host ""
}
else
{
$IdsProcessed.Add($thirdLineOfFile, $filePath)
}
}
if ($foundDuplicates)
{
"##teamcity[buildStatus status='FAILURE' text='One or more duplicate ID|'s were detected in Sitecore serialised items. Check the build log to see which files and ID|'s are involved.']"
exit 1
}
Write-Host "End script checking for Unicorn serialization errors. No duplicate ID's were found."
Thanks a lot, everyone!
Answer 0 (score: 5)
Try replacing Get-Content with [System.IO.File]::ReadLines. If that is still too slow, consider using System.IO.StreamReader - that will have you writing more code, but it allows you to read only the first 3 lines.
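A minimal sketch of the lazy-read idea (the sample file path and contents are illustrative): [System.IO.File]::ReadLines returns a lazy enumerator, and Select-Object -First stops the pipeline, so only the first three lines are ever read from disk.

```powershell
# Sketch: create a small sample file, then read only its third line lazily.
$filePath = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.item'
"line1", "line2", "id: 11111111-2222-3333-4444-555555555555" | Set-Content $filePath

# ReadLines enumerates lazily; Select-Object -First 1 stops after the third line.
$thirdLine = [System.IO.File]::ReadLines($filePath) | Select-Object -Skip 2 -First 1
Write-Host $thirdLine
```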
Answer 1 (score: 4)
When you use high-level commands like Get-ChildItem and Get-Content, it isn't obvious what PowerShell is doing under the hood. So I would be more explicit about it and use the .NET framework classes directly.
To get the paths of the files in a folder, use:
[String[]] $files = [System.IO.Directory]::GetFiles($folderPath, "*.yourext")
Then, instead of using Get-Content, open each file and read the first three lines. Like this:
[System.IO.StreamReader] $sr = [System.IO.File]::OpenText($path)
[String]$line = $sr.ReadLine()
while ($line -ne $null)
{
# do your thing, break when you know enough
# ...
[String]$line = $sr.ReadLine()
}
$sr.Close()
I may have made a mistake or two; I was too lazy to get up and test this on a PC.
You might also want to consider redesigning your build system to use fewer files. 14,000 files and growing seems unnecessary. If you can consolidate some of that data into fewer files, it may help performance as well.
To check for duplicate GUIDs, use a Dictionary<Guid, String>, with the string holding the file name. Then you can report where the duplicate is if any are found.
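A hedged sketch of that bookkeeping (the GUID string and file path below are illustrative placeholders, not values from the real build):

```powershell
# Illustrative sketch: remember which file first used each GUID.
$seen = New-Object 'System.Collections.Generic.Dictionary[guid,string]'

$itemIdString = '11111111-2222-3333-4444-555555555555'  # assumed: parsed from a file's third line
$filePath = 'C:\items\example.item'                     # assumed: the file currently being checked

$id = [guid]$itemIdString
if ($seen.ContainsKey($id)) {
    # Because the dictionary stores the first file path, both locations can be reported.
    Write-Host "Duplicate $id in $filePath (first seen in $($seen[$id]))"
} else {
    $seen.Add($id, $filePath)
}
```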
Answer 2 (score: 1)
I think your problem is probably caused by your array, and may not be a file-reading problem at all.
Arrays in PowerShell are fixed-size, so every time you add an item to one, PowerShell creates a new array and copies all of the items over.
Most of the time your array will not contain the value you are looking for, so -contains has to compare $thirdLineOfFile against every item in an ever-growing array.
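The cost of growing an array with += can be seen directly (the element count is arbitrary; exact timings will vary by machine):

```powershell
# Demo: += copies the whole array on every add (O(n^2) total work),
# while a generic List grows in amortized O(1) per add.
$slow = Measure-Command {
    $a = @()
    foreach ($i in 1..5000) { $a += $i }
}
$fast = Measure-Command {
    $list = New-Object 'System.Collections.Generic.List[int]'
    foreach ($i in 1..5000) { $list.Add($i) }
}
Write-Host ("Array +=: {0} ms, List.Add: {1} ms" -f [int]$slow.TotalMilliseconds, [int]$fast.TotalMilliseconds)
```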
I have been using .NET dictionaries to solve this problem (or ArrayLists when I'm not doing many lookups). MSDN Dictionary Reference
Note: PowerShell provides a cmdlet called Measure-Command. You can use it to determine which part of your script is actually running slowly. I would time the file reads and, separately, the time spent growing the array and looking up values. Depending on the size of your files, you may actually have an I/O performance problem as well.
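For instance, the per-file read can be timed in isolation like this (the sample file is created on the spot; paths and contents are illustrative):

```powershell
# Example: measure just the file-reading step to see where the time goes.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'measure-sample.item'
"line1", "line2", "id: 11111111-2222-3333-4444-555555555555" | Set-Content $file

$readTime = Measure-Command {
    # -TotalCount 3 reads only the beginning of the file.
    $thirdLine = (Get-Content -Path $file -TotalCount 3)[2]
}
Write-Host ("Reading the file took {0:N2} ms" -f $readTime.TotalMilliseconds)
```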
Here is your code adapted to use a .NET dictionary. I renamed your variable, since it is no longer an array.
Write-Host "Start checking for Unicorn serialization errors."
$files = get-childitem "%system.teamcity.build.workingDir%\Sitecore\serialization" -recurse -include *.item | where {! $_.PSIsContainer} | % { $_.FullName }
#$arrayOfItemIds = @()
$IdsProcessed = New-Object 'system.collections.generic.dictionary[string,string]' # A .Net Dictionary will be faster for inserts and lookups.
$NrOfFiles = $files.Length
[bool] $FoundDuplicates = 0
Write-Host "There are $NrOfFiles Unicorn item files to check."
foreach ($file in $files)
{
$thirdLineOfFile = (Get-Content -path $file -TotalCount 3)[2] # TotalCount param will let us pull in just the beginning of the file.
#if ($arrayOfItemIds -contains $thirdLineOfFile)
if($IdsProcessed.ContainsKey($thirdLineOfFile))
{
$FoundDuplicates = 1
$itemId = $thirdLineOfFile.Split(":")[1].Trim()
Write-Host "Duplicate item ID found!"
Write-Host "Item file path: $file"
Write-Host "Detected duplicate ID: $itemId"
Write-Host "-------------"
Write-Host ""
}
else
{
#$arrayOfItemIds += $thirdLineOfFile
$IdsProcessed.Add($thirdLineOfFile,$null)
}
}
if ($foundDuplicates)
{
"##teamcity[buildStatus status='FAILURE' text='One or more duplicate ID|'s were detected in Sitecore serialised items. Check the build log to see which files and ID|'s are involved.']"
exit 1
}
Write-Host "End script checking for Unicorn serialization errors."