My array-based "Split-ArrayInChunks" function took a very long time to process 190,000+ records. My initial version was based on this code (see "Split up an array into chunks and start a job on each one"):
$computers = gc c:\somedir\complist.txt
$n = 6
$complists = @{}
$count = 0
$computers |% {$complists[$count % $n] += @($_);$count++}
0..($n-1) |% {
    start-job -scriptblock {gwmi win32_operatingsystem -computername $args} -argumentlist $complists[$_]
}
I then found the article "Performance: The += Operator (and When to Avoid It)", whose author essentially recommends using "System.Collections.Generic.List" or "System.Collections.ArrayList" instead of arrays. So I came up with this implementation:
function Split-ArrayInChunks_UsingGenericList($inArray, $numberOfChunks) {
    $list = New-Object System.Collections.Generic.List[System.Collections.Generic.List[PSCustomObject]]
    $count = 0
    # populate with empty lists
    0..($numberOfChunks-1) | % {
        $list.Add((New-Object System.Collections.Generic.List[PSCustomObject]))
    }
    # create packages
    $inArray | % {
        $list[$count % $numberOfChunks].Add($_);
        $count++
    }
    return $list.ToArray()
}
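To make the point of that article concrete, here is a minimal sketch (the sample data and variable names are hypothetical, not taken from my script). It assumes the usual behavior of PowerShell arrays: they have a fixed size, so `+=` builds a new array and copies the old elements on every iteration, while a generic list appends in place.

```powershell
# Hypothetical sample data; the size is chosen only for illustration.
$items = 1..20000

# Array with +=: each iteration allocates a new array and copies the old one.
$array = @()
foreach ($item in $items) {
    $array += $item
}

# Generic list: Add appends in place and returns nothing.
$list = New-Object 'System.Collections.Generic.List[int]'
foreach ($item in $items) {
    $list.Add($item)
}

# Both collections end up with the same elements in the same order.
$array.Count
$list.Count
```

The two loops produce the same contents; only the cost of growing the collection differs.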
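I also tried "System.Collections.ArrayList", but that function returns a flat array. Inside the function, $arryList is a nested list, but once outside the function I end up with a flat array (192169 items instead of 10 chunks).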
function Split-ArrayInChunks_UsingArrayList($inArray, $numberOfChunks) {
    $arryList = New-Object System.Collections.ArrayList
    $count = 0
    # populate
    0..($numberOfChunks-1) | % {
        $arryList.Add((New-Object System.Collections.ArrayList))
    }
    $inArray | % {
        $arryList[$count % $numberOfChunks].Add($_);
        $count++
    }
    Write-Host 'Number of arryList:'$arryList.Count
    Write-Host 'Number of items in first arryList:' $arryList[0].Count
    return $arryList
}
To illustrate the "flattening" problem, the following code...
Write-Host '-------------------------------'
$packages1 = Split-ArrayInChunks_UsingGenericList $data.CrmRecords 10
Write-Host 'Number of packages1:'$packages1.Count
Write-Host 'Number of items in first package1:' $packages1[0].Count
Write-Host '-------------------------------'
$packages2 = Split-ArrayInChunks_UsingArrayList $data.CrmRecords 10
Write-Host 'Number of packages2:'$packages2.Count
Write-Host 'Number of items in first package2:' $packages2[0].Count
...produces this output:
-------------------------------
Number of packages1: 10
Number of items in first package1: 19215
-------------------------------
Number of arryList: 10
Number of items in first arryList: 19215
Number of packages2: 192169
Number of items in first package2: 1
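So I have two questions:
Update 2016-02-04: I updated my code based on the feedback (using [void] to keep the return values from polluting the output) and it now works. The only remaining problem is that when I pipe to Format-Table, my version (Split-ArrayInChunks_UsingArrayList) is again printed as a flat list: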
function Split-ArrayInChunks_UsingArrayList($inArray, $numberOfChunks) {
    $arryList = New-Object System.Collections.ArrayList
    $count = 0
    # populate
    0..($numberOfChunks-1) | % {
        [void]$arryList.Add((New-Object System.Collections.ArrayList))
    }
    $inArray | % {
        [void]$arryList[$count % $numberOfChunks].Add($_);
        $count++
    }
    return $arryList
}
function Split-ArrayInChunks_CommunityVersion($inArray, $numberOfChunks) {
    $Lists = @{}
    $count = 0
    # populate
    0..($numberOfChunks-1) | % {
        $Lists[$_] = New-Object System.Collections.ArrayList
    }
    $inArray | % {
        [void]$Lists[$count % $numberOfChunks].Add($_);
        $count++
    }
    return $Lists
}
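The [void] casts matter because ArrayList.Add returns the index of the new element, and in a PowerShell function every value that is not captured or discarded becomes part of the function's output. A minimal sketch of the difference (the function names Get-LeakyList and Get-CleanList are hypothetical, made up for this illustration):

```powershell
function Get-LeakyList {
    $list = New-Object System.Collections.ArrayList
    $list.Add('a')      # Add returns the new index (0), which becomes output
    $list.Add('b')      # returns 1, which also becomes output
    return $list        # the list is enumerated, so 'a' and 'b' follow
}

function Get-CleanList {
    $list = New-Object System.Collections.ArrayList
    [void]$list.Add('a')   # discard the returned index
    [void]$list.Add('b')
    return $list
}

$leaky = @(Get-LeakyList)   # indexes first, then the list items
$clean = @(Get-CleanList)   # only the list items
$leaky.Count
$clean.Count
```

This would explain why the first ArrayList version returned far more items than chunks: every Add call contributed an integer to the result before the lists themselves were emitted.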
When I execute this code...
Write-Host 'CommunityVersion'
Write-Host '-------------------------------'
Split-ArrayInChunks_CommunityVersion $list 6 | Format-Table -AutoSize
Write-Host 'ArrayInChunks_UsingArrayList'
Write-Host '-------------------------------'
Split-ArrayInChunks_UsingArrayList $list 6 | Format-Table -AutoSize
...this is the output in the console:
CommunityVersion
-------------------------------
Name Value
---- -----
5 {denn, getan, verhaftet}
4 {haben, Böses, Morgens, war}
3 {verleumdet, etwas, eines, es}
2 {Josef K., er, er, er}
1 {musste, dass, wurde, sagte}
0 {Jemand, ohne, hätte, »Wie ein Hund!«}
ArrayInChunks_UsingArrayList
-------------------------------
Jemand
ohne
hätte
»Wie ein Hund!«
musste
dass
wurde
sagte
Josef K.
er
er
er
verleumdet
etwas
eines
es
haben
Böses
Morgens
war
denn
getan
verhaftet
I don't understand why "ArrayInChunks_UsingArrayList" is printed as a flat list; it is a nested array, just like "ArrayInChunks_CommunityVersion".
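One likely explanation, offered as an assumption rather than something confirmed in this thread: `return` enumerates the outer ArrayList, so Format-Table receives the inner lists one at a time, and the formatter unrolls each of them again for display. A hashtable is not enumerated by the pipeline, so it arrives as a single object and is shown as Name/Value pairs. The sketch below (Get-Chunks is a hypothetical name) checks the nesting through Count instead of through the formatter, and uses the unary comma to hand the list back without unrolling it:

```powershell
function Get-Chunks {
    $outer = New-Object System.Collections.ArrayList
    [void]$outer.Add((New-Object System.Collections.ArrayList))
    [void]$outer.Add((New-Object System.Collections.ArrayList))
    [void]$outer[0].Add('Jemand'); [void]$outer[0].Add('ohne')
    [void]$outer[1].Add('musste'); [void]$outer[1].Add('dass')
    # The unary comma wraps the list in a one-element array, so the pipeline
    # unrolls only the wrapper and the caller receives the ArrayList itself.
    return ,$outer
}

$chunks = Get-Chunks
$chunks.Count        # number of chunks
$chunks[0].Count     # number of items in the first chunk

# Print one line per chunk instead of relying on Format-Table.
$lines = $chunks | ForEach-Object { $_ -join ', ' }
$lines
```

Even without the comma, assigning the function result to a variable keeps the chunks as separate lists; under this explanation only the display is flat, not the data.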
Answer 0 (score: 3)
OK, this is how I would do it:
function Split-ArrayInChunks_UsingArrayList($inArray, $numberOfChunks) {
    $Lists = @{}
    $count = 0
    # populate
    0..($numberOfChunks-1) | % {
        $Lists[$_] = New-Object System.Collections.ArrayList
    }
    $inArray | % {
        [void]$Lists[$count % $numberOfChunks].Add($_);
        $count++
    }
    Write-Host 'Number of arryList:'$Lists.Count
    Write-Host 'Number of items in first arryList:' $Lists[0].Count
    return $Lists
}
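Answer 1 (score: 0)
It turns out that using "$inArray | %" is what makes the operation so slow. With a plain foreach loop, creating the chunks takes less than 2 seconds. With the version based on "$inArray | %", it takes 20 seconds: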
function Split-ArrayInChunks_Fast($inArray, $numberOfChunks) {
    $arrayList = New-Object System.Collections.ArrayList
    $count = 0
    # populate
    0..($numberOfChunks-1) | % {
        [void]$arrayList.Add((New-Object System.Collections.ArrayList))
    }
    foreach($elem in $inArray) {
        [void]$arrayList[$count % $numberOfChunks].Add($elem)
        $count++
    }
    return $arrayList.ToArray()
}
function Split-ArrayInChunks_Slow($inArray, $numberOfChunks) {
    $arrayList = New-Object System.Collections.ArrayList
    $count = 0
    # populate
    0..($numberOfChunks-1) | % {
        [void]$arrayList.Add((New-Object System.Collections.ArrayList))
    }
    $inArray | % {
        [void]$arrayList[$count % $numberOfChunks].Add($_);
        $count++
    }
    return $arrayList.ToArray()
}
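The difference can be measured with Measure-Command. The sketch below uses a hypothetical input of 50,000 integers and a trivial loop body rather than the chunking functions, so the absolute numbers will not match the ones quoted above; it only shows how the two loop styles can be timed side by side.

```powershell
# Hypothetical input; the original data set had roughly 190,000 records.
$data = 1..50000

# Pipeline version: ForEach-Object invokes the script block once per element.
$pipelineTime = Measure-Command {
    $sum = 0
    $data | ForEach-Object { $sum += $_ }
}

# Statement version: the loop body runs inline, without per-item invocation.
$foreachTime = Measure-Command {
    $sum = 0
    foreach ($elem in $data) { $sum += $elem }
}

'{0:N0} ms (pipeline) vs {1:N0} ms (foreach)' -f $pipelineTime.TotalMilliseconds, $foreachTime.TotalMilliseconds
```

Timings vary by machine and PowerShell version, so it is worth running the comparison several times before drawing conclusions.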