PowerShell

Date: 2016-02-03 12:40:27

Tags: arrays performance powershell arraylist

My array-based "Split-ArrayInChunks" method takes a very long time to process 190,000+ records. My initial version was based on this code (see Split up an array into chunks and start a job on each one):

$computers = gc c:\somedir\complist.txt
$n = 6
$complists = @{}
$count = 0 
$computers |% {$complists[$count % $n] += @($_);$count++}

0..($n-1) |% {
start-job -scriptblock {gwmi win32_operatingsystem -computername $args} -argumentlist $complists[$_]
}

I found the article Performance: The += Operator (and When to Avoid It), whose author basically recommends using "System.Collections.Generic.List" or "System.Collections.ArrayList" instead of arrays. So I came up with this implementation:

function Split-ArrayInChunks_UsingGenericList($inArray, $numberOfChunks) {

    $list = New-Object System.Collections.Generic.List[System.Collections.Generic.List[PSCustomObject]]
    $count = 0 

    # populate with empty lists
    0..($numberOfChunks-1) | % {
        $list.Add((New-Object System.Collections.Generic.List[PSCustomObject]))
    }

    # create packages
    $inArray | % { 
        $list[$count % $numberOfChunks].Add($_); 
        $count++ 
    }

    return $list.ToArray()
}
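The article's advice can be checked with a small timing sketch. The snippet below is only an illustration on synthetic data (the variable names and the element count are assumptions, not part of the original script): `+=` on an array builds a new array and copies the existing elements on every call, while `List.Add` appends in place.

```powershell
# Hypothetical element count, chosen only to make the difference visible.
$n = 20000

# Array with +=: each iteration allocates a new array and copies the old one.
$tArray = Measure-Command {
    $array = @()
    foreach ($i in 1..$n) { $array += $i }
}

# Generic list: Add appends in place and returns nothing.
$tList = Measure-Command {
    $list = New-Object 'System.Collections.Generic.List[int]'
    foreach ($i in 1..$n) { $list.Add($i) }
}

'array +=  : {0:N0} ms' -f $tArray.TotalMilliseconds
'List.Add  : {0:N0} ms' -f $tList.TotalMilliseconds
```

The absolute numbers depend on the machine and the PowerShell version, so no timings are claimed here.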

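I also tried "System.Collections.ArrayList", but that function returns a flat array. Inside the function, $arryList is a nested array, but once outside the function I have a flat array (192169 items instead of 10 chunks).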

function Split-ArrayInChunks_UsingArrayList($inArray, $numberOfChunks) {

    $arryList = New-Object System.Collections.ArrayList
    $count = 0 

    # populate 
    0..($numberOfChunks-1) | % {
        $arryList.Add((New-Object System.Collections.ArrayList))
    }

    $inArray | % { 
        $arryList[$count % $numberOfChunks].Add($_); 
        $count++ 
    }

    Write-Host 'Number of arryList:'$arryList.Count
    Write-Host 'Number of items in first arryList:' $arryList[0].Count
    return $arryList
}

To illustrate the "flattening" problem, the following code...

Write-Host '-------------------------------'
$packages1 = Split-ArrayInChunks_UsingGenericList $data.CrmRecords 10
Write-Host 'Number of packages1:'$packages1.Count
Write-Host 'Number of items in first package1:' $packages1[0].Count

Write-Host '-------------------------------'
$packages2 = Split-ArrayInChunks_UsingArrayList $data.CrmRecords 10
Write-Host 'Number of packages2:'$packages2.Count
Write-Host 'Number of items in first package2:' $packages2[0].Count

...produces this output:

-------------------------------
Number of packages1: 10
Number of items in first package1: 19215
-------------------------------
Number of arryList: 10
Number of items in first arryList: 19215
Number of packages2: 192169
Number of items in first package2: 1

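So I have two questions:

  1. Are there any options to improve my "Split-ArrayInChunks_UsingArrayList" version (e.g. faster, more readable)?
  2. Why is the return value of "Split-ArrayInChunks_UsingArrayList" a flat array, when inside the function $arryList is a nested array?
  3. Update 2016-02-04: I updated my code based on the feedback (using [void] to prevent polluting the output) and it works now. The only remaining problem is that when I use | Format-Table, my version (Split-ArrayInChunks_UsingArrayList) is printed as a flat list again: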

    function Split-ArrayInChunks_UsingArrayList($inArray, $numberOfChunks) {
        $arryList = New-Object System.Collections.ArrayList
        $count = 0 
    
        # populate 
        0..($numberOfChunks-1) | % {
    
            [void]$arryList.Add((New-Object System.Collections.ArrayList))
        }
    
        $inArray | % { 
            [void]$arryList[$count % $numberOfChunks].Add($_); 
            $count++ 
        }
    
        return $arryList
    }
    
    function Split-ArrayInChunks_CommunityVersion($inArray, $numberOfChunks) {
        $Lists = @{}
        $count = 0 
    
        # populate 
        0..($numberOfChunks-1) | % {
            $Lists[$_] = New-Object System.Collections.ArrayList
        }
    
        $inArray | % { 
            [void]$Lists[$count % $numberOfChunks].Add($_); 
            $count++ 
        }
    
        return $Lists
    }
    

    When I execute this code...

    Write-Host 'CommunityVersion'
    Write-Host '-------------------------------'
    Split-ArrayInChunks_CommunityVersion $list 6 | Format-Table -AutoSize
    
    Write-Host 'ArrayInChunks_UsingArrayList'
    Write-Host '-------------------------------'
    Split-ArrayInChunks_UsingArrayList $list 6 | Format-Table -AutoSize
    

    ...this is the output in the console:

    CommunityVersion
    -------------------------------
    
    Name Value                                 
    ---- -----                                 
    5    {denn, getan, verhaftet}              
    4    {haben, Böses, Morgens, war}          
    3    {verleumdet, etwas, eines, es}        
    2    {Josef K., er, er, er}                
    1    {musste, dass, wurde, sagte}          
    0    {Jemand, ohne, hätte, »Wie ein Hund!«}
    
    
    ArrayInChunks_UsingArrayList
    -------------------------------
    Jemand
    ohne
    hätte
    »Wie ein Hund!«
    musste
    dass
    wurde
    sagte
    Josef K.
    er
    er
    er
    verleumdet
    etwas
    eines
    es
    haben
    Böses
    Morgens
    war
    denn
    getan
    verhaftet
    

    I don't understand why "ArrayInChunks_UsingArrayList" is printed as a flat list when it is a nested array, just like "ArrayInChunks_CommunityVersion".
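A likely explanation for the different Format-Table output, sketched here with hypothetical helper functions rather than the code from the question: PowerShell enumerates a returned array or ArrayList and sends its elements down the pipeline one by one, whereas a hashtable travels through the pipeline as a single object.

```powershell
# Hypothetical helpers; only the pipeline behavior is being illustrated.
function Get-ChunksAsArrayList {
    $outer = New-Object System.Collections.ArrayList
    [void]$outer.Add([System.Collections.ArrayList]@('a', 'b'))
    [void]$outer.Add([System.Collections.ArrayList]@('c', 'd'))
    return $outer   # enumerated: the two inner lists are emitted separately
}

function Get-ChunksAsHashtable {
    $outer = @{}
    $outer[0] = [System.Collections.ArrayList]@('a', 'b')
    $outer[1] = [System.Collections.ArrayList]@('c', 'd')
    return $outer   # not enumerated: the hashtable is emitted as one object
}

# What does the next pipeline stage actually receive?
$fromList = @(Get-ChunksAsArrayList | ForEach-Object { $_.GetType().Name })
$fromHash = @(Get-ChunksAsHashtable | ForEach-Object { $_.GetType().Name })

$fromList   # ArrayList, ArrayList
$fromHash   # Hashtable

# One way to print one line per chunk without a hashtable:
$lines = @(Get-ChunksAsArrayList | ForEach-Object { $_ -join ', ' })
$lines      # 'a, b' and 'c, d'
```

Under this reading, Format-Table gets each inner ArrayList as a separate input and renders its elements, while the hashtable arrives whole and is rendered as Name/Value rows.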

2 answers:

Answer 0: (score: 3)

OK, this is how I would do it:

function Split-ArrayInChunks_UsingArrayList($inArray, $numberOfChunks) {

    $Lists = @{}
    $count = 0 

    # populate 
    0..($numberOfChunks-1) | % {
        $Lists[$_] = New-Object System.Collections.ArrayList
    }

    $inArray | % { 
        [void]$Lists[$count % $numberOfChunks].Add($_); 
        $count++ 
    }

    Write-Host 'Number of arryList:'$Lists.Count
    Write-Host 'Number of items in first arryList:' $Lists[0].Count
    return $Lists
}
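The [void] cast in the function above matters because ArrayList.Add returns the index of the added element, and in PowerShell any value a function does not capture or discard becomes part of its output. A minimal sketch with two hypothetical functions shows the effect:

```powershell
# Hypothetical functions; they differ only in the [void] cast.
function Get-Polluted {
    $list = New-Object System.Collections.ArrayList
    $list.Add('x')          # Add returns index 0, which leaks into the output
    return $list
}

function Get-Clean {
    $list = New-Object System.Collections.ArrayList
    [void]$list.Add('x')    # the returned index is discarded
    return $list
}

$polluted = @(Get-Polluted)
$clean    = @(Get-Clean)

$polluted.Count   # 2: the leaked index 0, then 'x'
$clean.Count      # 1: only 'x'
```

This may also be why the first element of $packages2 in the question reported a Count of 1: it was presumably a leaked integer index rather than a chunk.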

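Answer 1: (score: 0)

It turned out that using "$inArray | %" is what makes the operation so slow. With a plain foreach loop, creating the chunks takes less than 2 seconds. With the version based on "$inArray | %", it takes 20 seconds.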

function Split-ArrayInChunks_Fast($inArray, $numberOfChunks) {

    $arrayList = New-Object System.Collections.ArrayList
    $count = 0 

    # populate 
    0..($numberOfChunks-1) | % {
        [void]$arrayList.Add((New-Object System.Collections.ArrayList))
    }

    foreach($elem in $inArray) {

       [void]$arrayList[$count % $numberOfChunks].Add($elem) 
       $count++ 
    }

    return $arrayList.ToArray()
}

function Split-ArrayInChunks_Slow($inArray, $numberOfChunks) {

    $arrayList = New-Object System.Collections.ArrayList
    $count = 0 

    # populate 
    0..($numberOfChunks-1) | % {
        [void]$arrayList.Add((New-Object System.Collections.ArrayList))
    }

    $inArray | % { 
        [void]$arrayList[$count % $numberOfChunks].Add($_); 
        $count++ 
    }

    return $arrayList.ToArray()
}
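A comparison like the one described in this answer could be reproduced with Measure-Command. The harness below is a sketch on synthetic data: the function names, the element count, and the chunk count are assumptions, and the two functions are condensed restatements of the fast and slow variants.

```powershell
function Split-WithForeach($inArray, $numberOfChunks) {
    $chunks = New-Object System.Collections.ArrayList
    for ($i = 0; $i -lt $numberOfChunks; $i++) {
        [void]$chunks.Add((New-Object System.Collections.ArrayList))
    }
    $count = 0
    foreach ($elem in $inArray) {          # foreach statement, no pipeline
        [void]$chunks[$count % $numberOfChunks].Add($elem)
        $count++
    }
    return $chunks.ToArray()
}

function Split-WithPipeline($inArray, $numberOfChunks) {
    $chunks = New-Object System.Collections.ArrayList
    for ($i = 0; $i -lt $numberOfChunks; $i++) {
        [void]$chunks.Add((New-Object System.Collections.ArrayList))
    }
    $count = 0
    $inArray | ForEach-Object {            # one script block call per element
        [void]$chunks[$count % $numberOfChunks].Add($_)
        $count++
    }
    return $chunks.ToArray()
}

# Synthetic input; the size is an assumption, not the original data set.
$data = 1..100000

$tForeach  = Measure-Command { Split-WithForeach  $data 10 }
$tPipeline = Measure-Command { Split-WithPipeline $data 10 }

'foreach  : {0:N0} ms' -f $tForeach.TotalMilliseconds
'pipeline : {0:N0} ms' -f $tPipeline.TotalMilliseconds
```

Both variants should produce identical chunks; only the elapsed time is expected to differ, and the ratio will vary with the machine and PowerShell version.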