PowerShell: transposing and manipulating CSV rows and columns

Asked: 2015-11-25 00:31:21

Tags: powershell transpose

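I'm new to PowerShell. I'm trying to process/transpose the rows and columns of a medium-sized set of CSV records (about 10000 rows). The original CSV consists of about 10000 rows and 3 columns ("Time", "Id", "IOT"), as shown below: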

"Time","Id","IOT" 
"00:03:56","23","26" 
"00:03:56","24","0" 
"00:03:56","25","0" 
"00:03:56","26","1" 
"00:03:56","27","0" 
"00:03:56","28","0" 
"00:03:56","29","0" 
"00:03:56","30","1953" 
"00:03:56","31","22" 
"00:03:56","32","39" 
"00:03:56","33","8" 
"00:03:56","34","5" 
"00:03:56","35","269" 
"00:03:56","36","5" 
"00:03:56","37","0" 
"00:03:56","38","0" 
"00:03:56","39","0" 
"00:03:56","40","1251" 
"00:03:56","41","103" 
"00:03:56","42","0" 
"00:03:56","43","0" 
"00:03:56","44","0" 
"00:03:56","45","0" 
"00:03:56","46","38" 
"00:03:56","47","14" 
"00:03:56","48","0" 
"00:03:56","49","0" 
"00:03:56","2013","0" 
"00:03:56","2378","0" 
"00:03:56","2380","32" 
"00:03:56","2758","0" 
"00:03:56","3127","0" 
"00:03:56","3128","0" 
"00:09:16","23","22" 
"00:09:16","24","0" 
"00:09:16","25","0" 
"00:09:16","26","2" 
"00:09:16","27","0" 
"00:09:16","28","0" 
"00:09:16","29","21" 
"00:09:16","30","48" 
"00:09:16","31","0" 
"00:09:16","32","4" 
"00:09:16","33","4" 
"00:09:16","34","7" 
"00:09:16","35","382" 
"00:09:16","36","12" 
"00:09:16","37","0" 
"00:09:16","38","0" 
"00:09:16","39","0" 
"00:09:16","40","1882" 
"00:09:16","41","42" 
"00:09:16","42","0" 
"00:09:16","43","3" 
"00:09:16","44","0" 
"00:09:16","45","0" 
"00:09:16","46","24" 
"00:09:16","47","22" 
"00:09:16","48","0" 
"00:09:16","49","0" 
"00:09:16","2013","0" 
"00:09:16","2378","0" 
"00:09:16","2380","19" 
"00:09:16","2758","0" 
"00:09:16","3127","0" 
"00:09:16","3128","0" 
... 
... 
... 

I tried to transpose it using code based on a PowerShell script downloaded from https://gallery.technet.microsoft.com/scriptcenter/Powershell-Script-to-7c8368be. Basically, my PowerShell code is as follows:

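# $a is assumed to already hold the imported CSV rows (e.g. from Import-Csv).
$b = @()
foreach ($Time in $a.Time | Select -Unique) {
    # One output row per unique Time value
    $Props = [ordered]@{ Time = $time }
    foreach ($Id in $a.Id | Select -Unique) {
        # Scans every input row for each (Time, Id) pair
        $IOT = ($a.where({ $_.Id -eq $Id -and $_.time -eq $time })).IOT
        $Props += @{ $Id = $IOT }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Out-GridView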

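The code above gives the result I expect: every "Id" value becomes a column header, every "Time" value becomes a unique row, and each "IOT" value sits at the intersection of "Time" x "Id".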

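With only a few hundred rows the result appears quickly, but when I process the whole CSV file of 10000 rows, the script above just keeps executing, does not seem to finish even after a very long time (hours), and produces no output. Could some PowerShell experts on stackoverflow help assess the code above and perhaps modify it to speed things up?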

Thank you very much for your suggestions.

1 answer:

Answer 0: (score: 0)

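10000 records is a lot, but I don't think it is enough to warrant recommending a StreamReader* and parsing the CSV manually. The biggest thing working against you is the following line: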

$b += New-Object -TypeName PSObject -Property $Props 

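What PowerShell does here is create a new array, copy the existing elements into it, and append the new element. That is a very memory-intensive operation, and you repeat it once for every unique Time value. The best thing to do in this case is to use the pipeline to your advantage.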
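To illustrate the point about `+=`, here is a minimal sketch that builds the same collection both ways. The names `$slow`, `$fast` and `$count`, and the item count itself, are assumptions made up for this example and do not come from the question.

```powershell
# Hypothetical item count, chosen only for illustration.
$count = 2000

# Pattern from the question: each += allocates a new array and copies all existing elements.
$slow = @()
foreach ($i in 1..$count) {
    $slow += [pscustomobject]@{ Index = $i }
}

# Alternative: let the loop emit objects and capture the output once.
$fast = foreach ($i in 1..$count) {
    [pscustomobject]@{ Index = $i }
}

"{0} items via +=, {1} items via loop output" -f $slow.Count, $fast.Count
```

Both variables end up with the same number of objects. The second form avoids the repeated copying, which is the same effect the pipeline gives you.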

$data = Import-Csv -Path "D:\temp\data.csv"
$headers = $data.ID  | Sort-Object {[int]$_}  -Unique

$data | Group-Object Time | ForEach-Object{
    $props = [ordered]@{Time = $_.Name}
    foreach($header in $headers){
        $props."$header" = ($_.Group | Where-Object{$_.ID -eq $header}).IOT
    }
    [pscustomobject]$props
} |  export-csv d:\temp\testing.csv -NoTypeInformation

$data holds the whole file in memory as objects. We first need to collect all the $headers that will become the column headers.

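The data is grouped by each Time. Then, inside each time group, we get the value for every ID. If an ID does not exist for that time, the entry will show up as null.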
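The grouping step can be tried on a few inline rows. The sketch below is a variation, not the code above: it assumes that indexing each group's rows in a hashtable (the hypothetical `$byId`) is an acceptable replacement for the per-header `Where-Object` filter, and the sample values are illustrative.

```powershell
# Tiny inline sample in the same shape as the question's CSV (values are illustrative).
$data = @'
"Time","Id","IOT"
"00:03:56","23","26"
"00:03:56","24","0"
"00:09:16","23","22"
'@ | ConvertFrom-Csv

$headers = $data.Id | Sort-Object { [int]$_ } -Unique

$result = $data | Group-Object Time | ForEach-Object {
    # Index this group's rows by Id once, instead of filtering the group per header.
    $byId = @{}
    foreach ($row in $_.Group) { $byId[$row.Id] = $row.IOT }

    $props = [ordered]@{ Time = $_.Name }
    foreach ($header in $headers) {
        # A missing Id for this Time yields $null, as described above.
        $props[$header] = $byId[$header]
    }
    [pscustomobject]$props
}

$result | Format-Table -AutoSize
```

Id 24 has no row at 00:09:16 in this sample, so that cell comes out empty. Whether the hashtable lookup is noticeably faster on the real 10000-row file would need to be measured.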

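This is not the best approach, but it should be faster than yours. I ran 10000 records in under a minute (51 seconds averaged over 3 passes). I will benchmark your code as well, if I can, for comparison.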

I ran your code just once against my own data, and it took 13 minutes. I think it is safe to say that mine performs faster.
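For anyone wanting to repeat such a timing, here is a minimal sketch of averaging several passes with `Measure-Command`. The script block `$work` is a hypothetical stand-in for the transpose pipeline and is not what was timed above.

```powershell
# Hypothetical workload standing in for the transpose pipeline.
$work = { 1..1000 | ForEach-Object { $_ * 2 } | Out-Null }

$passes = 3
$seconds = foreach ($pass in 1..$passes) {
    # Measure-Command returns a TimeSpan for one run of the script block.
    (Measure-Command -Expression $work).TotalSeconds
}

$average = ($seconds | Measure-Object -Average).Average
"Average over {0} passes: {1:N3} s" -f $passes, $average
```

Replacing the body of `$work` with the real pipeline should give numbers comparable to the ones quoted here, although they will vary by machine and data.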

FYI, the dummy data was made with this logic:
1..100 | %{
 $time = get-date -Format "hh:mm:ss"
 sleep -Seconds 1
    1..100 | % {

        [pscustomobject][ordered]@{
            time = $time 
            id = $_
            iot = Get-Random -Minimum 0 -Maximum 7
        } 
    }
} | Export-Csv d:\temp\data.csv -notypeinformation

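*This is not a great example for the StreamReader case. I mention it only because it is the better way to read large files; you would just need to parse each string line by line.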
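As a rough idea of what line-by-line reading could look like, the sketch below writes a small temporary file and reads it back with `System.IO.StreamReader`. The file contents and the naive split are assumptions for illustration only; the split works only because no field here contains a comma or an escaped quote.

```powershell
# Write a small temporary file so the sketch is self-contained (contents are illustrative).
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value '"Time","Id","IOT"', '"00:03:56","23","26"', '"00:09:16","23","22"'

$reader = New-Object System.IO.StreamReader($path)
try {
    $null = $reader.ReadLine()    # skip the header line
    $rows = while ($null -ne ($line = $reader.ReadLine())) {
        # Naive parse: strip the outer quotes, then split on the "," separators.
        $time, $id, $iot = $line.Trim('"') -split '","'
        [pscustomobject]@{ Time = $time; Id = $id; IOT = $iot }
    }
}
finally {
    # Release the file handle and clean up the temporary file.
    $reader.Dispose()
    Remove-Item -Path $path
}

$rows | Format-Table -AutoSize
```

For a file of this size `Import-Csv` remains the simpler choice, which is consistent with the answer's view that manual parsing is not needed here.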