I'm new to PowerShell. I'm trying to transpose rows into columns for a medium-sized CSV of records. The original CSV has about 10,000 rows and 3 columns ("Time","Id","IOT"), as shown below:
"Time","Id","IOT"
"00:03:56","23","26"
"00:03:56","24","0"
"00:03:56","25","0"
"00:03:56","26","1"
"00:03:56","27","0"
"00:03:56","28","0"
"00:03:56","29","0"
"00:03:56","30","1953"
"00:03:56","31","22"
"00:03:56","32","39"
"00:03:56","33","8"
"00:03:56","34","5"
"00:03:56","35","269"
"00:03:56","36","5"
"00:03:56","37","0"
"00:03:56","38","0"
"00:03:56","39","0"
"00:03:56","40","1251"
"00:03:56","41","103"
"00:03:56","42","0"
"00:03:56","43","0"
"00:03:56","44","0"
"00:03:56","45","0"
"00:03:56","46","38"
"00:03:56","47","14"
"00:03:56","48","0"
"00:03:56","49","0"
"00:03:56","2013","0"
"00:03:56","2378","0"
"00:03:56","2380","32"
"00:03:56","2758","0"
"00:03:56","3127","0"
"00:03:56","3128","0"
"00:09:16","23","22"
"00:09:16","24","0"
"00:09:16","25","0"
"00:09:16","26","2"
"00:09:16","27","0"
"00:09:16","28","0"
"00:09:16","29","21"
"00:09:16","30","48"
"00:09:16","31","0"
"00:09:16","32","4"
"00:09:16","33","4"
"00:09:16","34","7"
"00:09:16","35","382"
"00:09:16","36","12"
"00:09:16","37","0"
"00:09:16","38","0"
"00:09:16","39","0"
"00:09:16","40","1882"
"00:09:16","41","42"
"00:09:16","42","0"
"00:09:16","43","3"
"00:09:16","44","0"
"00:09:16","45","0"
"00:09:16","46","24"
"00:09:16","47","22"
"00:09:16","48","0"
"00:09:16","49","0"
"00:09:16","2013","0"
"00:09:16","2378","0"
"00:09:16","2380","19"
"00:09:16","2758","0"
"00:09:16","3127","0"
"00:09:16","3128","0"
...
...
...
I tried to transpose it using code based on the PowerShell script downloaded from https://gallery.technet.microsoft.com/scriptcenter/Powershell-Script-to-7c8368be. Basically, my PowerShell code is as follows:
# $a is assumed to hold the imported CSV rows (e.g. the output of Import-Csv)
$b = @()
foreach ($Time in $a.Time | Select -Unique) {
    $Props = [ordered]@{ Time = $time }
    foreach ($Id in $a.Id | Select -Unique){
        $IOT = ($a.where({ $_.Id -eq $Id -and $_.time -eq $time })).IOT
        $Props += @{ $Id = $IOT }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Out-GridView
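The code above gives me the result I expect: every "Id" value becomes a column header, every "Time" value becomes a unique row, and each "IOT" value sits at the intersection of "Time" x "Id".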
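With only a few hundred rows the result appears quickly, but when I process the whole CSV file with 10,000 rows, the script above just keeps running and does not finish even after a very long time (hours), without producing any result. Could some PowerShell experts on Stack Overflow help evaluate the code above and perhaps suggest changes to speed it up?
Thank you very much for your advice.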
Answer 0 (score: 0)
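10000 records is a lot, but I don't think it's enough to warrant recommending StreamReader* and parsing the CSV manually. The biggest thing working against you is the following line: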
$b += New-Object -TypeName PSObject -Property $Props
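What PowerShell does here is create a new array, copy the existing elements into it, and append the new element. This is a very memory-intensive operation, and you repeat it on every pass through the outer loop. In this case the better thing to do is leverage the pipeline to your advantage.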
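$data = Import-Csv -Path "D:\temp\data.csv"
# Every distinct ID, sorted numerically; these become the column headers
$headers = $data.ID | Sort-Object {[int]$_} -Unique
# One group per Time value; each group becomes one output row
$data | Group-Object Time | ForEach-Object{
    $props = [ordered]@{Time = $_.Name}
    foreach($header in $headers){
        # IOT for this ID within the current Time group ($null when absent)
        $props."$header" = ($_.Group | Where-Object{$_.ID -eq $header}).IOT
    }
    [pscustomobject]$props
} | export-csv d:\temp\testing.csv -NoTypeInformation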
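$data will be held entirely in memory as objects. We need to collect all the $headers that will become the column headers. The data is then grouped by each Time. Inside each time group we get the value for every ID. If an ID doesn't exist for that time, the entry shows up as null.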
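To make the null case concrete, here is a minimal sketch that runs the same grouping logic on three rows borrowed from the question's data. The variable names $sample and $rows are just for illustration, and the 00:09:16 row for Id 24 is left out on purpose so that one cell has nothing to fill it.

```powershell
# Three rows taken from the question's data; 00:09:16 deliberately has no Id 24.
$sample = @'
"Time","Id","IOT"
"00:03:56","23","26"
"00:03:56","24","0"
"00:09:16","23","22"
'@ | ConvertFrom-Csv

# Distinct IDs sorted numerically, as in the script above.
$headers = $sample.Id | Sort-Object { [int]$_ } -Unique

# One output object per Time group, one property per ID.
$rows = $sample | Group-Object Time | ForEach-Object {
    $props = [ordered]@{ Time = $_.Name }
    foreach ($header in $headers) {
        $props."$header" = ($_.Group | Where-Object { $_.Id -eq $header }).IOT
    }
    [pscustomobject]$props
}

$rows | Format-Table -AutoSize
```

The second row should come back with an empty "24" column, which is the null entry described above. Note this relies on the default (non-strict) mode, where reading a property of an empty result yields $null.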
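This isn't the best approach, but it should be faster than yours. I ran 10000 records in under a minute (51 seconds averaged over 3 passes). I'll benchmark yours if I can, to show the difference.
I ran your code just once against my own data and it took 13 minutes. I think it's safe to say mine performs faster.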
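For what it's worth, the array-append cost can be seen in isolation. The following is only a sketch, separate from the timings above; the names $slow and $fast and the size of 1000 are arbitrary choices for illustration.

```powershell
# Append pattern: each += builds a new array and copies the old elements into it.
$slow = @()
foreach ($i in 1..1000) {
    $slow += [pscustomobject]@{ Index = $i }
}

# Pipeline-style pattern: emit each object and let PowerShell collect the output once.
$fast = foreach ($i in 1..1000) {
    [pscustomobject]@{ Index = $i }
}

# Both variables end up holding the same number of objects.
"{0} vs {1}" -f $slow.Count, $fast.Count
```

Both loops produce the same objects; only the way they are accumulated differs, which is the part that gets expensive as the row count grows.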
FYI, I made the dummy data using this logic:
1..100 | %{
    $time = get-date -Format "hh:mm:ss"
    sleep -Seconds 1
    1..100 | % {
        [pscustomobject][ordered]@{
            time = $time
            id = $_
            iot = Get-Random -Minimum 0 -Maximum 7
        }
    }
} | Export-Csv d:\temp\data.csv -notypeinformation
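* Your case is not a great example for StreamReader. I'm only pointing it out because it is the better way to read large files. You just have to parse the strings line by line.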
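In case it helps to see what that looks like, here is a rough sketch of reading a CSV line by line with System.IO.StreamReader. The temp file name is made up so the example can run on its own, and the comma split is naive: it assumes no field contains a comma or an embedded quote, which holds for the data in the question but not for CSV in general.

```powershell
# Hypothetical temp file so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'streamreader-demo.csv'
Set-Content -Path $path -Value '"Time","Id","IOT"', '"00:03:56","23","26"', '"00:03:56","24","0"'

$parsed = @()
$reader = New-Object System.IO.StreamReader $path
try {
    $null = $reader.ReadLine()   # skip the header row
    while ($null -ne ($line = $reader.ReadLine())) {
        # Naive parse: split on commas, then strip the surrounding quotes.
        $time, $id, $iot = $line.Split(',') | ForEach-Object { $_.Trim('"') }
        $parsed += "$time / $id / $iot"
    }
}
finally {
    $reader.Dispose()
    Remove-Item $path
}

$parsed
```

Only one line is held in memory at a time while reading, which is why this approach suits very large files better than Import-Csv.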