我有一个如下所示的文件;
- Visitor ID,Revenue,Channel,Flight
- 1234,100,Email,BA123
- 2345,200,PPC,BA112
- 456,150,Email,BA456
我需要生成一个包含;
的文件The count of distinct Visitor IDs (3)
The total revenue (450)
The count of each Channel
Email 2
PPC 2
The count of each Flight
BA123 1
BA112 1
BA456 1
到目前为止,我有以下代码,但是当在350MB
文件上执行此代码时,它需要太长时间并且在某些情况下会破坏memory limit
。因为我必须在function
上运行此multiple columns
,所以它会多次浏览该文件。理想情况下,我需要在一个文件传递中执行此操作。
$file = 'log.txt'
function GroupBy($columnName)
{
$objects = Import-Csv -Delimiter "`t" $file | Group-Object $columnName |
Select-Object @{n=$columnName;e={$_.Group[0].$columnName}}, Count
for($i=0;$i -lt $objects.count;$I++) {
$line += $columnName +"|"+$objects[$I]."$columnName" +"|Count|"+ $objects[$I].'Count' + $OFS
}
return $line
}
$finalOutput += GroupBy "Channel"
$finalOutput += GroupBy "Flight"
Write-Host $finalOutput
非常感谢任何帮助。
谢谢,
克雷格
答案 0 :(得分:2)
您为每列重新导入CSV的事实就是杀死您的脚本。尝试加载一次,然后重新使用数据。例如:
$data = Import-Csv .\data.csv
$flights = $data | Group-Object Flight -NoElement | ForEach-Object {[PsCustomObject]@{Flight=$_.Name;Count=$_.Count}}
$visitors = ($data | Group-Object "Visitor ID" | Measure-Object).Count
$revenue = ($data | Measure-Object Revenue -Sum).Sum
$channel = $data | Group-Object Channel -NoElement | ForEach-Object {[PsCustomObject]@{Channel=$_.Name;Count=$_.Count}}
您可以显示如下数据:
"Revenue : $revenue"
"Visitors: $visitors"
$flights | Format-Table -AutoSize
$channel | Format-Table -AutoSize
答案 1 :(得分:0)
这可能会起作用 - 使用散列图。
Group-Object
,需要更多代码。减少内存消耗:逐行读取CSV文件
$data = Import-CSV -Path "C:\temp\data.csv" -Delimiter ","
$DistinctVisitors = @{}
$TotalRevenue = 0
$ChannelCount = @{}
$FlightCount = @{}
$data | ForEach-Object {
$DistinctVisitors[$_.'Visitor ID'] = $true
$TotalRevenue += $_.Revenue
if (-not $ChannelCount.ContainsKey($_.Channel)) {
$ChannelCount[$_.Channel] = 0
}
$ChannelCount[$_.Channel] += 1
if (-not $FlightCount.ContainsKey($_.Flight)) {
$FlightCount[$_.Flight] = 0
}
$FlightCount[$_.Flight] += 1
}
$DistinctVisitorsCount = $DistinctVisitors.Keys | Measure-Object | Select-Object -ExpandProperty Count
Write-Output "The count of distinc Visitor IDs $DistinctVisitorsCount"
Write-Output "The total revenue $TotalRevenue"
Write-Output "The Count of each Channel"
$ChannelCount.Keys | ForEach-Object {
Write-Output "$_ $($ChannelCount[$_])"
}
Write-Output "The count of each Flight"
$FlightCount.Keys | ForEach-Object {
Write-Output "$_ $($FlightCount[$_])"
}