CSV file - count distinct, group by, sum

Asked: 2018-05-23 08:01:07

Tags: powershell

I have a file that looks like this:

- Visitor ID,Revenue,Channel,Flight
- 1234,100,Email,BA123
- 2345,200,PPC,BA112
- 456,150,Email,BA456

I need to generate a file containing:
The count of distinct Visitor IDs (3)
The total revenue (450)
The count of each Channel
Email 2
PPC 1
The count of each Flight
BA123 1
BA112 1
BA456 1

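So far I have the following code, but when I run it on a 350MB file it takes too long and in some cases exceeds the memory limit. Because I have to run the function on multiple columns, it reads through the file multiple times. Ideally I need to do this in a single pass over the file.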

$file = 'log.txt'

function GroupBy($columnName)
{
    $objects = Import-Csv -Delimiter "`t" $file | Group-Object $columnName |
       Select-Object @{n=$columnName;e={$_.Group[0].$columnName}}, Count

      for($i=0;$i -lt $objects.count;$I++) {
     $line += $columnName +"|"+$objects[$I]."$columnName" +"|Count|"+ $objects[$I].'Count' + $OFS

    }
    return $line
}

$finalOutput += GroupBy "Channel"
$finalOutput += GroupBy "Flight"


Write-Host $finalOutput

Any help is much appreciated.

Thanks,

Craig

2 answers:

Answer 0: (score: 2)

Re-importing the CSV for every column is what is killing your script. Load it once, then reuse the data. For example:

$data = Import-Csv .\data.csv

$flights = $data | Group-Object Flight -NoElement | ForEach-Object {[PsCustomObject]@{Flight=$_.Name;Count=$_.Count}}
$visitors = ($data | Group-Object "Visitor ID" | Measure-Object).Count
$revenue = ($data | Measure-Object Revenue -Sum).Sum
$channel = $data | Group-Object Channel -NoElement | ForEach-Object {[PsCustomObject]@{Channel=$_.Name;Count=$_.Count}}

You can display the data like this:

"Revenue : $revenue"
"Visitors: $visitors"
$flights | Format-Table -AutoSize
$channel | Format-Table -AutoSize
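Since the question asks for a file rather than console tables, here is a minimal sketch that reuses the same grouping to build pipe-delimited lines like the ones the original `GroupBy` function produced. The helper name `Format-GroupCount` and the `Visitors|`/`Revenue|` line layouts are assumptions made for illustration, and the sample rows are parsed in memory with `ConvertFrom-Csv` so the sketch runs without a file.

```powershell
# Sample rows from the question, parsed in memory so no file is needed.
$data = @'
Visitor ID,Revenue,Channel,Flight
1234,100,Email,BA123
2345,200,PPC,BA112
456,150,Email,BA456
'@ | ConvertFrom-Csv

# Hypothetical helper: emits one "Column|Value|Count|N" line per distinct value.
function Format-GroupCount {
    param(
        [Parameter(Mandatory)] [object[]] $Rows,
        [Parameter(Mandatory)] [string]   $ColumnName
    )
    # -NoElement keeps only Name and Count, so the grouped rows are not retained.
    $Rows | Group-Object -Property $ColumnName -NoElement | ForEach-Object {
        '{0}|{1}|Count|{2}' -f $ColumnName, $_.Name, $_.Count
    }
}

# The data is imported once and reused for every aggregate.
$summary = @(
    'Visitors|{0}' -f @($data.'Visitor ID' | Sort-Object -Unique).Count
    'Revenue|{0}'  -f ($data | Measure-Object -Property Revenue -Sum).Sum
    Format-GroupCount -Rows $data -ColumnName 'Channel'
    Format-GroupCount -Rows $data -ColumnName 'Flight'
)
$summary
```

To write the result to disk, something like `$summary | Set-Content -Path .\summary.txt` should do (the path is a placeholder). For the tab-delimited log in the question, `Import-Csv -Delimiter "`t"` would take the place of `ConvertFrom-Csv`.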

Answer 1: (score: 0)

This might work, using hashtables.

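  • Pros: it should be faster and use less memory.
  • Cons: it is far less readable than Group-Object and needs more code.
  • To reduce memory consumption further: read the CSV file line by line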

    $data = Import-CSV -Path "C:\temp\data.csv" -Delimiter ","
    $DistinctVisitors = @{}
    $TotalRevenue = 0
    $ChannelCount = @{}
    $FlightCount = @{}
    
    $data | ForEach-Object {
        $DistinctVisitors[$_.'Visitor ID'] = $true
        $TotalRevenue += $_.Revenue
    
        if (-not $ChannelCount.ContainsKey($_.Channel)) {
            $ChannelCount[$_.Channel] = 0
        }
        $ChannelCount[$_.Channel] += 1
    
        if (-not $FlightCount.ContainsKey($_.Flight)) {
            $FlightCount[$_.Flight] = 0
        }
        $FlightCount[$_.Flight] += 1
    }
    
    $DistinctVisitorsCount = $DistinctVisitors.Keys | Measure-Object | Select-Object -ExpandProperty Count
    
    Write-Output "The count of distinct Visitor IDs $DistinctVisitorsCount"
    Write-Output "The total revenue $TotalRevenue"
    Write-Output "The Count of each Channel"
    $ChannelCount.Keys | ForEach-Object {
        Write-Output "$_ $($ChannelCount[$_])"
    }
    Write-Output "The count of each Flight"
    $FlightCount.Keys | ForEach-Object {
        Write-Output "$_ $($FlightCount[$_])"
    }
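The code above still assigns the whole import to `$data`, so every row is held in memory at once. Below is a minimal sketch of the "line by line" variant from the list above: piping `Import-Csv` straight into `ForEach-Object` lets the rows stream through one at a time. The function name `Get-CsvSummary`, its parameters, and the temp-file sample are assumptions for illustration, and actual memory savings on a 350MB file have not been measured here.

```powershell
# Hypothetical helper: aggregates the CSV in one pass without keeping all rows.
function Get-CsvSummary {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [string] $Delimiter = ','
    )
    $visitors = New-Object 'System.Collections.Generic.HashSet[string]'
    $channels = @{}
    $flights  = @{}
    $revenue  = [decimal]0

    # Rows flow through the pipeline one at a time; nothing collects them.
    Import-Csv -Path $Path -Delimiter $Delimiter | ForEach-Object {
        [void]$visitors.Add($_.'Visitor ID')   # the set ignores repeated IDs
        $revenue += [decimal]$_.Revenue        # CSV fields are strings, so cast
        # A missing hashtable key reads as $null, and $null + 1 evaluates to 1.
        $channels[$_.Channel] += 1
        $flights[$_.Flight]   += 1
    }

    [PSCustomObject]@{
        DistinctVisitors = $visitors.Count
        TotalRevenue     = $revenue
        ChannelCount     = $channels
        FlightCount      = $flights
    }
}

# Write the question's sample rows to a temp file so the sketch is runnable.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'visitor-sample.csv'
@'
Visitor ID,Revenue,Channel,Flight
1234,100,Email,BA123
2345,200,PPC,BA112
456,150,Email,BA456
'@ | Set-Content -Path $path

$result = Get-CsvSummary -Path $path
$result.DistinctVisitors
$result.TotalRevenue
$result.ChannelCount
$result.FlightCount
```

For the tab-delimited log in the question, the call would presumably be `Get-CsvSummary -Path 'log.txt' -Delimiter "`t"`. Memory use then grows with the number of distinct visitor IDs, channels, and flights rather than with the number of rows.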