Read a CSV file, group the data by date, and calculate multiple columns for each date

Time: 2017-04-06 17:43:44

Tags: powershell csv

My source CSV file:

"Name","timestamp","CPU|Demand (%)","CPU|Demand (%) (Trend)","CPU|Demand (%) (30 days forecast)"
"BDC00-Management","Mar 2, 2017 12:01:22 AM","","30.68",""
"BDC00-Management","Mar 2, 2017 12:10:00 AM","34.19","",""
"BDC00-Management","Mar 2, 2017 12:16:22 AM","","30.68",""
"BDC00-Management","Mar 2, 2017 12:20:00 AM","29.59","",""
"BDC00-Management","Mar 3, 2017 6:55:00 AM","28.76","",""
"BDC00-Management","Mar 3, 2017 7:00:00 AM","33.44","",""
"BDC00-Management","Mar 3, 2017 7:01:22 AM","","30.98",""
"BDC00-Management","Apr 1, 2017 7:01:22 PM","","","37.98"
"BDC00-Management","Apr 1, 2017 7:21:22 PM","","","37.99"
"BDC01-Horizon","Apr 2, 2017 2:56:22 AM","","","16.8"
"BDC01-Horizon","Apr 2, 2017 3:06:22 AM","","","16.78"
"BDC01-Linux","Mar 30, 2017 9:31:22 AM","","18.49",""
"BDC01-Linux","Mar 30, 2017 9:40:00 AM","18.32","",""
"BDC01-Linux","Mar 30, 2017 9:41:22 AM","","18.49",""
"BDC01-Linux","Mar 31, 2017 1:30:00 PM","18.48","",""
"BDC01-Linux","Mar 31, 2017 1:36:22 PM","","18.58",""
"BDC01-Linux","Apr 1, 2017 9:51:22 PM","","","18.67"
"BDC01-Linux","Apr 1, 2017 10:11:22 PM","","","18.68"
"BDC01-Linux","Apr 2, 2017 4:16:22 AM","","","18.69"
"BDC01-Linux","Apr 2, 2017 4:46:22 AM","","","18.7"

I need to Export-Csv one row per day for each "Name", containing the highest value of each column for that day. For example:

"Name","timestamp","CPU|Demand (%)","CPU|Demand (%) (Trend)","CPU|Demand (%) (30 days forecast)"
"BDC00-Management","Mar 2, 2017","34.19","30.68",""
"BDC00-Management","Mar 3, 2017","33.44","30.98",""
"BDC00-Management","Apr 1, 2017","","","37.99"
"BDC01-Horizon","Apr 2, 2017","","","16.8"
"BDC01-Linux","Mar 30, 2017","18.32","18.49",""
"BDC01-Linux","Mar 31, 2017","18.48","18.58",""
"BDC01-Linux","Apr 1, 2017","","","18.68"
"BDC01-Linux","Apr 2, 2017","","","18.7"

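The source file has more than 750,000 rows, and I need to reduce its size for automated chart reports in SharePoint. This matters because I do not need a data point every 5 minutes.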

1 answer:

Answer 0: (score: 2)

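The easiest way is to use Group-Object to group the entries by name and date, and to create a new object per group in which you calculate the maximum values. Note that this requires reading the whole CSV into memory. For that many rows it will be slow and use a lot of memory.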
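To make the grouping step concrete, here is a minimal sketch on three rows copied from the question. The inline data and the variable names `$rows` and `$groups` are only illustrative. It groups by the Name property plus a calculated key (the date part of the timestamp), so two rows from the same day should land in one group.

```powershell
# Three sample rows from the question; the first two share a calendar day.
$rows = @'
"Name","timestamp","CPU|Demand (%)"
"BDC00-Management","Mar 2, 2017 12:10:00 AM","34.19"
"BDC00-Management","Mar 2, 2017 12:20:00 AM","29.59"
"BDC00-Management","Mar 3, 2017 7:00:00 AM","33.44"
'@ | ConvertFrom-Csv

# Group by Name and by a calculated key: the timestamp truncated to its date.
$groups = $rows | Group-Object Name, { ($_.timestamp -as [datetime]).Date }

# Each group exposes Count and Group (the original rows that fell into it).
$groups | ForEach-Object { '{0} row(s): {1}' -f $_.Count, $_.Name }
```

Each group's Group property holds the original rows, which is why the whole file ends up in memory with this approach.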

The comments are in the code. Try:

#Read csv-input
Import-Csv -Path "c:\old.csv" |
#Group entries by server and date
Group-Object Name, { ($_.timestamp -as [datetime]).Date } |
ForEach-Object {
    #Create new object per server per day with max-values
    New-Object -TypeName psobject -Property ([ordered]@{
        Name = $_.Group[0].Name
        timestamp = ($_.Group[0].timestamp -as [datetime]).ToString("MMM d, yyyy")
        "CPU|Demand (%)" = $_.Group | Measure-Object -Property "CPU|Demand (%)" -Maximum | ForEach-Object { if($_.Maximum -gt 0) { $_.Maximum } }
        "CPU|Demand (%) (Trend)" = $_.Group | Measure-Object -Property "CPU|Demand (%) (Trend)" -Maximum | ForEach-Object { if($_.Maximum -gt 0) { $_.Maximum } }
        "CPU|Demand (%) (30 days forecast)" = $_.Group | Measure-Object -Property "CPU|Demand (%) (30 days forecast)" -Maximum | ForEach-Object { if($_.Maximum -gt 0) { $_.Maximum } }
    })
} | Export-Csv -Path "c:\new.csv" -NoTypeInformation

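Output (the lowercase month names and decimal commas appear to come from the regional settings of the machine that ran the script):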

"Name","timestamp","CPU|Demand (%)","CPU|Demand (%) (Trend)","CPU|Demand (%) (30 days forecast)"
"BDC00-Management","mar 2, 2017","34,19","30,68",""
"BDC00-Management","mar 3, 2017","33,44","30,98",""
"BDC00-Management","apr 1, 2017","","","37,99"
"BDC01-Horizon","apr 2, 2017","","","16,8"
"BDC01-Linux","mar 30, 2017","18,32","18,49",""
"BDC01-Linux","mar 31, 2017","18,48","18,58",""
"BDC01-Linux","apr 1, 2017","","","18,68"
"BDC01-Linux","apr 2, 2017","","","18,7"
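If memory becomes a problem with 750,000 rows, one possible alternative (not the Group-Object approach above) is to stream the rows and keep only a running maximum per server per day. The following is a rough sketch under a few assumptions: the function name `Export-DailyMaximum` and its parameters are hypothetical, the column names are those from the question, and timestamps and numbers are assumed to be in the English/invariant format shown in the source file. It stores the original text of the winning value, so the exported numbers keep their decimal points regardless of regional settings.

```powershell
# Hypothetical helper: streams the CSV and keeps one entry per server per day.
function Export-DailyMaximum {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Destination
    )

    # Assumption: input uses invariant (English) dates and decimal points.
    $culture = [cultureinfo]::InvariantCulture
    $columns = 'CPU|Demand (%)', 'CPU|Demand (%) (Trend)', 'CPU|Demand (%) (30 days forecast)'
    $maxima  = [ordered]@{}   # key: "Name|yyyy-MM-dd", kept in first-seen order

    Import-Csv -Path $Path | ForEach-Object {
        $day = ([datetime]::Parse($_.timestamp, $culture)).Date
        $key = $_.Name + '|' + $day.ToString('yyyy-MM-dd', $culture)

        # First row seen for this server and day: create an empty entry.
        if (-not $maxima.Contains($key)) {
            $entry = [ordered]@{
                Name      = $_.Name
                timestamp = $day.ToString('MMM d, yyyy', $culture)
            }
            foreach ($c in $columns) { $entry[$c] = '' }
            $maxima[$key] = $entry
        }

        $entry = $maxima[$key]
        foreach ($c in $columns) {
            $text = $_.$c
            if ([string]::IsNullOrWhiteSpace($text)) { continue }
            $current = $entry[$c]
            # Compare numerically, but store the original text of the larger value.
            if ($current -eq '' -or
                [double]::Parse($text, $culture) -gt [double]::Parse($current, $culture)) {
                $entry[$c] = $text
            }
        }
    }

    # Turn each accumulated entry into an object and write the reduced CSV.
    $maxima.Values |
        ForEach-Object { New-Object -TypeName psobject -Property $_ } |
        Export-Csv -Path $Destination -NoTypeInformation
}

# Usage, with the paths from the answer above:
# Export-DailyMaximum -Path "c:\old.csv" -Destination "c:\new.csv"
```

Unlike the Group-Object version, this sketch also keeps a maximum of 0 instead of exporting it as an empty field, which may or may not be what the report needs.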