Get the average of each column from a CSV

Time: 2019-07-18 05:38:35

Tags: powershell

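I am trying to get the average of every column in a CSV, grouped by timestamp. The object type is System.Array. Whenever I try to convert the values to integers, I get an error.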

timestamp   streams TRP A   B   C   D 
6/4/2019    6775    305 56  229 132 764
6/4/2019    6910    316 28  356 118 134
6/4/2019    6749    316 54  218 206 144
6/5/2019    5186    267 84  280 452 258
6/5/2019    5187    240 33  436 455 245
6/5/2019    5224    291 21  245 192 654
6/6/2019    5254    343 42  636 403 789
6/6/2019    5180    252 23  169 328 888
6/6/2019    5181    290 32  788 129 745
6/6/2019    5244    328 44  540 403 989

I got help from Lee_Dailey with the code below, which I am using to produce the average of each column per timestamp. I get this error message:

Cannot convert value " " to type "System.Int32".
Error: "Index was outside the bounds of the array."
+ ... l = [Math]::Round(($GIS_Item.Group.$TPL_Item.ForEach({[int]$_}) | Mea ...
+                                                           ~~~~~~~
     + CategoryInfo          : InvalidArgument: (:) [], RuntimeException
     + FullyQualifiedErrorId : InvalidCastFromStringToInteger
$InStuff = Import-Csv 'M:\MyDoc\script\logfiles\Output_18Mar\streams_E1WAF2_OUTPUT.csv'
$TargetPropertyList = $InStuff[0].PSObject.Properties.Name.Where({$_ -ne 'TimeStamp'})

$GroupedInStuff = $InStuff | Group-Object -Property TimeStamp
$Results = foreach ($GIS_Item in $GroupedInStuff) {
    $HighestValues = [ordered]@{
        TimeStamp = $GIS_Item.Name
    }
    foreach ($TPL_Item in $TargetPropertyList) {
        $TempHiVal = [Math]::Round(($GIS_Item.Group.$TPL_Item.ForEach({[int]$_}) | Measure-Object -Average).Average)
        $HighestValues.Add($TPL_Item, $TempHiVal)
    }
    [PSCustomObject]$HighestValues
}
$Results = $Results | Sort-Object -Property {[DateTime]$_.TimeStamp}

3 Answers:

Answer 0: (score: 1)

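Here is one way to handle the defective CSV file shown in your new data set. [grin] It reads the file as plain text, removes the spaces, and trims the trailing |.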

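I wondered whether having only a single date in a group would be a problem, so I added a row with a different date as the last line of the data set.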
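To see what the cleanup does before running the whole script, here is a minimal sketch applied to a single line of the sample data. The variable names `$RawLine` and `$CleanLine` are only for illustration and are not used in the script below.

```powershell
# One defective line, copied from the sample data
$RawLine = '6/4/2019 |6775 |  3059 |  4  | 2292 | 1328 | 764  |  0 |  0  |'

# Remove every space, then trim the trailing pipe so that
# no empty, unnamed last column is created
$CleanLine = $RawLine.Replace(' ', '').Trim('|')

$CleanLine
```

Note that `Replace(' ', '')` would also strip spaces inside a value. That is harmless here because no field in this data contains one.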

# fake reading in a defective CSV file as plain text
#    in real life, use Get-Content
$InStuff = @'
timestamp|abc | A     |  B  |  C   |   D  |  E   |  F  |  G    |
6/4/2019 |6775 |  3059 |  4  | 2292 | 1328 | 764  |  0 |  0  |
6/4/2019 |6910 |  3167 |  28 | 3568 | 1180 | 1348 |  0 |  0  |
6/4/2019 |6749 |  3161 |  0  | 2180 | 2060 | 1440 |  0 |  28 |
6/5/2019 |6738 |  3118 |  4  | 2736 | 1396 | 984  |  0 |  0  |
6/5/2019 |6718 |  3130 |  12 | 3076 | 1008 | 452  |  0 |  4  |
6/5/2019 |6894 |  3046 |  4  | 2284 | 1556 | 624  |  0 |  0  |
1/1/2021 |1111 |  2222 |  3  | 4444 | 5555 | 666  |  7 |  8  |
'@ -split [System.Environment]::NewLine

$CleanedInStuff = $InStuff.ForEach({$_.Replace(' ', '').Trim('|')}) |
    ConvertFrom-Csv -Delimiter '|'

$TargetPropertyList = $CleanedInStuff[0].PSObject.Properties.Name.
    Where({
        $_ -ne 'TimeStamp'
        })

$GroupedCIS = $CleanedInStuff |
    Group-Object -Property TimeStamp

$Results = foreach ($GCIS_Item in $GroupedCIS) {
    $TempObject = [ordered]@{
        TimeStamp = $GCIS_Item.Name
    }
    foreach ($TPL_Item in $TargetPropertyList) {
        $TempAveValue = [Math]::Round(($GCIS_Item.Group.$TPL_Item.
            ForEach({[int]$_}) |
            Measure-Object -Average).Average, 2)
        $TempObject.Add($TPL_Item, $TempAveValue)
    }

    [PSCustomObject]$TempObject
}

$Results = $Results |
    Sort-Object -Property {
        [DateTime]$_.TimeStamp
        }

$Results

Output...

TimeStamp : 6/4/2019
abc       : 6811.33
A         : 3129
B         : 10.67
C         : 2680
D         : 1522.67
E         : 1184
F         : 0
G         : 9.33

TimeStamp : 6/5/2019
abc       : 6783.33
A         : 3098
B         : 6.67
C         : 2698.67
D         : 1320
E         : 686.67
F         : 0
G         : 1.33

TimeStamp : 1/1/2021
abc       : 1111
A         : 2222
B         : 3
C         : 4444
D         : 5555
E         : 666
F         : 7
G         : 8
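If a column can still contain blank or non-numeric cells after the cleanup, the `[int]` cast may throw, as it did in the question. Here is a hedged sketch of one way around that, assuming you would rather skip such cells than fail. `Get-SafeAverage` is a hypothetical helper name, not part of the script above.

```powershell
function Get-SafeAverage {
    param(
        [string[]]$Text,   # raw cell values from one column of one group
        [int]$Digits = 2   # number of decimals to round to
    )

    $Number = 0.0
    $Parsed = @(
        foreach ($Item in $Text) {
            # TryParse returns $false instead of throwing on '', ' ' or 'n/a'
            $IsNumber = [double]::TryParse(
                $Item,
                [System.Globalization.NumberStyles]::Float,
                [System.Globalization.CultureInfo]::InvariantCulture,
                [ref]$Number)
            if ($IsNumber) { $Number }
        }
    )

    # No usable values: return nothing rather than a misleading 0
    if ($Parsed.Count -eq 0) { return }

    [Math]::Round(($Parsed | Measure-Object -Average).Average, $Digits)
}

# The B values for 6/4/2019, plus two cells that cannot be parsed
Get-SafeAverage -Text '4', ' 28 ', '', 'n/a', '0'
```

With these sample values the call should return 10.67, which matches column B for 6/4/2019 in the output above. Whether silently skipping bad cells is acceptable depends on your data. You may prefer to log them instead.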

Answer 1: (score: 0)

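Lee will surely suggest a better way to do this, but this is how I got the task done. It groups the objects by the Timestamp property, and the averaging is then done per group. I suggest playing around with $csv | Group-Object timestamp to see what it can do.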

$csv = import-csv C:\temp\test.csv
$Averages = New-Object System.Collections.ArrayList

Foreach($object in ($csv | Group-Object timestamp)) {

    # Assigning to $null suppresses the index that ArrayList.Add() returns
    $null = $Averages.Add([pscustomobject]@{
        timestamp = $object.Name
        abc = ($object.group | Select-Object -ExpandProperty abc | Measure-Object -Average).Average
        a = ($object.group | Select-Object -ExpandProperty a | Measure-Object -Average).Average
        b = ($object.group | Select-Object -ExpandProperty b | Measure-Object -Average).Average
        c = ($object.group | Select-Object -ExpandProperty c | Measure-Object -Average).Average
        d = ($object.group | Select-Object -ExpandProperty d | Measure-Object -Average).Average
        e = ($object.group | Select-Object -ExpandProperty e | Measure-Object -Average).Average
        f = ($object.group | Select-Object -ExpandProperty f | Measure-Object -Average).Average
        g = ($object.group | Select-Object -ExpandProperty g | Measure-Object -Average).Average
        })
}

Output:

PS H:\> $Averages


timestamp : 6/4/2019
abc       : 6811.33333333333
a         : 3129
b         : 10.6666666666667
c         : 2680
d         : 1522.66666666667
e         : 1184
f         : 0
g         : 9.33333333333333

timestamp : 6/5/2019
abc       : 6783.33333333333
a         : 3098
b         : 6.66666666666667
c         : 2698.66666666667
d         : 1320
e         : 686.666666666667
f         : 0
g         : 1.33333333333333
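To explore what Group-Object returns, as suggested above, here is a small sketch with inline data so that no file is needed. The two-column table is a cut-down stand-in built from a few of the sample values, and `$Groups` is just an illustrative name.

```powershell
# Inline stand-in for Import-Csv, using a few values from the sample data
$csv = @'
timestamp,abc
6/4/2019,6775
6/4/2019,6910
6/5/2019,6738
'@ | ConvertFrom-Csv

$Groups = $csv | Group-Object timestamp

# Each group exposes Name (the shared timestamp), Count,
# and Group (the original rows that belong to it)
$Groups | Select-Object Name, Count

# Average one column inside the first group
($Groups[0].Group | Measure-Object -Property abc -Average).Average
```

Measure-Object converts the string values from the CSV to numbers on its own, which is why no explicit cast is needed in this approach.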

Answer 2: (score: 0)

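The ConvertFrom-SourceTable cmdlet I created is really meant to read fixed-width tables, but there is no reason why it could not also read delimited (or distorted) tables. This question therefore prompted me to create an update that no longer produces errors in such cases.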

Converting the table from the question

$Table = '
    timestamp   streams TRP A   B   C   D 
    6/4/2019    6775    305 56  229 132 764
    6/4/2019    6910    316 28  356 118 134
    6/4/2019    6749    316 54  218 206 144
    6/5/2019    5186    267 84  280 452 258
    6/5/2019    5187    240 33  436 455 245
    6/5/2019    5224    291 21  245 192 654
    6/6/2019    5254    343 42  636 403 789
    6/6/2019    5180    252 23  169 328 888
    6/6/2019    5181    290 32  788 129 745
    6/6/2019    5244    328 44  540 403 989
'

# Raw Table
ConvertFrom-SourceTable $Table | Format-Table

timestamp streams TRP A  B   C   D
--------- ------- --- -  -   -   -
6/4/2019  6775    305 56 229 132 764
6/4/2019  6910    316 28 356 118 134
6/4/2019  6749    316 54 218 206 144
6/5/2019  5186    267 84 280 452 258
6/5/2019  5187    240 33 436 455 245
6/5/2019  5224    291 21 245 192 654
6/6/2019  5254    343 42 636 403 789
6/6/2019  5180    252 23 169 328 888
6/6/2019  5181    290 32 788 129 745
6/6/2019  5244    328 44 540 403 989

#Streamed rows from pipeline:
$Table -split [System.Environment]::NewLine | ConvertFrom-SourceTable | Format-Table

timestamp streams TRP A  B   C   D
--------- ------- --- -  -   -   -
6/4/2019  6775    305 56 229 132 764
6/4/2019  6910    316 28 356 118 134
6/4/2019  6749    316 54 218 206 144
6/5/2019  5186    267 84 280 452 258
6/5/2019  5187    240 33 436 455 245
6/5/2019  5224    291 21 245 192 654
6/6/2019  5254    343 42 636 403 789
6/6/2019  5180    252 23 169 328 888
6/6/2019  5181    290 32 788 129 745
6/6/2019  5244    328 44 540 403 989

Fixed-width table with vertical rulers

$Table = '
    | date      |  abc |    A |  B |    C |    D |    E |  F |  G |
    | 6/4/2019  | 6775 | 3059 |  4 | 2292 | 1328 |  764 |  0 |  0 |
    | 6/4/2019  | 6910 | 3167 | 28 | 3568 | 1180 | 1348 |  0 |  0 |
    | 6/4/2019  | 6749 | 3161 |  0 | 2180 | 2060 | 1440 |  0 | 28 |
    | 6/5/2019  | 6738 | 3118 |  4 | 2736 | 1396 |  984 |  0 |  0 |
    | 6/5/2019  | 6718 | 3130 | 12 | 3076 | 1008 |  452 |  0 |  4 |
    | 6/5/2019  | 6894 | 3046 |  4 | 2284 | 1556 |  624 |  0 |  0 |
    | 1/1/2021  | 1111 | 2222 |  3 | 4444 | 5555 |  666 |  7 |  8 |
'

# Raw Table
ConvertFrom-SourceTable $Table | Format-Table

date      abc    A  B    C    D    E F  G
----      ---    -  -    -    -    - -  -
6/4/2019 6775 3059  4 2292 1328  764 0  0
6/4/2019 6910 3167 28 3568 1180 1348 0  0
6/4/2019 6749 3161  0 2180 2060 1440 0 28
6/5/2019 6738 3118  4 2736 1396  984 0  0
6/5/2019 6718 3130 12 3076 1008  452 0  4
6/5/2019 6894 3046  4 2284 1556  624 0  0
1/1/2021 1111 2222  3 4444 5555  666 7  8

#Streamed rows from pipeline:
$Table -split [System.Environment]::NewLine | ConvertFrom-SourceTable | Format-Table

date      abc    A  B    C    D    E F  G
----      ---    -  -    -    -    - -  -
6/4/2019 6775 3059  4 2292 1328  764 0  0
6/4/2019 6910 3167 28 3568 1180 1348 0  0
6/4/2019 6749 3161  0 2180 2060 1440 0 28
6/5/2019 6738 3118  4 2736 1396  984 0  0
6/5/2019 6718 3130 12 3076 1008  452 0  4
6/5/2019 6894 3046  4 2284 1556  624 0  0
1/1/2021 1111 2222  3 4444 5555  666 7  8

Note the type casting in the results (visible in the column alignment), which means the results are symmetric:
$Result = $Table | ConvertFrom-SourceTable | Format-Table
$Result | Format-Table <=> $Result | ConvertFrom-SourceTable | Format-Table

Distorted table

$Table = '
    timestamp|abc | A     |  B  |  C   |   D  |  E   |  F  |  G    |
    6/4/2019 |6775 |  3059 |  4  | 2292 | 1328 | 764  |  0 |  0  |
    6/4/2019 |6910 |  3167 |  28 | 3568 | 1180 | 1348 |  0 |  0  |
    6/4/2019 |6749 |  3161 |  0  | 2180 | 2060 | 1440 |  0 |  28 |
    6/5/2019 |6738 |  3118 |  4  | 2736 | 1396 | 984  |  0 |  0  |
    6/5/2019 |6718 |  3130 |  12 | 3076 | 1008 | 452  |  0 |  4  |
    6/5/2019 |6894 |  3046 |  4  | 2284 | 1556 | 624  |  0 |  0  |
    1/1/2021 |1111 |  2222 |  3  | 4444 | 5555 | 666  |  7 |  8  |
'
# Raw Table
ConvertFrom-SourceTable $Table | Format-Table

timestamp abc  A    B  C    D    E    F G
--------- ---  -    -  -    -    -    - -
6/4/2019  6775 3059 4  2292 1328 764  0 0
6/4/2019  6910 3167 28 3568 1180 1348 0 0
6/4/2019  6749 3161 0  2180 2060 1440 0 28
6/5/2019  6738 3118 4  2736 1396 984  0 0
6/5/2019  6718 3130 12 3076 1008 452  0 4
6/5/2019  6894 3046 4  2284 1556 624  0 0
1/1/2021  1111 2222 3  4444 5555 666  7 8

#Streamed rows from pipeline:
$Table -split [System.Environment]::NewLine | ConvertFrom-SourceTable | Format-Table

timestamp abc  A    B  C    D    E    F G
--------- ---  -    -  -    -    -    - -
6/4/2019  6775 3059 4  2292 1328 764  0 0
6/4/2019  6910 3167 28 3568 1180 1348 0 0
6/4/2019  6749 3161 0  2180 2060 1440 0 28
6/5/2019  6738 3118 4  2736 1396 984  0 0
6/5/2019  6718 3130 12 3076 1008 452  0 4
6/5/2019  6894 3046 4  2284 1556 624  0 0
1/1/2021  1111 2222 3  4444 5555 666  7 8

Note that distorted rows always result in a literal (string) conversion (no type casting).
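Because string columns sort and compare differently from numeric ones, it may help to cast them explicitly before doing numeric work. This is a minimal sketch that uses the built-in ConvertFrom-Csv as a stand-in, since it also returns every field as a string. It does not call ConvertFrom-SourceTable, and the values 9 and 10 are made up to make the difference visible.

```powershell
# Built-in ConvertFrom-Csv used as a stand-in; values are illustrative only
$Rows = @'
timestamp|abc
6/4/2019|9
6/5/2019|10
'@ | ConvertFrom-Csv -Delimiter '|'

# Every property comes back as a string
$Rows[0].abc.GetType().Name

# Sorted as text, '10' comes before '9'
($Rows | Sort-Object -Property abc).abc

# Sorted as numbers after an explicit cast
($Rows | Sort-Object -Property { [int]$_.abc }).abc
```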