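PowerShell - Optimizing CSV processing with nested JSON

Time: 2017-12-04 15:07:50

Tags: json powershell csv

I am neither a developer nor a CSV expert. I managed to put this code together, and it does the job. As a quick overview, I need to process some JSON data nested inside a CSV. So I read the JSON, split it out into additional columns, and then save the CSV.

Now, my problem is that while this works fine, I need to process a 1.5 GB CSV file, and I don't want the processing to take 2 days...

So if you could help me tune my script so that it runs in a reasonable amount of time, I would be very grateful :)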

$file = Get-Content -Path 'input.csv' | Select-Object -Skip 2 | ConvertFrom-Csv
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_StreamingEndpointName -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_Id -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_AppServicePlanUri -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_ImageType -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_ServiceType -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_VMName -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_UsageType -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_DatabaseAccount -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_CollectionRid -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_ResourceCategory -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_displayName -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_ACCESSED-VIA-INTERNET' -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_APP-NAME' -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_APP-TYPE' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_APPTYPE -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_CHARGECODE -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_COMMENTS -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_COUNTRY -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_EY-REGION' -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_IT-ENV' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_OWNER -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_OWNER-EMAIL' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_SERVICELINE -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_SUB-TYPE' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_TECHCONTACTS -value $null

$count=1
ForEach ($line in $file) {
Write-Output "Processing line: $count"
$count++
try{
    if ($line.AdditionalInfo -ne $null -Or $line.Tags -ne $null){
        $line.AdditionalInfo_StreamingEndpointName = ($line.AdditionalInfo | ConvertFrom-JSON).StreamingEndpointName
        $line.AdditionalInfo_Id = ($line.AdditionalInfo | ConvertFrom-JSON).Id
        $line.AdditionalInfo_AppServicePlanUri = ($line.AdditionalInfo | ConvertFrom-JSON).AppServicePlanUri
        $line.AdditionalInfo_ImageType = ($line.AdditionalInfo | ConvertFrom-JSON).ImageType
        $line.AdditionalInfo_ServiceType = ($line.AdditionalInfo | ConvertFrom-JSON).ServiceType
        $line.AdditionalInfo_VMName = ($line.AdditionalInfo | ConvertFrom-JSON).VMName
        $line.AdditionalInfo_UsageType = ($line.AdditionalInfo | ConvertFrom-JSON).UsageType
        $line.AdditionalInfo_DatabaseAccount = ($line.AdditionalInfo | ConvertFrom-JSON).DatabaseAccount
        $line.AdditionalInfo_CollectionRid = ($line.AdditionalInfo | ConvertFrom-JSON).CollectionRid
        $line.AdditionalInfo_ResourceCategory = ($line.AdditionalInfo | ConvertFrom-JSON).ResourceCategory
        $line.Tags_displayName = ($line.Tags | ConvertFrom-JSON).displayName
        $line.'Tags_ACCESSED-VIA-INTERNET' = ($line.Tags | ConvertFrom-JSON).'ACCESSED-VIA-INTERNET'
        $line.'Tags_APP-NAME' = ($line.Tags | ConvertFrom-JSON).'APP-NAME'
        $line.'Tags_APP-TYPE' = ($line.Tags | ConvertFrom-JSON).'APP-TYPE'
        $line.Tags_APPTYPE = ($line.Tags | ConvertFrom-JSON).APPTYPE
        $line.Tags_CHARGECODE = ($line.Tags | ConvertFrom-JSON).CHARGECODE
        $line.Tags_COMMENTS = ($line.Tags | ConvertFrom-JSON).COMMENTS
        $line.Tags_COUNTRY = ($line.Tags | ConvertFrom-JSON).COUNTRY
        $line.'Tags_EY-REGION' = ($line.Tags | ConvertFrom-JSON).'EY-REGION'
        $line.'Tags_IT-ENV' = ($line.Tags | ConvertFrom-JSON).'IT-ENV'
        $line.Tags_OWNER = ($line.Tags | ConvertFrom-JSON).OWNER
        $line.'Tags_OWNER-EMAIL' = ($line.Tags | ConvertFrom-JSON).'OWNER-EMAIL'
        $line.Tags_SERVICELINE = ($line.Tags | ConvertFrom-JSON).SERVICELINE
        $line.'Tags_SUB-TYPE' = ($line.Tags | ConvertFrom-JSON).'SUB-TYPE'
        $line.Tags_TECHCONTACTS = ($line.Tags | ConvertFrom-JSON).TECHCONTACTS
        }
    }
    catch {}
}

#write-output $info
$file | Export-Csv 'C:\output.csv' -NoTypeInformation

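1 answer:

Answer 0: (score: 0)

Add-Member performs terribly. So does saving everything to variables over and over. You might have better luck keeping everything in a single pipeline and using Select-Object with calculated properties: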

Get-Content -Path 'input.csv' | 
    Select-Object -Skip 2 | 
    ConvertFrom-Csv | 
    Select-Object -ErrorAction SilentlyContinue -Property *,
        @{n = 'AdditionalInfo_StreamingEndpointName' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).StreamingEndpointName}},
        @{n = 'AdditionalInfo_Id'                    ; e = {($_.AdditionalInfo | ConvertFrom-JSON).Id}},
        @{n = 'AdditionalInfo_AppServicePlanUri'     ; e = {($_.AdditionalInfo | ConvertFrom-JSON).AppServicePlanUri}},
        @{n = 'AdditionalInfo_ImageType'             ; e = {($_.AdditionalInfo | ConvertFrom-JSON).ImageType}},
        @{n = 'AdditionalInfo_ServiceType'           ; e = {($_.AdditionalInfo | ConvertFrom-JSON).ServiceType}},
        @{n = 'AdditionalInfo_VMName'                ; e = {($_.AdditionalInfo | ConvertFrom-JSON).VMName}},
        @{n = 'AdditionalInfo_UsageType'             ; e = {($_.AdditionalInfo | ConvertFrom-JSON).UsageType}},
        @{n = 'AdditionalInfo_DatabaseAccount'       ; e = {($_.AdditionalInfo | ConvertFrom-JSON).DatabaseAccount}},
        @{n = 'AdditionalInfo_CollectionRid'         ; e = {($_.AdditionalInfo | ConvertFrom-JSON).CollectionRid}},
        @{n = 'AdditionalInfo_ResourceCategory'      ; e = {($_.AdditionalInfo | ConvertFrom-JSON).ResourceCategory}},
        @{n = 'Tags_displayName'                     ; e = {($_.Tags | ConvertFrom-JSON).displayName}},
        @{n = 'Tags_ACCESSED-VIA-INTERNET'           ; e = {($_.Tags | ConvertFrom-JSON).'ACCESSED-VIA-INTERNET'}},
        @{n = 'Tags_APP-NAME'                        ; e = {($_.Tags | ConvertFrom-JSON).'APP-NAME'}},
        @{n = 'Tags_APP-TYPE'                        ; e = {($_.Tags | ConvertFrom-JSON).'APP-TYPE'}},
        @{n = 'Tags_APPTYPE'                         ; e = {($_.Tags | ConvertFrom-JSON).APPTYPE}},
        @{n = 'Tags_CHARGECODE'                      ; e = {($_.Tags | ConvertFrom-JSON).CHARGECODE}},
        @{n = 'Tags_COMMENTS'                        ; e = {($_.Tags | ConvertFrom-JSON).COMMENTS}},
        @{n = 'Tags_COUNTRY'                         ; e = {($_.Tags | ConvertFrom-JSON).COUNTRY}},
        @{n = 'Tags_EY-REGION'                       ; e = {($_.Tags | ConvertFrom-JSON).'EY-REGION'}},
        @{n = 'Tags_IT-ENV'                          ; e = {($_.Tags | ConvertFrom-JSON).'IT-ENV'}},
        @{n = 'Tags_OWNER'                           ; e = {($_.Tags | ConvertFrom-JSON).OWNER}},
        @{n = 'Tags_OWNER-EMAIL'                     ; e = {($_.Tags | ConvertFrom-JSON).'OWNER-EMAIL'}},
        @{n = 'Tags_SERVICELINE'                     ; e = {($_.Tags | ConvertFrom-JSON).SERVICELINE}},
        @{n = 'Tags_SUB-TYPE'                        ; e = {($_.Tags | ConvertFrom-JSON).'SUB-TYPE'}},
        @{n = 'Tags_TECHCONTACTS'                    ; e = {($_.Tags | ConvertFrom-JSON).TECHCONTACTS}} | 
    Export-Csv 'C:\output.csv' -NoTypeInformation

Come to think of it, this might be faster:

Get-Content -Path 'input.csv' | 
    Select-Object -Skip 2 | 
    ConvertFrom-Csv | 
    ForEach-Object {
        $AdditionalInfo = $_.AdditionalInfo | ConvertFrom-Json;
        $Tags = $_.Tags | ConvertFrom-Json;
        $_ | Select-Object -Property *,
            @{n = 'AdditionalInfo_StreamingEndpointName' ; e = {$AdditionalInfo.StreamingEndpointName}},
            @{n = 'AdditionalInfo_Id'                    ; e = {$AdditionalInfo.Id}},
            ...
    } | Export-Csv ...

This way you only convert each JSON column once per row, instead of once per extracted property.
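For illustration, here is a minimal self-contained sketch of the parse-once idea. The sample rows, the `$infoKeys` / `$tagKeys` lists, and the variable names are assumptions made up for this example (only a small subset of the real columns is shown). It builds each output row from an ordered hashtable rather than calculated properties, which is a swapped-in variation and not necessarily what the answer above had in mind. It also skips empty cells explicitly instead of relying on error suppression.

```powershell
# Hypothetical sample rows standing in for the real input.csv
$csv = @'
Name,AdditionalInfo,Tags
vm1,"{""VMName"":""web01"",""ServiceType"":""Standard_A1""}","{""OWNER"":""alice"",""IT-ENV"":""prod""}"
disk1,,
'@

# Illustrative subset of the JSON keys to flatten into columns
$infoKeys = 'VMName', 'ServiceType'
$tagKeys  = 'OWNER', 'IT-ENV'

$result = $csv | ConvertFrom-Csv | ForEach-Object {
    # Parse each JSON column at most once per row; leave empty cells unparsed
    $info = if ($_.AdditionalInfo) { $_.AdditionalInfo | ConvertFrom-Json } else { $null }
    $tags = if ($_.Tags)           { $_.Tags | ConvertFrom-Json }           else { $null }

    # Copy the existing columns, then append the flattened ones
    $row = [ordered]@{}
    foreach ($p in $_.PSObject.Properties) { $row[$p.Name] = $p.Value }
    foreach ($k in $infoKeys) { $row["AdditionalInfo_$k"] = if ($info) { $info.$k } else { $null } }
    foreach ($k in $tagKeys)  { $row["Tags_$k"]           = if ($tags) { $tags.$k } else { $null } }

    # Emit one object per row so the pipeline keeps streaming
    [pscustomobject]$row
}

$result | Format-Table
```

On the real file, the `$csv | ConvertFrom-Csv` part would be replaced by the `Get-Content ... | Select-Object -Skip 2 | ConvertFrom-Csv` pipeline from the question, and the result would be piped straight into `Export-Csv` rather than collected in `$result`.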

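However, I suspect that to get decent performance you will have to write something using .NET methods. I would suggest using Microsoft.VisualBasic.FileIO.TextFieldParser to parse the CSV line by line, and possibly JSON.Net with JsonConvert.DeserializeObject() to deserialize the JSON. Even then, it is not going to be super fast. 1.5 GB must be several million rows. You might be better off importing the entire CSV into SQL Server 2016+ and using queries with its built-in JSON parsing.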
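As a rough sketch of the TextFieldParser suggestion, the following reads a small CSV row by row without loading the whole file into memory. The temp file, its two columns, and the `$owners` list are hypothetical and exist only to make the example runnable. It still uses ConvertFrom-Json for the JSON cell; swapping in JSON.Net would require loading that assembly first and is not shown here. This assumes the Microsoft.VisualBasic assembly is available in your PowerShell version.

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical sample file standing in for the 1.5 GB input
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'textfieldparser-sample.csv'
Set-Content -Path $path -Encoding UTF8 -Value @(
    'Name,Tags'
    'vm1,"{""OWNER"":""alice""}"'
    'disk1,'
)

# Configure the parser for comma-delimited, optionally quoted fields
$parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser($path)
$parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
$parser.SetDelimiters(',')
$parser.HasFieldsEnclosedInQuotes = $true

$owners = New-Object System.Collections.Generic.List[string]
try {
    # First row is the header; find the column that holds the JSON
    $header  = $parser.ReadFields()
    $tagsIdx = [array]::IndexOf($header, 'Tags')

    while (-not $parser.EndOfData) {
        $fields = $parser.ReadFields()   # one row as a string array
        $json   = $fields[$tagsIdx]
        # Deserialize only non-empty cells
        $owner  = if ($json) { ($json | ConvertFrom-Json).OWNER } else { '' }
        $owners.Add($owner)
    }
}
finally {
    $parser.Close()
    Remove-Item -Path $path
}

$owners
```

For the real file you would probably write each processed row to the output as you go (for example with a System.IO.StreamWriter) instead of collecting values in a list. The two leading lines that the question skips with `Select-Object -Skip 2` could presumably be consumed by calling the parser's `ReadLine()` twice before reading the header.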